Increase in retry and timeout errors post 9.9.4 -> 9.11.4 upgrade

Gareth Parks gparks at
Mon May 4 04:14:24 UTC 2020

I set send-cookie no; globally to test this theory, but the pattern of retries and timeouts continued. Even so, I was able to determine that the retries/timeouts follow the same pattern as the resolver statistic for truncated responses received, which suggests they are related.
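One way to check that kind of pattern match numerically, rather than by eye on a graph, is to turn the cumulative counters into per-interval deltas and correlate them. The following is only a minimal sketch: the counter names "Retry", "QueryTimeout" and "Truncated" are assumed to be the resolver counter names exposed by the statistics channel, and the snapshot values are made up for illustration.

```python
from math import sqrt

def deltas(samples, counter):
    """Per-interval increase of a cumulative counter across successive snapshots."""
    values = [s.get(counter, 0) for s in samples]
    # A drop means the counters were reset (e.g. named restarted);
    # in that case take the new value itself as the increase.
    return [b - a if b >= a else b for a, b in zip(values, values[1:])]

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

# Hypothetical snapshots of resolver counters, one per 2-minute poll.
snapshots = [
    {"Retry": 100, "QueryTimeout": 40, "Truncated": 10},
    {"Retry": 105, "QueryTimeout": 42, "Truncated": 12},
    {"Retry": 160, "QueryTimeout": 70, "Truncated": 40},
    {"Retry": 166, "QueryTimeout": 73, "Truncated": 43},
    {"Retry": 230, "QueryTimeout": 100, "Truncated": 75},
]

retry = deltas(snapshots, "Retry")
timeout = deltas(snapshots, "QueryTimeout")
truncated = deltas(snapshots, "Truncated")

# Combined retry + timeout errors per interval, compared with truncated responses.
errors = [r + t for r, t in zip(retry, timeout)]
r_value = pearson(errors, truncated)
print("retry+timeout per interval:", errors)
print("truncated per interval:   ", truncated)
print("correlation: %.3f" % r_value)
```

A coefficient close to 1 would support the idea that the two series move together, although it does not by itself show which one causes the other.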

When I look at the same graph on one of the other servers, it shows no truncated responses but does show a lot of NXDOMAIN errors, which the upgraded server does not.


From: Mark Andrews <marka at>
Sent: Monday, 4 May 2020 12:13 PM
To: Gareth Parks
Cc: bind-users at
Subject: Re: Increase in retry and timeout errors post 9.9.4 -> 9.11.4 upgrade

Well, BIND 9.11+ supports DNS COOKIE by default, and there are some servers that mishandle EDNS requests with a DNS COOKIE option present.  Unknown EDNS options are supposed to be ignored, but there are servers/firewalls that just drop such queries.  Others return FORMERR, others return NXDOMAIN when there is an answer without the option being present, others echo unknown options, and others still send back a DNS COOKIE response but fail to correctly copy the client cookie part to the response.  show how servers for .GOV zone behave when presented with an unknown EDNS option.  Other datasets are similar.

You can use "server <prefix> { send-cookie no; };" to work around known broken servers.
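As a sketch of what that could look like in named.conf: the addresses below are documentation placeholders, not servers known to be broken, and would need to be replaced with the prefixes of the servers actually misbehaving.

```
// Hypothetical examples only: 192.0.2.53 and 198.51.100.0/24 are
// documentation address ranges standing in for real broken servers.
server 192.0.2.53 {
    send-cookie no;
};

server 198.51.100.0/24 {
    send-cookie no;
};
```

Scoping the workaround per server keeps DNS COOKIE enabled for everything else, unlike setting send-cookie no; globally.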


> On 4 May 2020, at 11:21, Gareth Parks <gparks at> wrote:
> Hi,
> I have three CentOS 7 servers running BIND as internal resolvers. An update was released that upgrades them from 0:9.9.4-74.el7_6.2 to 32:9.11.4-16.P2.el7_8.2. After upgrading one of the servers, there has been a notable increase in retry and timeout errors, as measured by data collected from the statistics channel. Where previously the retry and timeout errors numbered fewer than 10 per 2 minutes, I now regularly see spikes above 50 per 2 minutes, while the error levels have remained consistent on the other two servers. When I downgrade the server back to 9.9.4, the error rate drops as well.
> I increased the log level for the query-errors log and observed that the number of entries on the upgraded and non-upgraded servers was about the same, so there doesn't appear to be an increase in errors.
> I'm not sure whether I'm simply not looking in the correct place to identify the source of the retries/timeouts. The other possibility that occurred to me is that what those retry/timeout counters represent changed between the two versions, so the increased rate is not a problem but just reflects different information.
> Gareth

Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742              INTERNET: marka at
