Resolver timeouts, EDNS and networking

Christian Robottom Reis kiko at
Fri Sep 28 13:41:46 UTC 2007

On Thu, Sep 27, 2007 at 07:27:10PM -0400, Kevin Darcy wrote:
> > Has anyone seen this before? Is the EDNS0 issue a red herring, or is
> > what I'm seeing indicative of EDNS being broken at a few sites,
> > including my forwarders? I can issue manual EDNS queries (using dig
> > +bufsize=500) just fine, so I would think not.
> >   
> Hmmm... bufsize of 500 is rather silly, since that's _below_ the default 
> buffer size (512). I'd set it to something higher. In fact, I'd probably 
> do a packet trace of the forwarded queries and then try to replicate 
> them *exactly* with "dig", including EDNS0 buffer size, source address, 
> even source port. In the unlikely event that you're TSIG-signing your 
> queries, I'd mimic that behavior as well. Assuming that you're still 
> getting timeouts on precisely-mimic'ed queries, then I'd start changing 
> things to see what makes it work better. A DNS query packet has only a 
> finite number of attributes -- it should be possible to home in on the 
> attribute or combination of attributes that is giving rise to the problem.
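Kevin's point that a query packet has only a finite number of attributes can be made concrete. Below is a minimal Python sketch of how such a query is laid out on the wire: a header and question as in RFC 1035, plus an optional EDNS0 OPT pseudo-record. It is not a replacement for dig. The helper names `encode_qname` and `build_query` and the default query ID are hypothetical. The parameters it exposes (ID, RD flag, name, type, EDNS0 buffer size, DO bit) are the attributes you would change one at a time when homing in on a failure. Under current EDNS0 rules (RFC 6891), an advertised size below 512 is treated as 512, so +bufsize=500 says little about how large responses are handled.

```python
import struct

def encode_qname(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in the root label."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"bad label {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # zero-length root label terminates the name
    return bytes(out)

def build_query(name, qtype=1, query_id=0x1234, rd=True,
                edns_bufsize=None, do_bit=False):
    """Hypothetical helper: build a DNS query (qtype 1 = A, class IN).

    If edns_bufsize is given, an OPT pseudo-RR advertising that UDP payload
    size is appended to the additional section.
    """
    flags = 0x0100 if rd else 0x0000          # RD (recursion desired) bit
    arcount = 1 if edns_bufsize is not None else 0
    # ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, arcount)
    question = encode_qname(name) + struct.pack("!HH", qtype, 1)
    packet = header + question
    if edns_bufsize is not None:
        # OPT RR: root owner name, TYPE=41, CLASS=advertised UDP payload size,
        # TTL = extended RCODE (8 bits) | version (8 bits) | flags (16 bits,
        # DO is the top bit), RDLENGTH=0. Receivers treat sizes < 512 as 512.
        ttl_field = 0x8000 if do_bit else 0
        packet += b"\x00" + struct.pack("!HHIH", 41, edns_bufsize, ttl_field, 0)
    return packet
```

To mimic the source address and port too, such a packet could be sent from a UDP socket bound with `sock.bind((src_addr, src_port))` before `sock.sendto(packet, (server, 53))`. Varying one argument of `build_query` per trial is the bisection Kevin describes.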

So after a day and night of this, ISTM the resolvers are red herrings.
I disabled them last night and saw the timeouts lessen, but that was
outside office peak hours; today, with the office in full swing, I am
seeing timeouts peak at 87 in a single minute (just counting the "too
many timeouts" string in the debug log).

A few are for servers I would not expect to time out:

      6 0x8288158('):
      4 0xb489e018('):
      3 0xb452f9d0('):
      3 0x8296f68('):
      2 0xb489b080('):
      2 0xb455c250('):
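The per-minute peak and the per-fetch tally above look like grep/sort/uniq -c output. A small Python sketch can produce both in one pass over the debug log. It assumes each line starts with a BIND-style timestamp such as `28-Sep-2007 13:41:46.123` and contains a `fctx 0x...(...)` marker, as the tally suggests. Both patterns are assumptions to check against the real log, and `summarize` is a hypothetical name.

```python
import re
import sys
from collections import Counter

# Assumed line shape (adjust to the actual log):
#   28-Sep-2007 13:41:46.123 fctx 0x8288158(example.com/A'): too many timeouts
MINUTE_RE = re.compile(r"^(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2})")
FCTX_RE = re.compile(r"fctx (0x[0-9a-f]+\([^)]*\))")

def summarize(lines, needle="too many timeouts"):
    """Count matching lines per minute and per fetch context."""
    per_minute = Counter()
    per_fetch = Counter()
    for line in lines:
        if needle not in line:
            continue
        minute = MINUTE_RE.match(line)
        if minute:
            per_minute[minute.group(1)] += 1
        fetch = FCTX_RE.search(line)
        if fetch:
            per_fetch[fetch.group(1)] += 1
    return per_minute, per_fetch

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        per_minute, per_fetch = summarize(fh)
    if per_minute:
        minute, count = per_minute.most_common(1)[0]
        print(f"peak: {count} timeouts in minute {minute}")
    for key, count in per_fetch.most_common(10):
        print(f"{count:7d} {key}")
```

Lining the per-minute series up against office hours would show whether the peaks track local load or something upstream.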

Am I right in assuming that when the server logs a "too many timeouts"
it's likely that the client resolver library will have given up and
reported an error upstream?

The fact that the problems are really intermittent, and that I am unable
to reproduce any EDNS-related failures (I was just following the hint I
picked up at), suggests to me that either the network latency rises too
high (it's around 40ms to my upstream hop, and I can see some packet
loss, though no more than 5%) or the server is overloaded doing
reverse-DNS lookups for Apache and DNSBL queries for sendmail.
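A quick back-of-envelope check helps decide whether 5% loss alone could explain bursts of timeouts. The sketch below assumes each packet is lost independently with the same probability. Real congestion loss tends to be bursty, so this is an optimistic model. The retry count is a free parameter here, not BIND's actual retry policy.

```python
def attempt_failure(loss: float) -> float:
    """Probability that one query/response exchange fails, assuming the query
    and the response are each lost independently with probability `loss`."""
    return 1.0 - (1.0 - loss) ** 2

def all_attempts_fail(loss: float, attempts: int) -> float:
    """Probability that every one of `attempts` independent exchanges fails."""
    return attempt_failure(loss) ** attempts

if __name__ == "__main__":
    for n in range(1, 5):
        print(f"5% loss, {n} attempt(s): {all_attempts_fail(0.05, n):.6f}")
```

Under this model, at 5% loss a single exchange fails roughly 1 time in 10, but three independent tries all fail less than 1 time in 1000. That suggests steady random loss is an unlikely sole cause of peak-hour spikes, and bursty loss or server load are worth looking at. The 40ms RTT is also far below typical resolver retry intervals.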

> Note that the 10-minute TTL on is going to incur a 
> fairly high fetch rate, and if there is some sort of connectivity 
> problem between your ISP's nameservers and the 
> nameservers, you could very well get timeouts. Is it possible that all 

Yeah, I raised that with the sysadmins and asked them to increase it,
but that still leaves my general problem.
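To put Kevin's point about the 10-minute TTL into numbers, here is a rough upper bound on how often one caching server has to refetch a single RRset. It assumes the name is queried continuously, so every expiry is followed at once by a cache miss. `max_refetches_per_day` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400

def max_refetches_per_day(ttl_seconds: int) -> float:
    """Upper bound on refetches of one RRset per day by one cache, assuming
    constant demand so each expiry is immediately followed by a cache miss."""
    if ttl_seconds <= 0:
        raise ValueError("TTL must be positive")
    return SECONDS_PER_DAY / ttl_seconds

if __name__ == "__main__":
    for ttl in (600, 3600, 86_400):
        print(f"TTL {ttl:>6}s -> up to {max_refetches_per_day(ttl):.0f} fetches/day")
```

A 600-second TTL allows up to 144 fetches a day against 1 for a one-day TTL. Each of those fetches is another chance to hit whatever connectivity problem sits between the ISP's servers and the authoritative ones.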

> Another, somewhat non-scalable, high-maintenance "middle ground" option 
> would be to keep your forwarding configuration, but define 
> "" as a "type stub" zone. The high-maintenance part comes 

The problem is that I'm not really restricted to -- we get
timeouts for assorted queries. I just picked because I
care about it, and because it was easy to find in the logs!
Christian Robottom Reis | | [+55 16] 3376 0125

More information about the bind-users mailing list