Clients get DNS timeouts because ipv6 means more queries for each lookup
jik at kamens.us
Wed Jul 13 04:45:18 UTC 2011
Well, all the prodding from people here prompted me to investigate
further exactly what's going on. The problem isn't what I thought it
was. It appears to be a bug in glibc, and I've filed a bug report and
found a workaround.
In a nutshell, the getaddrinfo function in glibc sends both A and AAAA
queries to the DNS server at the same time and then deals with the
responses as they come in. Unfortunately, if the responses to the two
queries come back in reverse order, /and/ the first one to come back is
a server failure, the glibc code screws up and decides it didn't get
back a successful response even though it did. Both conditions hold
when you try to resolve en.wikipedia.org immediately after restarting
your DNS server, so that nothing is cached.
If you do the same lookup again, it works, because the CNAME that was
sent in response to the A query is cached, so both the A and AAAA
queries get back valid responses from the DNS server. And even if that
weren't the case, since the CNAME is cached it gets returned first,
since the server doesn't need to do a query to get it, whereas it does
need to do another query to get the AAAA record (which, recall, isn't
being cached because of the previously discussed FORMERR problem). It'll
keep working until the cached records time out, at which point the
first lookup fails again, and then it's OK again until the records time
out, etc.
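
For anyone who wants to watch the fail-then-succeed pattern from a
client, here is a minimal Python sketch. It assumes a glibc-based system
where Python's socket.getaddrinfo goes through the C library's
getaddrinfo; the helper names try_lookup and lookup_twice are made up
for illustration and are not part of any library.

```python
import socket


def try_lookup(host):
    """Resolve host for both address families; return (ok, detail)."""
    try:
        # AF_UNSPEC asks the resolver for IPv4 (A) and IPv6 (AAAA) results.
        infos = socket.getaddrinfo(host, None, socket.AF_UNSPEC,
                                   socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # The resolver reported a failure; return its message.
        return False, str(exc)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address.
    addresses = sorted({info[4][0] for info in infos})
    return True, addresses


def lookup_twice(host):
    """Run the same lookup twice in a row and return both outcomes."""
    return [try_lookup(host) for _ in range(2)]
```

Calling lookup_twice("en.wikipedia.org") right after restarting the DNS
server should, if the description above is right, show a failure
followed by a success. I haven't verified that this script triggers the
bug on every setup.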
The workaround is to put "options single-request" in /etc/resolv.conf to
prevent the glibc innards from sending out both the A and AAAA queries
at the same time.
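
To check whether a machine already has the workaround in place, a small
Python sketch like the following may help. It only inspects the text of
a resolv.conf-style file; the function name has_single_request is
hypothetical, and the parsing (whitespace-separated tokens, "#" or ";"
starting a comment line) is my assumption about the file format, not
something taken from the glibc sources.

```python
def has_single_request(resolv_conf_text):
    """Return True if an 'options' line lists the single-request option."""
    for raw_line in resolv_conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        tokens = line.split()
        # Exact token match, so "single-request-reopen" does not count.
        if tokens[0] == "options" and "single-request" in tokens[1:]:
            return True
    return False
```

For example, has_single_request(open("/etc/resolv.conf").read()) reports
whether the option is present on the local machine.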
FYI, here's the glibc bug I filed about this:
Thank you for telling me I was full of it and making me dig deeper into
this until I located the actual cause of the issue. :-)