Clients get DNS timeouts because ipv6 means more queries for each lookup

Kevin Darcy kcd at chrysler.com
Wed Jul 13 17:31:51 UTC 2011


On 7/13/2011 1:06 PM, Kevin Darcy wrote:
> On 7/13/2011 2:35 AM, Jonathan Kamens wrote:
>> On 07/13/2011 02:13 AM, Mark Andrews wrote:
>>>> Well, all the prodding from people here prompted me to investigate
>>>> further exactly what's going on. The problem isn't what I thought it
>>>> was. It appears to be a bug in glibc, and I've filed a bug report and
>>>> found a workaround.
>>> There is no bug in glibc.
>> To be blunt, that's bullshit.
>>
>> If glibc makes an A query and an AAAA query, and it gets back a valid 
>> response to the A query and an invalid response to the AAAA query, 
>> then it should ignore the invalid response to the AAAA query and 
>> return the valid A response to the user as the IP address for the host.
>>
>> Please note, furthermore, that as I explained in detail in my bug 
>> report and in my last message, glibc behaves differently based on the 
>> /order/ in which the two responses are returned by the DNS server. 
>> Since there's nothing that says a DNS server has to respond to two 
>> queries in the order in which they were received, and that would be 
>> an impossible requirement to impose in any case, since the queries 
>> and responses are sent via UDP, which doesn't guarantee order, it's 
>> perfectly clear that glibc needs to be prepared to function the same 
>> regardless of the order in which it receives the responses.
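To make the order-independence point concrete, here is a minimal Python sketch of the lenient policy argued for above. The `Answer` tuple and `merge_lenient` function are hypothetical illustrations, not glibc internals. Both responses are collected first and only then judged, so arrival order cannot change the result.

```python
from typing import List, Optional, Tuple

# Hypothetical model of one DNS response: (qtype, addresses, error).
# error is None for a usable response (an empty NODATA answer counts as
# usable); otherwise it is a short label such as "SERVFAIL" or "malformed".
Answer = Tuple[str, List[str], Optional[str]]


def merge_lenient(first: Answer, second: Answer) -> List[str]:
    """Lenient policy: drop a broken response if the other one is usable.

    Both responses are gathered before any decision is made, so swapping
    `first` and `second` (i.e. the arrival order) cannot change the outcome.
    """
    by_type = {ans[0]: ans for ans in (first, second)}
    if all(ans[2] is not None for ans in by_type.values()):
        raise LookupError("both the A and AAAA lookups failed")
    addresses: List[str] = []
    for qtype in ("AAAA", "A"):  # fixed output order, independent of arrival
        ans = by_type.get(qtype)
        if ans is not None and ans[2] is None:
            addresses.extend(ans[1])
    if not addresses:
        raise LookupError("no addresses of either family")
    return addresses


a_ok = ("A", ["192.0.2.1"], None)
aaaa_bad = ("AAAA", [], "malformed")
print(merge_lenient(a_ok, aaaa_bad))  # ['192.0.2.1']
print(merge_lenient(aaaa_bad, a_ok))  # ['192.0.2.1']
```

The AAAA-then-A output order is an arbitrary fixed choice for the sketch; a real getaddrinfo() would sort the combined list instead.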
> I agree that the order of the A/AAAA responses shouldn't matter to the 
> result. The whole getaddrinfo() call should fail regardless of whether 
> the failure is seen first or the valid response is seen first. Why? 
> Because getaddrinfo() should, if it isn't already, be using the RFC 
> 3484 algorithm (and/or whatever the successor to RFC 3484 ends up 
> being) to sort the addresses, and for that algorithm to work, one 
> needs *both* the IPv4 address(es) *and* the IPv6 address(es) 
> available, in order to compare their scopes, prefixes, etc. If one of 
> the lookups "fails", and this failure is presented to the RFC 3484 
> algorithm as NODATA for a particular address family, then the 
> algorithm could make a bad selection of the destination address, and 
> this can lead to other sorts of breakage, e.g. trying to use a 
> tunneled connection where no tunnel exists.  The *safe* thing for 
> glibc to do is to promote the failure of either the A lookup or the 
> AAAA lookup to a general lookup failure, which prompts the 
> user/administrator to find the source of the problem and fix it.
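For reference, a stripped-down Python sketch of one piece of RFC 3484 destination ordering: Rule 6 ("prefer higher precedence"), with precedence found by longest-prefix match in the RFC's default policy table. It ignores the other nine rules and the host's candidate source addresses, so it is only an illustration, and the helper names are hypothetical. It does show why the sort needs both families: if a failed AAAA lookup is handed over as NODATA, the IPv6 candidates never enter the comparison at all.

```python
import ipaddress
from typing import List

# Default policy table from RFC 3484, section 2.1: (prefix, precedence, label).
# Its successor, RFC 6724, changes several of these values; the table here is
# used purely for illustration.
POLICY_TABLE = [
    (ipaddress.IPv6Network("::1/128"), 50, 0),
    (ipaddress.IPv6Network("::/0"), 40, 1),
    (ipaddress.IPv6Network("2002::/16"), 30, 2),
    (ipaddress.IPv6Network("::/96"), 20, 3),
    (ipaddress.IPv6Network("::ffff:0:0/96"), 10, 4),
]


def as_ipv6(addr: str) -> ipaddress.IPv6Address:
    """RFC 3484 compares IPv4 destinations as IPv4-mapped IPv6 addresses."""
    ip = ipaddress.ip_address(addr)
    if isinstance(ip, ipaddress.IPv4Address):
        return ipaddress.IPv6Address("::ffff:" + str(ip))
    return ip


def precedence(addr: str) -> int:
    """Precedence of the longest matching prefix in the policy table."""
    ip = as_ipv6(addr)
    matches = [(net.prefixlen, prec) for net, prec, _ in POLICY_TABLE if ip in net]
    return max(matches)[1]  # "::/0" matches everything, so this is never empty


def sort_by_precedence(destinations: List[str]) -> List[str]:
    """Rule 6 only; Python's sort is stable, so equal-precedence
    destinations keep their original relative order."""
    return sorted(destinations, key=precedence, reverse=True)


both = ["192.0.2.1", "2002:c000:201::1", "2001:db8::1"]
print(sort_by_precedence(both))
# ['2001:db8::1', '2002:c000:201::1', '192.0.2.1']
print(sort_by_precedence(["192.0.2.1"]))  # AAAA "failed": nothing to compare
# ['192.0.2.1']
```

Precedence alone ranks a 6to4 (2002::/16) destination above any IPv4 one; the full algorithm relies on its other rules, which compare candidates with the host's own source addresses, to avoid choosing a path the host can't actually use.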
>
> It's rarely a good idea to mask undeniable errors as if there were no 
> error at all. It leads to unpredictable behavior and really tough 
> troubleshooting challenges. I think glibc is erring on the side of 
> openness and transparency here, rather than trying to cover up the 
> fact that something is horribly wrong.
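The strict policy described above can be sketched in a few lines of Python. The `Answer` tuple and `merge_strict` are hypothetical: a failure of either lookup becomes a failure of the whole call, and the answers are examined in a fixed order so that even the error message does not depend on which response arrived first.

```python
from typing import List, Optional, Tuple

# Hypothetical model of one DNS response: (qtype, addresses, error).
# error is None for a usable response (including an empty NODATA answer).
Answer = Tuple[str, List[str], Optional[str]]


def merge_strict(first: Answer, second: Answer) -> List[str]:
    """Strict policy: any failed lookup fails the whole call."""
    # Sort by qtype ("A" before "AAAA") so arrival order is irrelevant.
    answers = sorted((first, second), key=lambda ans: ans[0])
    failures = [f"{qtype}: {error}" for qtype, _, error in answers if error is not None]
    if failures:
        raise LookupError("lookup failed (" + ", ".join(failures) + ")")
    addresses = [addr for _, addrs, _ in answers for addr in addrs]
    if not addresses:
        raise LookupError("no addresses of either family")
    return addresses


a_ok = ("A", ["192.0.2.1"], None)
aaaa_bad = ("AAAA", [], "malformed")
for pair in ((a_ok, aaaa_bad), (aaaa_bad, a_ok)):
    try:
        merge_strict(*pair)
    except LookupError as exc:
        print(exc)  # lookup failed (AAAA: malformed), for both orders
```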
>
>>
>>> Note your "fix" won't help clients that only ask for AAAA records
>>> because it is the authoritative servers that are broken, not the
>>> resolver library or the recursive server.
>> I am aware of that. It is irrelevant, because it is not the problem I 
>> am trying to solve. I, and 99.999999% of the users in the world, are 
>> /not/ "only ask[ing] for AAAA records." Nobody actually trying to use 
>> the internet for day-to-day work is doing that right now, because to 
>> say that IPv6 support is not yet ubiquitous would be a laughably 
>> momentous understatement.
> What about clients in a NAT64/DNS64 environment? They could be 
> configured as IPv6-only but normally able to access the IPv4 Internet 
> just fine. Even with your glibc "fix" in place, though, they'll 
> presumably break if the authoritative nameservers are giving garbage 
> responses to AAAA queries (could someone with practical experience in 
> DNS64 please confirm this?).
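For context, a minimal sketch of the address synthesis a DNS64 server performs (RFC 6147, using the address format of RFC 6052) when the authoritative answer to an AAAA query is empty: it takes the A record and embeds it in an IPv6 prefix. The well-known prefix 64:ff9b::/96 is assumed here; sites may use their own, and `synthesize_aaaa` is a hypothetical name. An IPv6-only client only ever sees the synthesized AAAA answer, which is why its fate hangs on how the DNS64 server treats a broken AAAA response; the sketch does not settle that question.

```python
import ipaddress

# RFC 6052 well-known prefix; a deployment may use a network-specific
# prefix instead, so this value is an assumption.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_aaaa(ipv4: str,
                    prefix: ipaddress.IPv6Network = WELL_KNOWN_PREFIX) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of a /96 prefix, as a DNS64
    server does when building a synthetic AAAA record from an A record."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


print(synthesize_aaaa("192.0.2.33"))  # 64:ff9b::c000:221
```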
>
> Another possibility you're not considering is that the invoking 
> application itself may make independent IPv4-specific and 
> IPv6-specific getaddrinfo() lookups. Why would it do this? Why not? 
> Maybe IPv6 capability is something the user has to buy a separate 
> license for, so the IPv6 part is a slightly separate codepath, added 
> in a later version than the IPv4-only base product. When 
> one of the getaddrinfo() calls returns address records and the other 
> returns garbage, your "fix" doesn't prevent such an application from 
> doing something unpredictable, possibly catastrophic. So it's really 
> not a general solution to the problem.
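A minimal Python sketch of such an application (`lookup_per_family` is a hypothetical name): it makes one getaddrinfo() call restricted to AF_INET and another restricted to AF_INET6, and gets back two independent outcomes. Nothing inside getaddrinfo() can reconcile them; the application has to decide for itself what success in one family plus failure in the other means.

```python
import socket
from typing import Dict, List, Union

Outcome = Union[List[str], socket.gaierror]


def lookup_per_family(host: str, flags: int = 0) -> Dict[str, Outcome]:
    """Hypothetical application code: one getaddrinfo() call per address
    family, each outcome recorded separately rather than merged."""
    outcomes: Dict[str, Outcome] = {}
    for label, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        try:
            infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM, 0, flags)
            # info[4] is the sockaddr tuple; its first element is the address.
            outcomes[label] = [info[4][0] for info in infos]
        except socket.gaierror as exc:
            outcomes[label] = exc
    return outcomes


# Offline demonstration with a numeric host: on typical systems the IPv4
# call succeeds and the IPv6-only call fails, leaving the application to
# reconcile the two outcomes itself.
print(lookup_per_family("127.0.0.1", socket.AI_NUMERICHOST))
```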
Oh, I should also point out that this brokenness in the 
Wikipedia/Wikimedia nameservers *isn't* specific to AAAA queries, 
and therefore *isn't* "fixable" with getaddrinfo() alone. Try doing an 
MX query of en.wikipedia.org. Or a PTR query. Or any of the other "old" 
(yet non-deprecated) query types (e.g. NS, TXT, HINFO). The only QTYPEs 
that are answered correctly are A, CNAME and (oddly enough) SOA. So they 
don't even have the excuse of "well, AAAA queries are kinda new, we 
haven't got around to handling them properly yet". This behavior has 
failed to conform to the standard for as long as the standard has 
existed; it's not recent, IPv6-specific breakage.
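Checking this by hand just means sending the same question with different QTYPEs to one of the authoritative servers. A minimal Python sketch using only the standard library and the RFC 1035 wire format: `build_query` and `ask` are hypothetical helpers, and the server address in the example is a documentation placeholder, not one of the Wikimedia nameservers. It summarizes only the reply header (RCODE and answer count), so it can expose timeouts, error RCODEs and empty answers, but it doesn't decode the records and so can't show whether the data returned is correct.

```python
import socket
import struct

# QTYPE codes from RFC 1035 (and RFC 3596 for AAAA).
QTYPES = {"A": 1, "NS": 2, "CNAME": 5, "SOA": 6, "PTR": 12,
          "HINFO": 13, "MX": 15, "TXT": 16, "AAAA": 28}
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN",
          4: "NOTIMP", 5: "REFUSED"}


def build_query(name: str, qtype: str, query_id: int = 0x1234) -> bytes:
    """Encode a one-question query. Flags are 0 (RD clear), which is the
    usual choice when asking an authoritative server directly."""
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)  # QDCOUNT=1
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPES[qtype], 1)  # QCLASS=IN


def ask(server: str, name: str, qtype: str, timeout: float = 3.0) -> str:
    """Send one query over UDP and summarize the reply header only."""
    query = build_query(name, qtype)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        try:
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "timeout"
    if len(reply) < 12 or reply[:2] != query[:2]:
        return "short reply or mismatched ID"
    _, flags, _, ancount, _, _ = struct.unpack("!HHHHHH", reply[:12])
    rcode = flags & 0x000F
    return f"{RCODES.get(rcode, rcode)}, {ancount} answer record(s)"


# Example (192.0.2.53 is a documentation placeholder; substitute one of the
# zone's authoritative servers):
# for qtype in ("A", "CNAME", "SOA", "AAAA", "MX", "PTR", "NS", "TXT"):
#     print(qtype, ask("192.0.2.53", "en.wikipedia.org", qtype))
```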

                                                                         
                                                                         
                                                         - Kevin

More information about the bind-users mailing list