BIND servfail from caching server

Fri Mar 4 00:06:54 UTC 2011

The SERVFAIL happens because the NS RRset returned by the authoritative name servers lists servers that are not authoritative for the zone. Classic DNS mistake.

The com zone says that the authoritative servers for supernet.com are ns{2,3}.earthlink.net (delegation).

But supernet.com as hosted on ns{2,3}.earthlink.net says that dns{1,2}.earthlink.net are the authoritative servers. This latter set of servers is not actually authoritative for the zone.

For the first query, the resolver has not yet talked to the authoritative servers, so its only information is the delegation NS record set from com. The answer to that query, however, contains the authoritative NS record set, which is considered more credible and therefore replaces the delegation record set in the resolver's cache. The MX answer itself is served from cache until its 300-second TTL expires, which is why a cache flush appears to help for a while. After that, queries into the zone go to the bad servers, get lame responses, and fail.
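
To make the sequence concrete, here is a rough Python sketch of the cache behaviour described above. It is a toy model, not BIND's actual code: the class and function names are made up for illustration, the two trust ranks are only loosely modelled on the ranking in RFC 2181 section 5.4.1, and answer caching (the 300-second MX TTL) is left out, so the second call stands for the first query after that TTL has expired.

```python
from dataclasses import dataclass

# Simplified trust ranks (assumption: loosely after RFC 2181 section 5.4.1).
REFERRAL = 1        # NS set learned from the parent's delegation (com)
AUTH_AUTHORITY = 2  # NS set returned by the zone's own servers


@dataclass
class CachedNS:
    servers: frozenset
    rank: int


class NSCache:
    """Hypothetical NS cache that keeps the most credible NS set per zone."""

    def __init__(self):
        self._entries = {}

    def offer(self, zone, servers, rank):
        # Replace the cached set only if the new data is at least as credible.
        current = self._entries.get(zone)
        if current is None or rank >= current.rank:
            self._entries[zone] = CachedNS(frozenset(servers), rank)

    def servers_for(self, zone):
        return self._entries[zone].servers


# Names taken from the trace in this thread.
DELEGATION_NS = {"ns2.earthlink.net", "ns3.earthlink.net"}  # what com says
ZONE_APEX_NS = {"dns1.earthlink.net", "dns2.earthlink.net"}  # what the zone says
REALLY_AUTHORITATIVE = frozenset(DELEGATION_NS)  # servers that actually answer


def resolve(cache, zone):
    """Return the rcode of one uncached query into the zone."""
    targets = cache.servers_for(zone)
    if not targets & REALLY_AUTHORITATIVE:
        return "SERVFAIL"  # every cached server is lame for the zone
    # A good answer carries the zone's own NS set along with it.
    cache.offer(zone, ZONE_APEX_NS, AUTH_AUTHORITY)
    return "NOERROR"


cache = NSCache()
cache.offer("supernet.com", DELEGATION_NS, REFERRAL)  # referral from com
first = resolve(cache, "supernet.com")   # goes to ns2/ns3
second = resolve(cache, "supernet.com")  # goes to dns1/dns2
print(first, second)  # prints "NOERROR SERVFAIL"
```

Flushing the cache puts the model back in the referral-only state, which matches the observation that a flush helps only until the next uncached query.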

Unless you own supernet.com, this problem is not your fault and not for you to fix. You can work around it with conditional forwarding, or a zone of type static-stub if you're using BIND 9.8 already, but that's strictly a workaround and subject to breakage if the zone is moved.
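
For illustration, the two workarounds might look roughly like this in named.conf. Treat it as a sketch, not a tested configuration: the server names come from the com delegation in the trace below, the forwarder address is simply the one ns3.earthlink.net answered from in that trace and may change, and only one of the two zone statements should be configured.

```
// Option A (BIND 9.8 or later): pin the zone to the servers that com
// delegates to.
zone "supernet.com" {
    type static-stub;
    server-names { "ns2.earthlink.net"; "ns3.earthlink.net"; };
};

// Option B: conditional forwarding. Use instead of option A, not with it.
// 207.217.120.43 is the address ns3.earthlink.net answered from in the trace.
// zone "supernet.com" {
//     type forward;
//     forward only;
//     forwarders { 207.217.120.43; };
// };
```

Either form hard-codes someone else's name servers, which is why it breaks if the zone is moved.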

Chris Buxton
BlueCat Networks

On Mar 3, 2011, at 2:29 PM, Justin Krejci wrote:

> When doing a recursive query for MX supernet.com against a caching BIND
> server, the BIND server responds back with the answer.  The TTL is 300.
> 
> After the TTL expires the following recursive query for the same record
> returns a SERVFAIL from the caching server.
> 
> If I do a +trace on the same query to the same caching server for the
> same data it is able to respond with the answer yet a standard recursive
> query still gives a SERVFAIL.
> 
> Queries for other domains are working fine on this caching server. Other
> 3rd party DNS caching servers are responding fine for the same record
> above even after the TTL expires, tried @8.8.8.8 and @208.67.220.220
> 
> If I flush the cache on the caching server it successfully returns the
> answer to the query, but only for the life of the TTL, then it goes back
> to SERVFAIL again. (Tried doing a full stop-and-start of named as well.)
> 
> This particular server is running BIND 9.7.0-P2 but this exact same
> behavior is also happening on a server running 9.5.1-P2.1 as well.
> 
> So I noticed when doing a trace that the NS servers are different
> between the gtld and the actual authoritative servers.
> 
> <snip>
> com.                    172800  IN      NS      l.gtld-servers.net.
> com.                    172800  IN      NS      e.gtld-servers.net.
> ;; Received 502 bytes from 192.36.148.17#53(i.root-servers.net) in 2987 ms
> 
> supernet.com.           172800  IN      NS      ns2.earthlink.net.
> supernet.com.           172800  IN      NS      ns3.earthlink.net.
> ;; Received 111 bytes from 192.54.112.30#53(h.gtld-servers.net) in 119 ms
> 
> supernet.com.           300     IN      MX      5 onemain-mx.earthlink.net.
> supernet.com.           3600    IN      NS      dns1.earthlink.net.
> supernet.com.           3600    IN      NS      dns2.earthlink.net.
> ;; Received 172 bytes from 207.217.120.43#53(ns3.earthlink.net) in 54 ms
> 
> 
> 
> Is this just a bug that upgrading BIND will fix or is there something
> else going on here?