Are failures cached?
jw354 at cornell.edu
Thu May 1 17:30:39 UTC 2008
Here I'm rehashing and expanding upon what has been said.
The top level domain server (for the .com domain, which delegates
water.com) answered queries for the water.com nameserver
with the wrong nameserver, saying furthermore that the data
was valid for two days. This caused caching nameservers to
hold on to the wrong delegation data (NS record) for two days. (We
know it was the wrong nameserver because that's how you described it.)
If that wrong nameserver didn't give answers for lookups within
water.com (e.g. no SOA, no authority flag (or no answer at all?)),
then caching nameservers running BIND9 would cache, for 10 minutes,
the fact that that nameserver is not answering, which
is a type of negative caching. This helps a caching nameserver's
efficiency by not recursing every single query it receives to
that nameserver. However, this negative caching, which says the
nameserver is lame/missing, doesn't affect the fact that this caching
nameserver has an NS record pointing at that wrong nameserver, i.e.,
it had been told that is where to go to resolve those names. While it
will wait ten minutes before sending queries to that nameserver, after
the ten minutes it goes and asks the nameserver again. Hence problems
remain for two days.
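
To put rough numbers on the two timers described above, here is a small
sketch in Python. It is only a toy model, not BIND9's actual logic: the
function name simulate, the one-query-per-minute client load, and the
assumption that the bad NS record is cached at time zero are all made up
for illustration. The two TTL values are the ones from this thread.

```python
NS_TTL = 2 * 24 * 60 * 60   # delegation TTL handed out by the .com server: two days
LAME_TTL = 10 * 60          # lame-server cache time described above: ten minutes


def simulate(query_interval=60, duration=3 * 24 * 60 * 60):
    """Toy model of one caching nameserver holding the wrong NS record.

    Returns (number of queries sent to the wrong nameserver,
             time in seconds at which the NS record expires and is re-fetched).
    """
    ns_expires = NS_TTL      # assumption: the bad NS record was cached at t=0
    lame_until = 0           # no lame-server entry cached yet
    upstream_queries = 0
    for now in range(0, duration, query_interval):
        if now >= ns_expires:
            # NS record has expired; the resolver goes back to the TLD server
            return upstream_queries, now
        if now < lame_until:
            # negative cache hit: fail without asking the wrong nameserver
            continue
        # lame entry expired: ask the wrong nameserver again, it is still lame
        upstream_queries += 1
        lame_until = now + LAME_TTL
    return upstream_queries, None


if __name__ == "__main__":
    print(simulate())
```

Under those assumptions the lame entry expires and is re-created every ten
minutes, but nothing changes for clients until the NS record itself times
out at the two-day mark.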
Since we know the specifics of your problem, we know that, in
this particular case, it would be helpful to delete the water.com NS
records in that caching nameserver (even though the records still had
time left on them) and go back to the top level domain server and get
the NS records again.
We know, from your description, that the top level domain nameservers'
data was corrected long before 2 days had passed. However, BIND9 does not assume
that the problem you describe is what triggered its negative caching.
In the general case, I expect tracing what records to throw out might
include some repeated backtracking, and even then, might
require human judgment. The nameserver software was
told by the authoritative server for ".com" that the water.com NS
record was good for 2 days, and keeps using it.
On another note, what you learned here about negative caching was the
behavior of BIND9 as a caching nameserver. Folks trying to reach you
would be using lots of nameservers around the world, many of which
would not
be BIND9. They'd likely follow the rules of caching NS records,
but we can't say what their negative caching behavior was.
Also, if this wrong nameserver for water.com answered with the authority
flag but with a wrong (positive) answer, then that record's TTL comes
into play. Furthermore, if it said the name does not exist, then the SOA's
caching time comes into play. Neither of these cases overrides the TTL
on the NS record that delegated water.com to that nameserver.
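
The cases in this message can be summed up as a small lookup, sketched
here in Python. This is a simplification for illustration only and not how
any real nameserver is written: the function governing_ttl, the dictionary
keys, and the example values of 3600 and 900 seconds are hypothetical. The
point is that the reply from the wrong nameserver decides how long that
reply is cached, while the delegation lifetime stays whatever the .com
server said.

```python
NS_TTL = 2 * 24 * 60 * 60   # TTL on the delegating NS record (two days, from this thread)
LAME_TTL = 10 * 60          # BIND9 lame-server cache time described above


def governing_ttl(reply, ns_ttl=NS_TTL, lame_ttl=LAME_TTL):
    """Return which timer applies to one reply from the delegated nameserver.

    reply is a dict with a "kind" key (hypothetical shape):
      "lame"     - no authoritative answer (or no answer at all)
      "answer"   - authoritative positive answer, with "record_ttl"
      "nxdomain" - authoritative "name does not exist", with "soa_negative_ttl"
    """
    kind = reply["kind"]
    if kind == "lame":
        cached_for = lame_ttl
    elif kind == "answer":
        cached_for = reply["record_ttl"]
    elif kind == "nxdomain":
        cached_for = reply["soa_negative_ttl"]
    else:
        raise ValueError("unknown reply kind: %r" % kind)
    # In every case the NS record's own TTL is left untouched.
    return {"cached_for": cached_for, "delegation_ttl": ns_ttl}


if __name__ == "__main__":
    print(governing_ttl({"kind": "lame"}))
    print(governing_ttl({"kind": "answer", "record_ttl": 3600}))
    print(governing_ttl({"kind": "nxdomain", "soa_negative_ttl": 900}))
```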