Two separate replies for one query to some domains
John Z. Bohach
jzb2 at aexorsyst.com
Mon May 3 12:03:49 UTC 2010
I'm trying to run a local caching-only nameserver (bind-9.3.3) on Linux
in order to bypass my ISP's name-servers, and most things work fine,
except some domains behave strangely.
For example, forecast.weather.gov has a TTL of 5 seconds.
My initial look-up works correctly, and the response is in fact cached
for 5 seconds. After 5 seconds, another look-up sequence is initiated,
but this time, the look-up fails.
I ran ethereal to get a packet trace, and what I find is that the first
query's response is fine, as expected. Further lookups within the 5
second TTL don't generate any external traffic, as expected.
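
For anyone who wants to repeat the cache check without a packet sniffer, here is a minimal sketch that asks a resolver for a name over UDP and reports the TTLs in the answer section. It is illustrative only: the helper names (build_query, answer_ttls, lookup_ttls) are made up for this example, the server address 127.0.0.1 assumes the caching nameserver listens locally, and the query deliberately carries no EDNS0 OPT record.

```python
import socket
import struct


def build_query(name, txid=0x1234, qtype=1):
    """Build a plain (non-EDNS) recursive query; qtype 1 is an A record."""
    header = struct.pack("!6H", txid, 0x0100, 1, 0, 0, 0)  # RD flag set
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN


def _skip_name(msg, off):
    """Return the offset just past the domain name that starts at off."""
    while True:
        length = msg[off]
        if length == 0:
            return off + 1
        if length & 0xC0 == 0xC0:  # compression pointer occupies two bytes
            return off + 2
        off += 1 + length


def answer_ttls(msg):
    """Return the TTL of every record in the answer section of msg."""
    _id, _flags, qd, an, _ns, _ar = struct.unpack_from("!6H", msg, 0)
    off = 12
    for _ in range(qd):
        off = _skip_name(msg, off) + 4  # skip QTYPE and QCLASS
    ttls = []
    for _ in range(an):
        off = _skip_name(msg, off)
        _type, _cls, ttl, rdlen = struct.unpack_from("!HHIH", msg, off)
        off += 10 + rdlen
        ttls.append(ttl)
    return ttls


def lookup_ttls(name, server="127.0.0.1", timeout=3.0):
    """Send one query to server (assumed local) and return the answer TTLs."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        reply, _addr = sock.recvfrom(4096)
    return answer_ttls(reply)
```

Calling lookup_ttls("forecast.weather.gov") once a second against a caching server should show the remaining TTL counting down while the record is cached; an empty list or a timeout would correspond to the failed look-ups described here.
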
However, after 5 seconds, another look-up sequence is initiated, but
this time the response is as follows:
First response is an OPT RR, with some fairly useless information, as
far as I can tell. Socket is then closed, likely from my end.
Second response (without any further queries from my end) arrives about
300 to 500 milliseconds after the first and contains the correct
response sequence to get further towards the resolution (next resolver,
etc.). The problem is that by this point the socket is closed:
bind-9.3.3 has already taken the OPT RR, which is not really an answer,
as the reply. The second response, which would have led to the correct
answer, hits a closed socket and is ignored, so I get a failed look-up
error. The trace shows the second response was dropped.
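
To make "the first response is an OPT RR" checkable against a captured payload, the sketch below walks a raw DNS message and reports which record types each section holds. It assumes that the problem reply is a message whose only resource record is an EDNS0 OPT pseudo-record (type code 41) in the additional section; that is an interpretation of the trace, not something confirmed. The function names rr_types and is_opt_only are hypothetical.

```python
import struct

OPT_TYPE = 41  # type code of the EDNS0 OPT pseudo-record


def _skip_name(msg, off):
    """Return the offset just past the domain name that starts at off."""
    while True:
        length = msg[off]
        if length == 0:
            return off + 1
        if length & 0xC0 == 0xC0:  # compression pointer occupies two bytes
            return off + 2
        off += 1 + length


def rr_types(msg):
    """Return (rcode, answer_types, authority_types, additional_types)."""
    _id, flags, qd, an, ns, ar = struct.unpack_from("!6H", msg, 0)
    off = 12
    for _ in range(qd):
        off = _skip_name(msg, off) + 4  # skip QTYPE and QCLASS
    sections = []
    for count in (an, ns, ar):
        types = []
        for _ in range(count):
            off = _skip_name(msg, off)
            rtype, _cls, _ttl, rdlen = struct.unpack_from("!HHIH", msg, off)
            off += 10 + rdlen
            types.append(rtype)
        sections.append(types)
    return flags & 0x000F, sections[0], sections[1], sections[2]


def is_opt_only(msg):
    """True if the message carries a single OPT record and nothing else."""
    _rcode, answer, authority, additional = rr_types(msg)
    return not answer and not authority and additional == [OPT_TYPE]
```

Feeding the UDP payload of each of the two replies (exported from the trace as raw bytes) to is_opt_only should separate the early OPT-only reply from the later one that carries real records; the rcode returned by rr_types may also hint at why the first reply was sent.
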
To work around this, I've forced all outgoing queries to originate from
port 53 as well. With that, the second response does appear to be
received (the server listens on 53), but it is apparently discarded:
only the first response (OPT RR) gets processed, which still results in
a failed look-up.
And now, the icing on all of this is that after 100 seconds of failure,
the forecast.weather.gov site responds correctly, without any
intermediate OPT RR, and resolution succeeds. This caches the
response for the TTL of 5 seconds, and this whole thing starts over,
with 100 seconds of OPT RR + correct (but ignored by bind) RR, then
correct RR only (and success), and repeats.
Is this something in my configuration? Is this a *.weather.gov problem?
Or is it just that my ISP is messing with my packets? Or is this just
my tax-dollars at work? A few other sites have similar problems, all
with extremely short TTLs (the vast majority of domains work
correctly), and I was wondering how I could get around this. If I set
my resolver to my ISP's, the *.weather.gov sites initially respond with
a TTL of 5 seconds, but afterwards, all have TTL's in 90's for
Could someone else comment on any experience with the *.weather.gov
John Z. Bohach