Two separate replies for one query to some domains

Mon May 3 12:03:49 UTC 2010

Hello,

I'm trying to run a local caching-only nameserver (bind-9.3.3) on Linux
in order to bypass my ISP's nameservers.  Most things work fine, but
some domains behave strangely.

For example, forecast.weather.gov has a TTL of 5 seconds.

My initial lookup works correctly, and the response is in fact cached
for 5 seconds.  Once the 5 seconds are up, another lookup sequence is
initiated, but this time the lookup fails.

I ran Ethereal to get a packet trace.  The response to the first query
is fine, as expected, and further lookups within the 5-second TTL
generate no external traffic, also as expected.

However, once the TTL expires, another lookup sequence is initiated,
and this time the responses look like this:

The first response contains an OPT RR with, as far as I can tell,
fairly useless information.  The socket is then closed, probably by
my end.
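For reference, the OPT RR is the EDNS0 pseudo-record defined in RFC 2671
(the DO bit comes from RFC 3225), and its fields are not entirely
useless: the CLASS field carries the sender's UDP payload size, and the
TTL field carries the upper 8 bits of an extended RCODE plus the EDNS
version.  A minimal Python sketch for decoding those fields, assuming
the RR's bytes have already been cut out of the additional section (for
example from the hex pane of the packet trace); decode_opt_rr is a
hypothetical helper, not part of any library.

```python
import struct

OPT_TYPE = 41  # EDNS0 OPT pseudo-RR type code (RFC 2671)


def decode_opt_rr(rr: bytes) -> dict:
    """Decode the fixed fields of an OPT pseudo-RR (hypothetical helper).

    Assumes rr starts at the RR's owner name, which for OPT is always
    the root name: a single zero byte.
    """
    if len(rr) < 11 or rr[0] != 0:
        raise ValueError("not an OPT RR with a root owner name")
    # TYPE (2 bytes), CLASS (2), TTL (4), RDLENGTH (2), all big-endian
    rtype, udp_size, ttl, rdlen = struct.unpack("!HHIH", rr[1:11])
    if rtype != OPT_TYPE:
        raise ValueError(f"RR type {rtype} is not OPT ({OPT_TYPE})")
    return {
        "udp_payload_size": udp_size,          # CLASS field reused
        "extended_rcode": (ttl >> 24) & 0xFF,  # upper 8 bits of 12-bit RCODE
        "edns_version": (ttl >> 16) & 0xFF,
        "do_bit": bool(ttl & 0x8000),          # "DNSSEC OK" flag
        "rdata_length": rdlen,
        "options": rr[11:11 + rdlen],          # raw EDNS options, undecoded
    }
```

If extended_rcode is nonzero, the full response code is
(extended_rcode << 4) | (RCODE from the message header), so an
extended_rcode of 1 with a header RCODE of 0 means 16 (BADVERS).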

The second response arrives 300 to 500 milliseconds after the first,
without any further query from my end, and contains the correct
referral to continue the resolution (the next nameserver, etc.).  The
problem is that by this point the socket has been closed: bind-9.3.3
has already gone away with the OPT RR, which is not really an answer,
and so it never sees the second response, which would have led to the
correct answer had it not hit a closed socket.  The result is a failed
lookup, and the trace shows the second response being dropped.
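One way to confirm the "two replies to one query" behaviour
independently of BIND is to send a single query from a socket that stays
open and log every datagram that arrives within a time window.  A
minimal Python sketch, assuming direct UDP access to one of the
authoritative servers seen in the trace; build_query, parse_header and
collect_replies are hypothetical helpers, and the fixed query ID is for
illustration only.

```python
import socket
import struct
import time


def build_query(qname: str, qid: int, qtype: int = 1, edns_size: int = 0) -> bytes:
    """Build a minimal class-IN query with RD clear, as an iterating
    resolver would send to an authoritative server."""
    arcount = 1 if edns_size else 0
    header = struct.pack("!HHHHHH", qid, 0x0000, 1, 0, 0, arcount)
    labels = b"".join(
        bytes([len(p)]) + p.encode("ascii") for p in qname.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    # Optional EDNS0 OPT RR: root owner, TYPE 41, CLASS = UDP payload size
    opt = b"\x00" + struct.pack("!HHIH", 41, edns_size, 0, 0) if edns_size else b""
    return header + question + opt


def parse_header(msg: bytes) -> dict:
    """Decode the 12-byte DNS header."""
    if len(msg) < 12:
        raise ValueError("message shorter than a DNS header")
    qid, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", msg[:12])
    return {"id": qid, "qr": bool(flags & 0x8000), "tc": bool(flags & 0x0200),
            "rcode": flags & 0x000F, "qdcount": qd, "ancount": an,
            "nscount": ns, "arcount": ar}


def collect_replies(server: str, qname: str, window: float = 2.0) -> list:
    """Send one query and record every datagram received within `window`
    seconds.  The socket is unconnected, so replies from any source
    address are logged along with where they came from."""
    qid = 0x1234  # hypothetical fixed ID; a real resolver randomizes it
    query = build_query(qname, qid, edns_size=4096)
    replies = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(query, (server, 53))
        start = time.monotonic()
        deadline = start + window
        while (remaining := deadline - time.monotonic()) > 0:
            sock.settimeout(remaining)
            try:
                data, addr = sock.recvfrom(4096)
            except socket.timeout:
                break
            hdr = parse_header(data)
            hdr["delay_ms"] = round((time.monotonic() - start) * 1000)
            hdr["from"] = addr
            replies.append(hdr)
    return replies
```

Calling collect_replies() with a server address taken from the trace
and "forecast.weather.gov" should return one entry per datagram; if the
behaviour described here is real, two entries with the same ID would
appear a few hundred milliseconds apart.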

To work around this, I forced all outgoing queries to originate from
port 53 as well.  The second response is then received (the server
listens on 53), but it still appears to be discarded: only the first
response (the OPT RR) seems to be processed, so the lookup still
fails.

The icing on the cake is that after 100 seconds of failure,
forecast.weather.gov responds correctly, without any intermediate
OPT RR, and resolution succeeds.  The response is cached for its
5-second TTL, and then the whole cycle starts over: 100 seconds of an
OPT RR plus a correct (but ignored) response, then a correct response
alone (and success), and so on.

Is this something in my configuration?  Is this a *.weather.gov problem?
Or is my ISP messing with my packets?  Or is this just my tax dollars at
work?  A few other sites have similar problems, all with extremely short
TTLs (the vast majority of domains work correctly), and I was wondering
how I could get around this.  If I point my resolver at my ISP's
nameservers, the *.weather.gov names initially come back with a TTL of
5 seconds, but subsequent lookups all return TTLs in the 90s.

Could anyone comment on their experience with the *.weather.gov
domains?

Thank You,
John Z. Bohach