Query sent, but no response

Dan Lowe dan at tangledhelix.com
Fri Sep 23 08:39:06 UTC 2005


I'm seeing an odd problem on my recursive resolvers that I can't
quite figure out. For queries in the sfchron.com zone (for example,
mr1.sfchron.com. A), some resolvers never return a response. If I
log into the resolver host and use "dig" against the authoritative
servers directly, I do get a response, e.g.:

dig @ns1.sfchron.com. mr1.sfchron.com. a

dig @ns2.sfchron.com. mr1.sfchron.com. a

However, if I dig @localhost (i.e. ask the local resolver for the
answer), the results vary. On several resolvers I get the same answer
as the queries above; on others the query times out. On every
resolver, dig @nsN.sfchron.com works; only some of them time out
when I ask localhost.
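To repeat the @nsN versus @localhost comparison from a script, here is a minimal Python sketch (an illustration, not part of the original setup). It hand-builds a single RFC 1035 query, sends it over UDP port 53 to whichever server you name, and reports either the response code and answer count or "timeout". It does not retry or fall back to TCP, and the 5-second default timeout is an arbitrary assumption, so treat it as a rough probe rather than a replacement for dig.

```python
import random
import socket
import struct


def build_query(name: str, qtype: int = 1) -> tuple[int, bytes]:
    """Build a minimal DNS query packet (RFC 1035) with RD set.

    qtype=1 is A, qtype=15 is MX. Returns (query ID, packet bytes).
    """
    qid = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, others 0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    return qid, header + question


def probe(server: str, name: str, timeout: float = 5.0) -> str:
    """Send one A query over UDP and summarise the reply, or report a timeout."""
    qid, packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # each recvfrom call waits up to `timeout` s
        sock.sendto(packet, (server, 53))
        try:
            while True:
                data, _addr = sock.recvfrom(4096)
                # Ignore stray datagrams whose ID does not match our query.
                if len(data) >= 12 and struct.unpack("!H", data[:2])[0] == qid:
                    break
        except socket.timeout:
            return "timeout"
    flags, _qdcount, ancount = struct.unpack("!HHH", data[2:8])
    rcode = flags & 0x000F  # low 4 bits of the flags word
    return f"rcode={rcode} answers={ancount}"
```

On an affected host, running probe("ns1.sfchron.com", "mr1.sfchron.com") and probe("127.0.0.1", "mr1.sfchron.com") would be expected to show the same split: an answer from the authoritative server and "timeout" from the local resolver.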

On the hosts where I get timeouts, other queries against localhost  
work fine (for instance, yahoo.com. MX).

All resolvers are running BIND 9.3.1 on Sun Solaris 8 SPARC. I have  
reviewed the BIND configs to ensure the sfchron.com nameservers are  
not blackholed.

Packet captures from snoop and tcpdump match what the clients report:
when dig times out, I see a query go out, but no answer comes back.

Has anyone seen this before? I've seen similar behavior in the past,  
but it always turned out to be something simple (the remote end was  
filtering us, or we had the remote site's IP in our blackhole list...)

Lastly, the pattern changes over time. Of my 17 resolvers, one subset
gets timeouts on one day, a different subset the next day, and yet
another the day after. A small handful seem to be on the timeout list
every time, but so far I am treating that as coincidence.
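One rough way to judge whether that persistent handful is coincidence: collect each day's set of timing-out resolvers, intersect them, and compare the overlap with what independent random failures would produce. The Python sketch below assumes each day's failing subset is drawn uniformly at random with the same size every day. That is a simplifying assumption, and the resolver names in any input are hypothetical.

```python
def persistent_failures(daily_timeouts: list[set[str]]) -> set[str]:
    """Resolvers that appear in every day's timeout set."""
    if not daily_timeouts:
        return set()
    return set.intersection(*daily_timeouts)


def expected_persistent(n_resolvers: int, failing_per_day: int, days: int) -> float:
    """Expected number of resolvers on every day's list if each day's
    failing subset (of fixed size) were chosen independently and uniformly.

    A given resolver lands in one day's subset with probability k/n, so it
    is on all `days` lists with probability (k/n)**days; summing over all
    n resolvers gives n * (k/n)**days.
    """
    p_on_one_day = failing_per_day / n_resolvers
    return n_resolvers * p_on_one_day ** days
```

With 17 resolvers and a made-up figure of 5 timing out per day, expected_persistent(17, 5, 3) is about 0.43, so a single resolver on all three lists is plausible by chance. The expected count shrinks quickly as more days are observed, so a handful that stays on the list for longer is harder to dismiss.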

Thanks for any help you can offer.

  -dan


-- 
logic (n.): the art of being wrong with confidence.




