slow query times after Bind upgrade

Sat Feb 11 23:00:32 UTC 2006

Last week I began upgrading my two slave name servers from Bind 9.2.4 to 
9.3.2, and I have completed one so far. Everything appeared to be fine, 
but later I noticed that a few nslookups from a Windows workstation 
occasionally fail with a "DNS Server timeout. 2 seconds". I tested 5 
external domain names against both my 9.2.4 slave and my 9.3.2 slave and 
found that the original always succeeds, but the upgraded server 
occasionally times out. The trace shows that the query is always 
answered, just not always within the two-second timeout window. I would 
expect a cached entry to be answered well within that window.
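
For anyone who wants to reproduce the comparison without nslookup, here is 
a rough sketch of the kind of timing test described above. It is not the 
tool I actually used. It assumes Python is available, sends plain UDP A 
queries, and counts replies that miss a 2 second timeout. The function 
names (build_query, time_query, compare) are made up for illustration, and 
it does not check that a reply matches the query ID.

```python
import socket
import struct
import time

TIMEOUT = 2.0  # seconds, the same window the Windows nslookup uses


def build_query(name, qid=0x1234):
    """Build a minimal DNS query packet for an A record (recursion desired)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def time_query(server, name, timeout=TIMEOUT, port=53):
    """Return the response time in seconds, or None if no reply arrives in time."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        start = time.monotonic()
        sock.sendto(build_query(name), (server, port))
        sock.recvfrom(512)  # any reply counts; the query ID is not verified
        return time.monotonic() - start
    except socket.timeout:
        return None
    finally:
        sock.close()


def compare(servers, names, rounds=10):
    """Print how often each server misses the timeout and its slowest reply."""
    for server in servers:
        results = [time_query(server, n) for _ in range(rounds) for n in names]
        answered = [r for r in results if r is not None]
        slowest = max(answered) if answered else float("nan")
        print("%s: %d/%d timed out, slowest reply %.3fs"
              % (server, len(results) - len(answered), len(results), slowest))
```

Calling compare() with the addresses of the two slaves and a list of 
external names (for example compare(["192.0.2.1", "192.0.2.2"], 
["example.com"]), where both addresses are placeholders) prints one 
summary line per server, which makes it easy to see whether only the 
upgraded server ever exceeds the window.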

I've sniffed the wire at both the workstation and the server end, and the 
delay in the response appears to originate at the server itself (ruling 
out the network and the workstation). I'd like to see exactly what is 
happening on the server to the queries that are failing. What is the right 
debug level for Bind to catch this information? Are there any other 
observations that might explain the different behaviour?

-Mike

P.S. The original slave (9.2.4) is a Sun Ultra10 running Solaris 8. The 
updated server (9.3.2) is a Sun V210 (dual processor) running Solaris 10, 
with Bind compiled for multiprocessor support. One other thing I've 
noticed is that the default timeout for the nslookup included with Solaris 
is much higher than two seconds. Hmmm.