BIND 9.1.3 stops resolving but is still running.

Nate Campi nate at wired.com
Wed Sep 5 18:24:41 UTC 2001


On Wed, Sep 05, 2001 at 10:22:39AM -0700, Cade Robinson wrote:
> 
> I am running BIND 9.1.3 (saw this in 9.1.2 as well) on Solaris 8.
> I can start and run BIND for a random amount of time from what I can
> see before this happens.
> What I am seeing is that the server will resolve everything (internal
> and external) just fine, but at some point it stops resolving
> anything.
> The named process is still running but I can't HUP or kill named.
> I have to kill -9 the process.
> 
> Has anyone had this issue?
> Should I go back to BIND 8?
> Is there anything I can do to fix it?

I have a similar issue on Solaris 8 with BIND 9.1.3, on both SPARC and
x86. I compiled with and without thread support on both architectures,
with the same results.

What happens to me is that I'm running an app that resolves IPs for our
reporting team, as fast as BIND can handle it. BIND 8 handles up to
around 800 queries per second, but BIND 9 handles at most half of that,
and it stops resolving for 45 seconds to a minute at a time. This
happens every two or three minutes. Running BIND 9 at debug level 3
shows it answering a lot of queries, then it stops answering the client
resolver while doing a bunch of "createfetch" operations. In my case
BIND 9 never stops responding to kill or HUP signals, but it pauses
answering new queries.
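
For anyone who wants to reproduce the measurement, here is a minimal
sketch of a test client of the kind described above. It is not the app
we actually run. It assumes the lookups are reverse (PTR) lookups made
through the system resolver, it issues them one at a time (so it will
not reach the query rates quoted above), and the function names and the
5-second stall threshold are hypothetical choices for illustration.

```python
import socket
import sys
import time


def reverse_lookup(ip):
    """Return the PTR name for ip, or None if resolution fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return None


def find_stalls(answer_times, threshold=5.0):
    """Return (start, gap) for each gap between answers above threshold seconds."""
    stalls = []
    for prev, cur in zip(answer_times, answer_times[1:]):
        gap = cur - prev
        if gap > threshold:
            stalls.append((prev, gap))
    return stalls


def queries_per_second(answer_times):
    """Average completion rate between the first and last recorded answer."""
    if len(answer_times) < 2:
        return 0.0
    elapsed = answer_times[-1] - answer_times[0]
    return (len(answer_times) - 1) / elapsed if elapsed > 0 else 0.0


def run(ips, lookup=reverse_lookup, clock=time.monotonic):
    """Look up each IP in turn and record when each lookup finished."""
    times = []
    for ip in ips:
        lookup(ip)  # failures count as finished lookups too
        times.append(clock())
    return times


if __name__ == "__main__":
    # Reads one IP address per line from standard input.
    ips = [line.strip() for line in sys.stdin if line.strip()]
    times = run(ips)
    print("average rate: %.1f lookups/sec" % queries_per_second(times))
    for start, gap in find_stalls(times):
        print("stall of %.1f sec starting at t=%.1f" % (gap, start - times[0]))
```

Feeding it a file of IPs on standard input prints the average rate and
any pauses longer than the threshold, which can then be lined up against
the timestamps in the named debug output.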

It is as if BIND 9 has trouble with unresponsive name servers and does
not answer additional queries while it tries to resolve the current
IPs. This is of course a wild guess; I need to collect more debugging
output and analyze it thoroughly.

I also wonder if part of the poor performance in BIND 9 (when it is
actually answering queries) is due to the three tries against remote
nameservers to see if they handle EDNS0. Can this be turned off?

Is BIND 9 even up to the task of replacing BIND 8 on heavily loaded
boxes? From my tests so far, it cannot replace our current public DNS
servers, or even caching servers for internal use, due to poor
performance.
-- 
Nate Campi, UNIX Ops WiReD SF, Terra Lycos DNS, (415) 276-8678  


More information about the bind-users mailing list