Named dying

Adrian Daminato adrian at tucows.com
Fri Jun 18 14:46:09 UTC 1999


I apologize now for the lengthy email :)

I am maintaining a name server that hosts almost 12000 domains.  Just
recently, named started exhibiting curious behaviour.  At random, for no
apparent reason, named completely stops responding.  If I have a trace on,
nothing is logged, and no errors or messages are reported to syslog.  The daemon
stays in this state until it is restarted or reloaded (which takes close to 30
minutes under our current architecture).  I have found that sending a USR1
signal followed by a USR2 signal (ndc trace, ndc notrace) makes named start
responding again.
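A minimal Python sketch of that signal-based workaround follows, assuming named records its process ID in a pid file. The path DEFAULT_PID_FILE (/var/run/named.pid) and the helper names read_pid and kick_named are hypothetical; the real path depends on the pid-file option in named.conf. In BIND 8, SIGUSR1 turns debugging on (each further SIGUSR1 raises the level) and SIGUSR2 turns debugging off completely, so the pair amounts to a brief trace on/off cycle.

```python
import os
import signal
from typing import Callable

# Hypothetical path: the real one comes from the pid-file option in
# named.conf (or the compiled-in default for your BIND build).
DEFAULT_PID_FILE = "/var/run/named.pid"


def read_pid(path: str = DEFAULT_PID_FILE) -> int:
    """Return the process ID recorded in named's pid file."""
    with open(path) as f:
        return int(f.read().split()[0])


def kick_named(pid: int,
               send: Callable[[int, int], None] = os.kill) -> None:
    """Send SIGUSR1 then SIGUSR2 to named.

    In BIND 8, SIGUSR1 turns debugging on (raising the level by one)
    and SIGUSR2 turns debugging off completely.
    """
    send(pid, signal.SIGUSR1)
    send(pid, signal.SIGUSR2)
```

Run it with enough privilege to signal named, e.g. kick_named(read_pid()). The send parameter exists only so the signal sequence can be checked without a live server.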

I have tried running the trace at various levels to see if anything out of the
ordinary shows up in the logs when the daemon stops responding, but I haven't
been able to find any pattern.

Until yesterday I was running BIND 8.1.2 on Linux 2.2.4.  I then upgraded to
BIND 8.2.1-T4B, to see whether a significant change had been made between 8.1
and 8.2.  The problem still occurs.

Sometimes the daemon runs without problems for several hours; sometimes it
needs constant attention for several hours.  In the interim, I have written a
script that calls ndc trace and then ndc notrace every few minutes, so as to
minimize disruption.
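Such an interim script could look roughly like the Python sketch below. It assumes ndc is on the PATH and reads "every few minutes" as a hypothetical five-minute interval (interval_s=300); the names run_cmd, toggle_trace and watchdog are illustrative, not part of BIND. Both commands are always issued, so a failed trace call does not skip the notrace call.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence

# A runner takes an argv list and returns the command's exit status.
Runner = Callable[[Sequence[str]], int]


def run_cmd(argv: Sequence[str]) -> int:
    """Run a command (e.g. ndc) and return its exit status."""
    return subprocess.run(list(argv)).returncode


def toggle_trace(run: Runner = run_cmd) -> bool:
    """Issue `ndc trace` then `ndc notrace`; True only if both exit 0."""
    # Both commands always run: a failed `trace` does not skip `notrace`.
    ok_on = run(["ndc", "trace"]) == 0
    ok_off = run(["ndc", "notrace"]) == 0
    return ok_on and ok_off


def watchdog(interval_s: float = 300.0,  # hypothetical "few minutes"
             iterations: Optional[int] = None,
             run: Runner = run_cmd,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Toggle tracing every interval_s seconds.

    Loops forever when iterations is None; otherwise stops after that
    many rounds and returns how many rounds had a failing command.
    """
    failures = 0
    rounds = 0
    while iterations is None or rounds < iterations:
        if not toggle_trace(run):
            failures += 1
        rounds += 1
        sleep(interval_s)
    return failures
```

Calling watchdog() with no arguments loops forever; passing iterations, run and sleep lets the loop be exercised without a live named.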

So, two questions:
1) Has anyone seen this before, and if so, how do I fix it?
2) Does turning the trace on and off frequently cause any problems?  I haven't
been able to find any caveats, but I don't want to do any harm in order to save
the server :)

Any help would be more than appreciated.  Thanks

 --
Adrian Daminato 
Tucows International Corp.


