Recursion ceases for 5-10 minutes at random intervals throughout the day

JINMEI Tatuya / 神明達哉 Jinmei_Tatuya at isc.org
Fri Feb 15 04:00:39 UTC 2008


At Wed, 13 Feb 2008 17:32:41 -0500,
Bill Springall <springall at fuse.net> wrote:

>      Each server handles anywhere between 500-1500 qps throughout the
> day, under normal load.  Problem occurs at all loads.
>      I've tried port, "monitoring", tcpdumping the traffic, and sifting 
> through the requests and nothing seems out of the ordinary.   Numerous 
> tweaks of the OS have not helped (state table within limits and then 
> disabled, firewall deactivated/activated, eth stats good).  When the 
> problem happens I can get onto the machine and it is OK (network
> upstream good, routing table hasn't inherited anything new, server calm).
> When I turn logging up to a level that can help, named can't keep up.
>      We now have a troubleshooting process in the works that
> involves different hardware, 9.4.2, environment re-architecture, and,
> <shiver>, other caching DNS software.
>      Is there a known problem, that I haven't been able to find, that 
> could be causing this?   As I understand it, the "Server Failure" message
> is a generic one, so could someone point me to the next thing
> to try?   Any help would be appreciated!

I cannot think of a reason offhand, but please let me ask a few things first.

- according to your description, the queries were not dropped, but
  were simply answered with a server failure (SERVFAIL), right?
- how much memory does named use when this occurs?
- how busy (in terms of CPU utilization) is named when this occurs?
- does this change if you disable threads?
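For the first question, one way to tell dropped queries apart from SERVFAIL answers is to probe the server periodically during an episode and record what comes back. Below is a minimal sketch, assuming plain UDP on port 53 and an A/IN query with recursion desired; the function names (build_query, rcode_of, probe) and the server address in the usage line are hypothetical, not part of BIND or any existing tool. It reads the RCODE from the low 4 bits of the fourth header byte, as defined in RFC 1035, and reports a timeout as "DROPPED".

```python
import socket
import struct

# RCODE values from the DNS header (RFC 1035)
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
               3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}

def build_query(qname, qid=0x1234, qtype=1):
    """Build a minimal DNS query: one question, RD flag set, class IN."""
    # Header fields: ID, flags (0x0100 = RD), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # Encode the name as length-prefixed labels ending in a zero byte
    labels = b"".join(bytes([len(p)]) + p.encode("ascii")
                      for p in qname.rstrip(".").split("."))
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question

def rcode_of(response):
    """Return the RCODE: low 4 bits of header byte 3."""
    if len(response) < 12:
        raise ValueError("truncated DNS header")
    return response[3] & 0x0F

def probe(server, qname, timeout=2.0):
    """Send one query; return the RCODE name, or 'DROPPED' on timeout.

    Sketch only: it does not verify that the response ID matches.
    """
    query = build_query(qname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(query, (server, 53))
        try:
            data, _ = s.recvfrom(4096)
        except socket.timeout:
            return "DROPPED"
    code = rcode_of(data)
    return RCODE_NAMES.get(code, "RCODE%d" % code)
```

For example, running `print(probe("192.0.2.53", "www.example.com"))` once a second from cron or a loop (substituting the real resolver address) would show whether the failing windows produce SERVFAIL answers or no answers at all.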

Thanks,

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.

