bind-9.3.2 / CPU issue.

Pawel Rogocz pawel at rogocz.com
Sat Aug 19 23:48:18 UTC 2006


This issue has been troubling us for almost two years now, since we
deployed BIND9.

We have a bunch of named instances running behind a load balancer, each
getting on average 1k DNS queries per second.

We currently run with watchdogs which kill named if it starts using 100%
CPU.
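
A minimal sketch of the kind of check such a watchdog could make, assuming
a Linux /proc filesystem. The function names, the 5 second sampling
interval and the 95% threshold are illustrative assumptions, not the
watchdog we actually run:

```python
import os
import time

def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before counting fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])

def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """CPU usage over the interval, as a percentage of one CPU."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)

def is_spinning(pid, interval_s=5.0, threshold=95.0):
    """Sample a process twice and report whether it exceeds the threshold."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    with open(f"/proc/{pid}/stat") as f:
        before = cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(f"/proc/{pid}/stat") as f:
        after = cpu_ticks(f.read())
    return cpu_percent(before, after, interval_s, ticks_per_s) >= threshold
```

What to do when is_spinning() returns True (kill, restart, alert) is left
out on purpose; that part depends on the local setup.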

Just recently I noticed that when named enters this state, it starts
replying with erroneous data.

For example, cached data never gets its TTL decreased:
www.sun.com always has a TTL of 900. Also, queries of type ANY
against authoritative data intermittently fail with a SERVFAIL error.
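
The stuck TTL can be checked for mechanically. The following is a rough
sketch, assuming the dig utility is installed and the server under test is
queried twice a few seconds apart. The helper names and the 2 second
minimum wait are assumptions made for illustration. An unchanged TTL is
only suspicious for cached data, since authoritative answers always carry
the same TTL:

```python
import subprocess

def parse_ttls(dig_answer):
    """Extract TTLs from `dig +noall +answer` output lines."""
    ttls = []
    for line in dig_answer.splitlines():
        fields = line.split()
        # Answer lines look like: name TTL class type rdata...
        if len(fields) >= 5 and not line.startswith(";") and fields[1].isdigit():
            ttls.append(int(fields[1]))
    return ttls

def ttl_is_stuck(ttl_first, ttl_second, elapsed_s):
    """True if a cached TTL failed to count down between two samples."""
    # A healthy cache decrements the TTL. After a refetch it may jump back
    # up, so only an exactly unchanged value after a real wait is flagged.
    return elapsed_s >= 2 and ttl_first == ttl_second

def query_ttls(name, server):
    """Ask `server` for `name` with dig and return the answer TTLs."""
    out = subprocess.run(
        ["dig", "+noall", "+answer", f"@{server}", name],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ttls(out)
```

Usage would be along the lines of calling query_ttls("www.sun.com",
"127.0.0.1") twice with a sleep in between and passing the first TTL of
each result to ttl_is_stuck().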

We also see an increased number of Udp InErrors in /proc/net/snmp when
named enters this state.
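
For reference, the counter can be read out of /proc/net/snmp as sketched
below. This assumes the usual layout of that file, where each protocol has
a header line of field names followed by a line of values. The function
names are made up for this example:

```python
def udp_counters(snmp_text):
    """Parse the Udp header/value line pair from /proc/net/snmp text."""
    udp_lines = [l.split() for l in snmp_text.splitlines()
                 if l.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp section found")
    header, values = udp_lines[0], udp_lines[1]
    # Drop the leading "Udp:" label and pair each name with its value.
    return dict(zip(header[1:], (int(v) for v in values[1:])))

def in_errors_delta(before_text, after_text):
    """Growth of the Udp InErrors counter between two snapshots."""
    before = udp_counters(before_text)["InErrors"]
    after = udp_counters(after_text)["InErrors"]
    return after - before
```

Taking one snapshot of the file before and one after named starts spinning
and passing both to in_errors_delta() gives the number of datagrams the
kernel counted as receive errors in between.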

We have run with all sorts of Linux 2.2/2.4 kernels and the problem was
always there. We currently run 9.3.2 with internal malloc enabled.


Pawel


On Tue, Aug 15, 2006 at 02:53:43PM -0700, Kelsey Cummings wrote:
> FWIW, I've seen similar behavior on some of our recursive servers in
> specific roles.  The only thing that might be unusual about our config is
> that a very high portion of the requests are going to forwarded zones.
> 
> It's been a consistent problem for us through all versions of bind 9 - we've
> had to use bind 8 to keep them stable.  We suspected it could be a problem
> with our compiler/libraries but the problem consistently occurs regardless
> of what distribution or version we try to run.  (All linux.)
> 
> It seems to be load related - only affects two of our internal recursors
> that do ~1k reqs/sec whereas our other more lightly loaded servers don't
> exhibit the same exact symptoms (although they also have been known to spin
> on the CPU.)
> 
> -- 
> Kelsey Cummings - kgc at corp.sonic.net      sonic.net, inc.
> System Architect                          2260 Apollo Way
> 707.522.1000                              Santa Rosa, CA 95407
> 

