recursive queries fail with high load?

Chris Michels Chris.Michels at NAU.EDU
Mon Feb 26 15:23:26 UTC 2007


Sotiris Tsimbonis wrote:
> Unfortunately, we seem to face the same problem with bind 9.3.3. After 
> 2-3 days of uptime, for no apparent reason, all answers take too long 
> and usually timeout.
>   
You say all answers take too long and usually time out.  Does that
include queries for which you are authoritative?  For us, only recursive
queries are taking a long time.
> When this happens, we notice a drop in successful queries in 
> named.stats, machine load jumps to >1 (normally around 0.50), named 
> process starts consuming 100% of cpu (normally it's under 30%) and 
> memory usage stays the same.
>   
I am not seeing high CPU usage, which makes me think we are not seeing
the same problem.
> ...
>
> # rndc status
> number of zones: 20
> debug level: 99
> xfers running: 0
> xfers deferred: 0
> soa queries in progress: 0
> query logging is ON
> recursive clients: 25/10000
> tcp clients: 0/100
> server is up and running
>   
Our recursive clients count is much higher, which also makes me think
this may not be the same problem.
> ...
>
> The only solution so far is to restart bind..
> Thoughts/suggestions of how to debug this further are more than welcome.
>   
Restarting bind solves the problem only temporarily for us.  It can
recur in as little as a couple of minutes if the load is high.
> Sotiris.
>   
Note that when it is really bad, increasing the query timeout doesn't
help and dig gets a SERVFAIL response.  Sometimes we get SERVFAIL even
with a short timeout, like this:

[root@ruby ~]# dig www.websudoku.com @ns2.nau.edu
;; Warning: ID mismatch: expected ID 58978, got 17738
;; Warning: ID mismatch: expected ID 58978, got 17738

; <<>> DiG 9.2.4 <<>> www.websudoku.com @ns2.nau.edu
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 58978
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.websudoku.com.             IN      A

;; Query time: 2840 msec
;; SERVER: 134.114.138.3#53(134.114.138.3)
;; WHEN: Mon Feb 26 08:16:08 2007
;; MSG SIZE  rcvd: 35
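One way to check whether a longer timeout changes the outcome is to repeat the same query with increasing values of dig's `+time` option and record only the status and query time. The sketch below assumes the dig output format shown above; `dig_summary` and `probe_timeouts` are hypothetical helper names, and the particular `+time` values are arbitrary choices.

```shell
# Hypothetical helper: read dig output on stdin and print "STATUS MSEC",
# e.g. "SERVFAIL 2840". Prints "NO-ANSWER" when dig got no reply at all
# (for example ";; connection timed out; no servers could be reached").
dig_summary() {
  awk '/->>HEADER<<-/ {
         for (i = 1; i <= NF; i++)
           if ($i == "status:") { s = $(i + 1); sub(/,$/, "", s) }
       }
       /^;; Query time:/ { t = $4 }
       END { if (s == "") print "NO-ANSWER"; else print s, t }'
}

# Hypothetical probe: query NAME at SERVER once per timeout value
# (+tries=1 so dig does not retry) and summarize each answer.
probe_timeouts() {
  local name=$1 server=$2 t
  for t in 2 5 10 30; do
    printf '+time=%s: ' "$t"
    dig +time="$t" +tries=1 "$name" "@$server" | dig_summary
  done
}
```

For example, `probe_timeouts www.websudoku.com ns2.nau.edu`. If every line comes back as SERVFAIL regardless of `+time`, that matches the behaviour described above, where a longer client timeout doesn't help.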


-- 
Chris Michels -- Systems Programmer/Team Lead -- +1 928 523-6495
Northern Arizona University -- Flagstaff, AZ
PGP key: http://jan.ucc.nau.edu/~cvm
Team Info: http://www4.nau.edu/its/sia

"The first chore in managing change is the toughest: Self-management. Handle
that right and you're halfway home." -- Price Pritchett



More information about the bind-users mailing list