recursive queries fail with high load?
Chris Michels
Chris.Michels at NAU.EDU
Mon Feb 26 15:23:26 UTC 2007
Sotiris Tsimbonis wrote:
> Unfortunately, we seem to face the same problem with bind 9.3.3. After
> 2-3 days of uptime, for no apparent reason, all answers take too long
> and usually timeout.
>
You say all answers take too long and usually time out. Does that
include queries for which you are authoritative? For us, only recursive
queries are taking a long time.
> When this happens, we notice a drop in successful queries in
> named.stats, machine load jumps to >1 (normally around 0.50), named
> process starts consuming 100% of cpu (normally it's under 30%) and
> memory usage stays the same.
>
I am not seeing high CPU usage, which makes me think we are not seeing
the same problem.
> ...
>
> # rndc status
> number of zones: 20
> debug level: 99
> xfers running: 0
> xfers deferred: 0
> soa queries in progress: 0
> query logging is ON
> recursive clients: 25/10000
> tcp clients: 0/100
> server is up and running
>
Our recursive client count is much higher, which also makes me think this
may not be the same problem.
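For anyone wanting to compare the recursive client figure between servers, here is a rough Python sketch that pulls it out of saved `rndc status` output. The function name and the sample text are illustrative assumptions, not part of either site's tooling. It reads the two-field "current/limit" form quoted above and, as a hedge, also tolerates a three-field form in which the last number is taken as the hard limit.

```python
import re

def recursive_client_usage(status_text):
    """Return (current, limit) from the 'recursive clients' line, or None.

    Assumes the 'current/limit' layout quoted above. An optional middle
    field is skipped so that a 'current/soft/hard' layout also parses,
    with the last number treated as the limit.
    """
    match = re.search(
        r"^\s*recursive clients:\s*(\d+)/(?:\d+/)?(\d+)",
        status_text,
        re.MULTILINE,
    )
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# Sample text copied from the quoted rndc status output.
status_sample = """number of zones: 20
debug level: 99
recursive clients: 25/10000
tcp clients: 0/100
server is up and running
"""

usage = recursive_client_usage(status_sample)
if usage is not None:
    current, limit = usage
    print(f"{current} of {limit} recursive client slots in use")
```

Logging this value at intervals would show whether the count climbs toward the limit before the slowdown starts.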
> ...
>
> The only solution so far is to restart bind..
> Thoughts/suggestions of how to debug this further are more than welcome.
>
Restarting bind solves the problem only temporarily for us. It can
recur in as little as a couple of minutes if the load is high.
> Sotiris.
>
Note that when it is really bad, increasing the query timeout doesn't
help and dig gets a SERVFAIL response. Sometimes we get SERVFAIL even
with a short timeout, like this:
[root@ruby ~]# dig www.websudoku.com @ns2.nau.edu
;; Warning: ID mismatch: expected ID 58978, got 17738
;; Warning: ID mismatch: expected ID 58978, got 17738
; <<>> DiG 9.2.4 <<>> www.websudoku.com @ns2.nau.edu
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 58978
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;www.websudoku.com. IN A
;; Query time: 2840 msec
;; SERVER: 134.114.138.3#53(134.114.138.3)
;; WHEN: Mon Feb 26 08:16:08 2007
;; MSG SIZE rcvd: 35
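To track how often this happens, the fields of interest in the dig output above could be extracted with something like the following Python sketch. The function name and the returned dictionary keys are assumptions made for illustration; the sample is an abridged copy of the output above.

```python
import re

def summarize_dig(output):
    """Extract response status, query time, and ID-mismatch count from dig text."""
    status = re.search(r"status:\s*([A-Z]+)", output)
    query_time = re.search(r"Query time:\s*(\d+)\s*msec", output)
    return {
        "status": status.group(1) if status else None,
        "query_ms": int(query_time.group(1)) if query_time else None,
        # Each warning line indicates a reply whose ID did not match the query.
        "id_mismatches": len(re.findall(r"Warning: ID mismatch", output)),
    }

# Abridged copy of the dig output shown above.
dig_sample = """;; Warning: ID mismatch: expected ID 58978, got 17738
;; Warning: ID mismatch: expected ID 58978, got 17738
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 58978
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; Query time: 2840 msec
"""

print(summarize_dig(dig_sample))
```

Running this over periodic dig captures would give a simple record of when SERVFAIL responses and ID mismatches start to appear.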
--
Chris Michels -- Systems Programmer/Team Lead -- +1 928 523-6495
Northern Arizona University -- Flagstaff, AZ
PGP key: http://jan.ucc.nau.edu/~cvm
Team Info: http://www4.nau.edu/its/sia
"The first chore in managing change is the toughest: Self-management. Handle
that right and you're halfway home." -- Price Pritchett