named-xfer causes 100% CPU Utilization
mayer at gis.net
Tue Oct 2 01:12:03 UTC 2001
At 05:39 PM 10/1/01, Kevin Vaughn wrote:
>I found another symptom of my problem and I was wondering if you could point
>me in the right direction. I turned debugging on for my slave server and I
>noticed something peculiar in the named.run log. Below is some of the
>01-Oct-2001 12:40:15.000 default: warning: zone transfer timeout for
>"pccatest.com"; pid 132 kill failed Errcode: 10035: Errcode: 10035: Errco
>01-Oct-2001 12:40:45.000 default: warning: zone transfer timeout for
>"pccatest.com"; second kill pid 132 - forgetting, processes may accumulate
>01-Oct-2001 14:05:03.000 default: notice: named-xfer for "pccatest.com"
>This is why the processes are piling up and not dying on their own; BIND
>can't kill them. Why are they failing (timing out) in the first place? The
>initial zone transfer works when the service is initially started. I have
>turned off NOTIFY to see if the problems go away. If I manually run
>/dns/bin/named-xfer -z pccatest.com -f /temp/testzone.db -d 3 -s 0
>pcnwpnstst, I still have a problem with named-xfer hanging and then
>multiplying itself over and over again. The last line above says it exited
>with 1. Doesn't an exit code of 1 mean that the transfer completed
>successfully? I have looked at the log for named-xfer, but all of the
>transfers complete with no errors. I have searched every log I can think of
>... I don't know what to do.
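On the exit-code question: as far as I recall, BIND 8's source defines named-xfer's exit statuses as XFER_* constants in ns_defs.h, with 0 meaning the zone was already up to date and 1 meaning a successful transfer. Treat that table as an assumption to check against your own BIND source, since later 8.x releases may define more values. The sketch below decodes those statuses and runs a transfer command under a hard timeout, so a hung named-xfer is killed rather than left behind. The helper names describe_xfer_status and run_xfer are hypothetical, not part of BIND.

```python
import subprocess

# named-xfer exit statuses, ASSUMED from the XFER_* constants in BIND 8's
# ns_defs.h (values 0-3 only; verify against your BIND source tree).
XFER_STATUS = {
    0: "XFER_UPTODATE: zone already up to date, nothing transferred",
    1: "XFER_SUCCESS: zone transferred successfully",
    2: "XFER_TIMEOUT: no server reachable or transfer timed out",
    3: "XFER_FAIL: other failure (details are logged)",
}


def describe_xfer_status(code: int) -> str:
    """Hypothetical helper: map a named-xfer exit status to its meaning."""
    return XFER_STATUS.get(code, f"unknown named-xfer exit status {code}")


def run_xfer(cmd: list[str], timeout_s: float) -> str:
    """Hypothetical helper: run a transfer command with a hard timeout."""
    try:
        # On timeout, subprocess.run kills the child and waits for it
        # before raising TimeoutExpired, so no stray process is left.
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hung: killed after timeout"
    return describe_xfer_status(result.returncode)
```

For the manual test above, the command list would be ["/dns/bin/named-xfer", "-z", "pccatest.com", "-f", "/temp/testzone.db", "-d", "3", "-s", "0", "pcnwpnstst"], with whatever timeout suits your link. Note that a clean exit status only shows that named-xfer itself finished; it says nothing about why the master stalled the earlier transfers.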
The key to the problem is on the master, not the slave; the above is just a
symptom of it. Upgrade to BIND 8.2.5-REL when it is announced. It won't
totally solve the problem, but you won't have named-xfer processes chewing
up CPU. If you want to avoid the problem altogether, install BIND 9.2.0rc5
as soon as it's announced.
>I have taken the advice of Danny Mayer and bumped my virtual memory on both
>servers to 500MB. I also replaced allow-update with allow-transfer, which
>was causing an error to be generated. While both of these suggestions
>helped, the original problem still persists.
The real problem is still on the master.