bind-8.2.2-P5 hangs on defunct named-xfer
Jan-Erik Eriksson
jee at alcom.aland.fi
Fri Dec 3 12:54:28 UTC 1999
On Fri, 3 Dec 1999, Jan-Erik Eriksson wrote:
>On Thu, 2 Dec 1999, Jan-Erik Eriksson wrote:
>
>>On Thu, 2 Dec 1999 Mark_Andrews at iengines.com wrote:
>>
>>>> We are running bind-8.2.2-P5 with the nc_ctl.c patch on a redhat 6.1 i386
>>>> box. About once a day I get a zombie named-xfer, marked as <defunct> in
>>>> the ps listing.
>>>>
>>>> When this happens named stops answering requests. It remains bound to its
>>>> listen port, so that clients believe it is working ok. This means that the
>>>> fallback to the next nameserver, stated in resolv.conf, never happens.
>>>
>>> We have had no reports of named not reaping its children with
>>> the current release. Failing to reap a child should not cause
>>> the problems you are describing. Also named not answering will
>>> not prevent the resolver from falling over to the next nameserver.
>>>
>>> Firstly please confirm that you are running BIND 8.2.2-P5, use
>>> ndc status.
>>
>>named 8.2.2-P5 tis nov 30 14:02:00 EET 1999
>>
>>> Second can you please get a system call trace when named gets into
>>> this state.
>>
>>Well, I am not sure exactly under what conditions this happens, which
>>means that it is a bit hard to reproduce. I'll see what I can do.
>
>I attached strace to named when I noticed the "hang", then sent named a
>SIGUSR1:
>
>13:53:10.510316 send(3, "<30>Dec 3 13:52:03 named[23980]"..., 85, 0) = ? ERESTARTSYS (To be restarted)
>13:53:34.953512 --- SIGUSR1 (User defined signal 1) ---
>13:53:34.973276 sigreturn() = ? (mask now [])
>13:53:34.973501 close(3) = 0
>13:53:34.974046 open("/dev/console", O_WRONLY|O_NOCTTY) = 3
>
>[ lots of stuff, and then at the next connect to /dev/log: ]
>
>13:53:35.014566 socket(PF_UNIX, SOCK_DGRAM, 0) = 8
>13:53:35.014789 fcntl(8, F_SETFD, FD_CLOEXEC) = 0
>13:53:35.014998 connect(8, {sun_family=AF_UNIX, sun_path="/dev/log"}, 16) = 0
>13:53:35.015309 send(8, "<30>Dec 3 13:53:35 named[23980]"..., 84, 0) = -1 ECONNRESET (Connection reset by peer)
>13:54:39.427052 close(8) = 0
>13:54:39.427381 open("/dev/console", O_WRONLY|O_NOCTTY) = 8
>
>It hangs on the send call to /dev/log. I don't know if this is a bind,
>kernel (UNIX domain socket), or syslog problem, but other logging to
>syslogd seems to work fine.
>Restarting sysklogd won't help. Only sending SIGKILL to named and
>restarting it will fix the situation.
Correction! Killing syslogd releases the "hang". This means this is most
probably not a bind problem. Thanks for your help.
>Any hints are appreciated. We are running standard RedHat 6.1 2.2.12-20
>SMP kernel with the sysklogd-1.3.31-14 package.
-- Janne
------------- ÅLCOM ------------- Network Operations Center ---------
Jan-Erik Eriksson mailto: jee at alcom.aland.fi
ÅLCOM phone: +358 18 23500
PB 233, Torggatan 10 fax: +358 18 14643
FIN-22100 Mariehamn URL: http://www.alcom.aland.fi