bind-8.2.2-P5 hangs on defunct named-xfer

Jan-Erik Eriksson jee at alcom.aland.fi
Fri Dec 3 12:54:28 UTC 1999


On Fri, 3 Dec 1999, Jan-Erik Eriksson wrote:

>On Thu, 2 Dec 1999, Jan-Erik Eriksson wrote:
>
>>On Thu, 2 Dec 1999 Mark_Andrews at iengines.com wrote:
>>
>>>> We are running bind-8.2.2-P5 with the nc_ctl.c patch on a redhat 6.1 i386
>>>> box. About once a day I get a zombie named-xfer, marked as <defunct> in
>>>> the ps listing. 
>>>> 
>>>> When this happens named stops answering requests. It remains bound to its
>>>> listen port, so that clients believe it is working ok. This means that the
>>>> fallback to the next nameserver, stated in resolv.conf, never happens.
>>>
>>>	We have had no reports of named not reaping its children with
>>>	the current release.  Failing to reap a child should not cause
>>>	the problems you are describing.  Also named not answering will
>>>	not prevent the resolver from falling over to the next nameserver.
>>>
>>>	Firstly please confirm that you are running BIND 8.2.2-P5, use
>>>	ndc status.
>>
>>named 8.2.2-P5 tis nov 30 14:02:00 EET 1999
>>
>>>	Second can you please get a system call trace when named gets into
>>>	this state.
>>
>>Well, I am not sure exactly under what conditions this happens, which
>>means that it is a bit hard to reproduce. I'll see what I can do.
>
>I attached an strace when I noticed the "hang". Then I sent a SIGUSR1 to
>named:
>
>13:53:10.510316 send(3, "<30>Dec  3 13:52:03 named[23980]"..., 85, 0) = ? ERESTARTSYS (To be restarted)
>13:53:34.953512 --- SIGUSR1 (User defined signal 1) ---
>13:53:34.973276 sigreturn()             = ? (mask now [])
>13:53:34.973501 close(3)                = 0
>13:53:34.974046 open("/dev/console", O_WRONLY|O_NOCTTY) = 3
>
>[ lots of stuff, and then at the next connect to /dev/log: ]
>
>13:53:35.014566 socket(PF_UNIX, SOCK_DGRAM, 0) = 8
>13:53:35.014789 fcntl(8, F_SETFD, FD_CLOEXEC) = 0
>13:53:35.014998 connect(8, {sun_family=AF_UNIX, sun_path="/dev/log"}, 16) = 0
>13:53:35.015309 send(8, "<30>Dec  3 13:53:35 named[23980]"..., 84, 0) = -1 ECONNRESET (Connection reset by peer)
>13:54:39.427052 close(8)                = 0
>13:54:39.427381 open("/dev/console", O_WRONLY|O_NOCTTY) = 8
>
>It hangs on the send call to /dev/log. I don't know if this is a bind,
>kernel (UNIX domain socket) or syslog problem, but other logging to
>syslogd seems to work fine.
>Restarting sysklogd won't help. Only a SIGKILL and restart of named will
>fix the situation.

Correction! Killing the syslogd will make the "hang" release. This means
this is most probably not a bind problem. Thanks for your help.
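
For anyone wanting to reproduce the symptom in isolation: one plausible
reading of the trace (an assumption, not something confirmed in this thread)
is that syslogd stopped draining /dev/log, so the blocking send() in named
had nowhere to put the message. The sketch below is not BIND or sysklogd
code. It uses a socketpair() as a stand-in for the named and syslogd ends of
/dev/log, never reads from the receiving end, and switches the sender to
non-blocking mode so that the point where a blocking send() would hang shows
up as an error instead. The function name and the 84-byte payload (taken
from the send length in the trace) are illustrative only.

```python
import errno
import socket


def fill_until_would_block(payload_size=84, max_sends=1_000_000):
    """Queue datagrams to a peer that never reads.

    Returns (datagrams_queued, would_block). would_block is True when the
    kernel refused a further send, which is where a blocking send() hangs.
    """
    # Hypothetical stand-in for named's log socket and syslogd's /dev/log.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        # Non-blocking, so a full queue raises an error instead of hanging.
        sender.setblocking(False)
        payload = b"x" * payload_size
        sent = 0
        while sent < max_sends:
            try:
                sender.send(payload)
            except OSError as exc:
                # EAGAIN/EWOULDBLOCK on Linux; some systems report ENOBUFS.
                if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK, errno.ENOBUFS):
                    return sent, True
                raise
            sent += 1
        # Safety cap reached without the queue filling up.
        return sent, False
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    count, would_block = fill_until_would_block()
    print(f"queued {count} datagrams; would block: {would_block}")
```

How many datagrams fit before the queue fills depends on the kernel and its
socket buffer settings, so the count will differ between systems.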

>Any hints are appreciated. We are running the standard Red Hat 6.1
>2.2.12-20 SMP kernel with the sysklogd-1.3.31-14 package.

-- Janne
------------- ÅLCOM ------------- Network Operations Center ---------
Jan-Erik Eriksson		mailto: jee at alcom.aland.fi
ÅLCOM				phone: +358 18 23500
PB 233, Torggatan 10		fax: +358 18 14643
FIN-22100 Mariehamn		URL: http://www.alcom.aland.fi


