8.2.3 - maybe a problem

Robert Elz kre at munnari.OZ.AU
Tue Jul 4 05:51:08 UTC 2000


I'm running 8.2.3-t4b (yes, I know, not the latest, but I have lots
of changes in my sources, so don't upgrade frequently - if someone
tells me this problem is fixed in t5b I will happily do the work to
upgrade).

The problem I'm seeing is that occasionally (say every few days) my
named seems to decide to forget to clean up its children.  What's
more there appears to be a bug in the DUnix 3.2c (yes, truly ancient...)
that it is running on, which causes swap space for zombies to not be
released until after they have been reaped.   What a zombie is going
to do with large quantities of swap I haven't determined, but never
mind.
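
For reference, a daemon usually avoids accumulating zombies with a
non-blocking waitpid() loop, typically run after SIGCHLD arrives.  The
following is only a minimal sketch of that idea, not the code in named;
reap_children() is a hypothetical name used for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * reap_children() is a hypothetical helper, not part of BIND.
 * It collects every child that has already exited, without blocking,
 * and returns how many were reaped on this call.
 */
int
reap_children(void)
{
	int status;
	int reaped = 0;
	pid_t pid;

	for (;;) {
		pid = waitpid(-1, &status, WNOHANG);
		if (pid > 0) {
			reaped++;	/* one zombie collected, look for more */
			continue;
		}
		if (pid == -1 && errno == EINTR)
			continue;	/* interrupted by a signal, retry */
		/* 0: children exist but none has exited; -1: no children */
		break;
	}
	return (reaped);
}
```

Looping until waitpid() stops returning a pid matters because several
children can exit before a single SIGCHLD is delivered.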

The effect is that the system reaches a stage where its forks fail as
there's no VM left.   Of itself that should not be a huge problem, and
with earlier versions of bind (and perhaps less swap space configured)
it used to "just happen" when too much was happening on the system
(a few thousand sendmail processes can cause it).

However, since I installed 8.2.3-t4b something seems to have decided
to do the equivalent of a kill(-1, SIGTERM) (as root) - and since named
is often (aside from init) the only process that survives (sometimes
named goes away too), my guess is that perhaps named is the process
doing the kill...   So far this is purely a guess, I am about to install
a named that protects the two kill() calls in ns_maint.c with a check
to verify that the pid about to be used is > 0 (named never wants to
signal a process group I think).
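
A minimal sketch of the kind of guard described above, assuming a small
wrapper is placed in front of kill().  The name guarded_kill() is
hypothetical and this is not the actual change to ns_maint.c.  Note that
fork() returns -1 on failure, so a stored pid of -1 would be one way to
arrive at kill(-1, ...); whether that is what happens in named is
unverified.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * guarded_kill() is a hypothetical wrapper, not a function in BIND.
 *
 * kill() gives special meaning to non-positive pids:
 *   pid == 0   signals the caller's own process group
 *   pid == -1  signals every process the caller is allowed to signal
 *   pid <  -1  signals the process group whose id is -pid
 * This wrapper refuses all of those and only signals a single process.
 */
int
guarded_kill(pid_t pid, int sig)
{
	if (pid <= 0) {
		errno = EINVAL;		/* refuse group and broadcast targets */
		return (-1);
	}
	return (kill(pid, sig));
}
```

With such a wrapper, a call like guarded_kill(-1, SIGTERM) fails with
EINVAL instead of signalling every process on the system.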

I am still at quite an early stage of actually debugging this (generally
it is more important to get the system running again than worry much about
what happened ...) so unless someone has seen this before and knows it
has been fixed, I am not expecting any responses.  I will send more mail if
I ever discover the cause.

The point of this message is that I just read the mail I have had saved
from the bind-* lists for more than a year (I hadn't been near that mail
folder in that long), and see no mention of anything even remotely like this,
and also see that 8.2.3 was supposed to be shipped back near the end of
January, and seems to be perhaps "any day now" and is also supposed to be
the final bind 8.   So, I just thought that perhaps you (all) ought to
know that there might be a problem (perhaps bind + OS version together).

kre




