8.2.3 - maybe a problem

Mark.Andrews at nominum.com
Tue Jul 4 06:57:21 UTC 2000

	The following is awaiting inclusion in 8.2.3-REL / 8.2.3-TB6.


Index: src/bin/named/ns_maint.c
RCS file: /proj/cvs/isc/bind/src/bin/named/ns_maint.c,v
retrieving revision 8.103
diff -u -r8.103 ns_maint.c
--- ns_maint.c	2000/04/23 02:18:58	8.103
+++ ns_maint.c	2000/06/29 03:00:05
@@ -834,8 +834,10 @@
-	if ((pid = spawnxfer(argv, zp)) == -1)
+	if ((pid = spawnxfer(argv, zp)) == -1) {
+		return;
+	}
 	xferstatus[i].xfer_state = XFER_RUNNING;
 	xferstatus[i].xfer_pid = pid;  /* XXX - small race condition here if we
> I'm running 8.2.3-t4b (yes, I know, not the latest, but I have lots
> of changes in my sources, so don't upgrade frequently - if someone
> tells me this problem is fixed in t5b I will happily do the work to
> upgrade).
> The problem I'm seeing is that occasionally (say every few days) my
> named seems to decide to forget to clean up its children.  What's
> more there appears to be a bug in the DUnix 3.2c (yes, truly ancient...)
> that it is running on, which causes swap space for zombies to not be
> released until after they have been reaped.   What a zombie is going
> to do with large quantities of swap I haven't determined, but never
> mind.
> The effect is that the system reaches a stage where forks fail as
> there's no VM left.   Of itself that should not be a huge problem, and
> with earlier versions of bind (and perhaps less swap space configured)
> it used to "just happen" when too much was happening on the system
> (a few thousand sendmail processes can cause it).
> However, since I installed 8.2.3-t4b something seems to have decided
> to do the equivalent of a kill(-1, SIGTERM) (as root) - and since named
> is often (aside from init) the only process that survives (sometimes
> named goes away too), my guess is that perhaps named is the process
> doing the kill...   So far this is purely a guess, I am about to install
> a named that protects the two kill() calls in ns_maint.c with a check
> to verify that the pid about to be used is > 0 (named never wants to
> signal a process group I think).
> I am still at quite an early stage of actually debugging this (generally
> it is more important to get the system running again than worry much about
> what happened ...) so unless someone has seen this before and knows it
> has been fixed, I am not expecting any responses, I will send more mail if
> I ever discover the cause.
> The point of this message is that I just read the mail I have had saved
> from the bind-* lists for more than a year (I hadn't been near that mail
> folder in that long), and see no mention of anything even remotely like this,
> and also see that 8.2.3 was supposed to be shipped back near the end of
> January, and seems to be perhaps "any day now" and is also supposed to be
> the final bind 8.   So, I just thought that perhaps you (all) ought to
> know that there might be a problem (perhaps bind + OS version together).
> kre
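
For illustration only, the guard described above (checking that the pid
is > 0 before calling kill()) could look roughly like the sketch below.
The name safe_kill is hypothetical and is not taken from the BIND
sources. The sketch assumes the failure mode guessed at in the quoted
message: a pid of -1 left over from a failed fork reaches kill(), which
POSIX defines as "signal every process the caller may signal".

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of BIND: deliver sig to one specific
 * process only.  kill() gives special meaning to pid values <= 0
 * (0 = caller's process group, -1 = every process the caller may
 * signal, < -1 = process group -pid), so refuse all of them.
 */
static int
safe_kill(pid_t pid, int sig)
{
	if (pid <= 0) {
		errno = EINVAL;		/* never signal a group or "everyone" */
		return (-1);
	}
	return (kill(pid, sig));
}
```

With a wrapper like this, a stored pid of -1 produces an error return
instead of a SIGTERM to every process on the machine. It does not
address the separate question of why the children are not being reaped.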
Mark Andrews, Nominum Inc.
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: Mark.Andrews at nominum.com

