8.2.3 - maybe a problem

Thu Jul 13 09:17:01 UTC 2000

    Date:        Wed, 05 Jul 2000 15:19:55 +1000
    From:        Robert Elz <kre at munnari.oz.au>
    Message-ID:  <27221.962774395 at mundamutti.cs.mu.OZ.AU>

  | The next time I catch it in this state (assuming it happens again)
  | I will see if I can figure out what is happening - SIGCHLD getting
  | itself permanently blocked is certainly a possibility, though
  | actually discovering that has happened might not be real easy.

I think I know what the problem is now, and it isn't SIGCHLD being blocked.
Or more correctly, it isn't SIGCHLD being blocked by the OS anyway, it is
(effectively) being blocked by named.

Munnari is not (to say the least) overpowered for the job it does;
it has been known to have 30KB UDP queues (or more) on each of its
interfaces (for the DNS sockets - other than 127.0.0.1).

The way the main loop is structured (from 8.2.3t4b) is ...

        while (!main_needs_exit) {
                evEvent event;

                ns_debug(ns_log_default, 15, "main loop");
                if (needs != 0) {
                        /* Drain outstanding events; handlers ~block~. */
                        while (evGetNext(ev, &event, EV_POLL) != -1)
                                INSIST_ERR(evDispatch(ev, event) != -1);
                        INSIST_ERR(errno == EINTR || errno == EWOULDBLOCK);
                        handle_need();
                } else if (evGetNext(ev, &event, EV_WAIT) != -1) {
                        INSIST_ERR(evDispatch(ev, event) != -1);
                } else {
                        INSIST_ERR(errno == EINTR);
                }
        }

If there's an exited child waiting to be reaped, "needs" will be != 0.
The effect of this is that whenever that happens (or other stuff like
that), named goes into the:

                        while (evGetNext(ev, &event, EV_POLL) != -1)
                                INSIST_ERR(evDispatch(ev, event) != -1);

loop.  On munnari though, it is entirely possible for that loop to never
terminate: queries arrive as fast as munnari is able to process them (or
faster).  In that case, handle_need() is never actually called... (or not
until there is a lull in the packet processing, which can easily be
hours away).
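
As a rough illustration of why the drain loop can starve handle_need(),
here is a small model in C.  It is not the eventlib code: sim_poll() and
sim_dispatch() are made-up stand-ins for evGetNext(..., EV_POLL) and
evDispatch(), and the arrival pattern (one new packet for every packet
handled during the busy period) is an assumption chosen to resemble the
situation described above.

```c
#include <errno.h>

/* Hypothetical model state; none of these names come from BIND. */
struct sim {
    long queued;        /* packets currently waiting on the socket */
    long arrivals_left; /* packets still to arrive during the busy period */
    long dispatched;    /* packets handled so far */
};

/* Stand-in for evGetNext(..., EV_POLL): returns 0 if an event is ready,
 * otherwise -1 with errno set to EWOULDBLOCK. */
static int sim_poll(struct sim *s)
{
    if (s->queued == 0) {
        errno = EWOULDBLOCK;
        return -1;
    }
    return 0;
}

/* Stand-in for evDispatch(): handle one packet.  While the busy period
 * lasts, a new packet arrives for every packet handled (assumption). */
static void sim_dispatch(struct sim *s)
{
    s->queued--;
    s->dispatched++;
    if (s->arrivals_left > 0) {
        s->arrivals_left--;
        s->queued++;
    }
}

/* Runs the drain loop and returns how many packets were dispatched
 * before the needs handler would get its first chance to run. */
long packets_before_needs_handled(long initial_queue, long busy_arrivals)
{
    struct sim s;

    s.queued = initial_queue;
    s.arrivals_left = busy_arrivals;
    s.dispatched = 0;

    while (sim_poll(&s) != -1)
        sim_dispatch(&s);
    /* Only at this point would handle_need() be called. */
    return s.dispatched;
}
```

With 3 packets queued and a busy period of 1000 arrivals,
packets_before_needs_handled(3, 1000) returns 1003: the needs handler
waits for the whole busy period plus the backlog.  If the busy period
never ends, neither does the loop.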

During all of this, timers are going off (that's handled by evGetNext()),
new child processes are being created, they're exiting, and the process
table is getting forever fuller (and on munnari, available swap space is
continually decreasing).

I caught munnari in this state earlier today.  It had accumulated about 10
zombie children when I found it, and had a total of about 40KB of UDP data
queued on its interfaces (ie, the select() calls in evGetNext() would never
have failed to find something).   I killed the mailers on munnari (the main
source of local resolver calls), and installed router filters to block
incoming DNS traffic.  Munnari's UDP queues almost instantly dropped to 0,
and all the zombie processes were waited for and cleaned up.  (A bunch of
zone transfers were also started: in this state munnari was not able
to get UDP SOA replies from its primaries, and it is secondary for quite
a lot of domains, so named starts a named_xfer to check the status and
transfer - that's TCP and worked fine.)  Then I removed the filters,
allowing DNS traffic back in; munnari's UDP queues instantly jumped to
something around 70KB and zombies started accumulating again.

Since the kill(-1, SIGTERM) bug (the one that showed up when fork fails)
was fixed, the longer term effect of this, once swap is full, has been for
one of named's malloc() calls to eventually fail, which causes named to
exit, which causes everything to very quickly revert to normal, just
without a named running until a new one gets started.

I am about to try running named with the "drain" loop simply
deleted - I can't see anything it is actually necessary for, but then
again I also didn't look at what all the needs handlers do to make sure
there will be no adverse side effects.
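
To make the intended change concrete, here is a sketch of one pass through
the loop body with the drain loop removed.  It is a toy model, not a patch
against named: struct model, model_handle_needs() and model_get_event()
are invented for illustration, and whether the real needs handlers tolerate
running without a prior drain is exactly the part that has not been checked.

```c
/* Hypothetical model of the main loop state; not BIND's data structures. */
struct model {
    long pending_events;         /* events ready to be dispatched */
    int  needs;                  /* nonzero when deferred work is pending */
    long dispatched;             /* events dispatched so far */
    long dispatched_when_served; /* value of dispatched when the needs
                                  * were handled, or -1 if not yet */
};

/* Stand-in for handle_need(): clear the flag and record when it ran. */
static void model_handle_needs(struct model *m)
{
    m->needs = 0;
    m->dispatched_when_served = m->dispatched;
}

/* Stand-in for evGetNext(): 0 if an event was taken, -1 if none. */
static int model_get_event(struct model *m)
{
    if (m->pending_events == 0)
        return -1;
    m->pending_events--;
    return 0;
}

/* One pass of the loop body without the drain loop.
 * Returns 0 if some work was done, -1 if there was nothing to do. */
int loop_iteration_without_drain(struct model *m)
{
    if (m->needs != 0) {
        /* Needs are served right away, however long the queue is. */
        model_handle_needs(m);
        return 0;
    }
    if (model_get_event(m) != -1) {
        m->dispatched++;         /* stand-in for evDispatch() */
        return 0;
    }
    return -1;
}
```

In this model a pending need is served on the very next pass through the
loop, before any queued event is dispatched, so a full queue can no longer
postpone it.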

Longer term, a faster munnari would probably be a good idea, but that isn't
quite so easy to arrange....

For named a better (more substantial) fix would probably be to turn all of
the needs into events so everything could be processed using one mechanism
(perhaps bind9 has done that).

kre

ps: it looks to me as if, when CHECK_INSIST gets turned off, INSIST_ERR()
turns into a no-op, which would cause evDispatch() to never be called,
ever, which doesn't look like a sane result.   Rather than

#define INSIST(cond)            ((void) 0)
#define INSIST_ERR(cond)        ((void) 0)

they should probably be ...

#define INSIST(cond)            ((void) (cond))
#define INSIST_ERR(cond)        ((void) (cond))

The same might be true of ENSURE() and INVARIANT() (etc) though I didn't
check any of their uses (these are all from include/isc/assertions.h)
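
The difference between the two definitions can be seen in a small
standalone test.  The macro names INSIST_ERR_NOOP and INSIST_ERR_EVAL and
the dispatch_stub() function are made up for this sketch (they stand in
for the two INSIST_ERR variants and for a side-effecting call such as
evDispatch()); nothing here is taken from assertions.h itself.

```c
/* Stand-ins mirroring the two definitions discussed above. */
#define INSIST_ERR_NOOP(cond)   ((void) 0)       /* argument is discarded */
#define INSIST_ERR_EVAL(cond)   ((void) (cond))  /* argument is evaluated */

static int calls;   /* how many times the side-effecting call ran */

/* Hypothetical stand-in for a call with side effects. */
static int dispatch_stub(void)
{
    calls++;
    return 0;
}

/* Returns the number of times dispatch_stub() ran under the no-op macro. */
int calls_with_noop(void)
{
    calls = 0;
    INSIST_ERR_NOOP(dispatch_stub() != -1);
    return calls;
}

/* Returns the number of times dispatch_stub() ran under the
 * evaluating macro. */
int calls_with_eval(void)
{
    calls = 0;
    INSIST_ERR_EVAL(dispatch_stub() != -1);
    return calls;
}
```

calls_with_noop() returns 0 because the preprocessor throws the whole
argument away, call included; calls_with_eval() returns 1 because the
expression is still evaluated and only its result is discarded.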