8.2.3 - maybe a problem

Paul A Vixie vixie at mibh.net
Mon Jul 17 08:27:37 UTC 2000


> The way the code to deal with stuff is structured (from 8.2.3t4b) is ...
> 
>         while (!main_needs_exit) {
>                 evEvent event;
> 
>                 ns_debug(ns_log_default, 15, "main loop");
>                 if (needs != 0) {
>                         /* Drain outstanding events; handlers ~block~. */
>                         while (evGetNext(ev, &event, EV_POLL) != -1)
>                                 INSIST_ERR(evDispatch(ev, event) != -1);
>                         INSIST_ERR(errno == EINTR || errno == EWOULDBLOCK);
>                         handle_need();
>                 } else if (evGetNext(ev, &event, EV_WAIT) != -1) {
>                         INSIST_ERR(evDispatch(ev, event) != -1);
>                 } else {
>                         INSIST_ERR(errno == EINTR);
>                 }
>         }
> 
> If there's a child waiting to exit, "needs" will be != 0.   The effect of
> this is that if there's an exited child (or other stuff like that), named
> goes into the:
> 
>                         while (evGetNext(ev, &event, EV_POLL) != -1)
>                                 INSIST_ERR(evDispatch(ev, event) != -1);
> 
> loop.  On munnari though, it is entirely possible for that loop to never
> terminate, queries arrive as fast as munnari is able to process them (or 
> faster).  In that case, handle_need() is never actually called... (or not
> until there gets to be a lull in the packet processing, which can easily be
> hours away).
> 
> During all of this timers are going off (that's handled by evGetNext())
> new child processes are being created, they're exiting, and the process
> table is getting forever fuller (and on munnari, available swap space is
> continually decreasing).

This is certainly a bad state of affairs and I'm embarrassed not to have
considered it in the original design.

> I am currently about to try running named with the "drain" loop simply
> deleted - I can't see anything it is actually necessary for, but then
> again I also didn't look at what all the needs handlers do to make sure
> there will be no adverse side effects.

The problem is calling evGetNext() with EV_WAIT if needs!=0.  EV_WAIT on a
nonbusy system can mean "wait for several minutes" ('til the next request
comes in or timer goes off or whatever).  Because BIND8's event system is
nonpreemptive, there's no clean way to code this.  The evHold()/evUnhold()
logic for "connections" is a rough analogue to what we'd need -- I don't
want to evDeselect() on all UDP sockets whenever I was about to call
handle_need().  What that EV_POLL loop is trying to get at is all the
events which pertain to things handle_need() generates or consumes.  I've
thought of an EV_NOIO which would only check for timers and evDo()'s.

The inability of BIND8's event system to ensure fairness and avoid starvation
was one of the design forces behind BIND9's event system.  So, I'm not going
to get involved with the necessary rewrite of BIND8 since it's already been
done and its name is BIND9.  I *think* the following will mostly work for
munnari, without causing any harm to anyone else.

*** ns_main.c   2000/04/21 06:54:08     8.125
--- ns_main.c   2000/07/17 08:22:59
***************
*** 530,538 ****
  
                ns_debug(ns_log_default, 15, "main loop");
                if (needs != 0) {
                        /* Drain outstanding events; handlers ~block~. */
!                       while (evGetNext(ev, &event, EV_POLL) != -1)
                                INSIST_ERR(evDispatch(ev, event) != -1);
                        INSIST_ERR(errno == EINTR || errno == EWOULDBLOCK);
                        handle_need();
                } else if (evGetNext(ev, &event, EV_WAIT) != -1) {
--- 530,545 ----
  
                ns_debug(ns_log_default, 15, "main loop");
                if (needs != 0) {
+                       int poll_max = 42;      /* Thank you Doug Adams. */
+ 
                        /* Drain outstanding events; handlers ~block~. */
!                       while (evGetNext(ev, &event, EV_POLL) != -1) {
                                INSIST_ERR(evDispatch(ev, event) != -1);
+                               if (--poll_max == 0) {
+                                       errno = EWOULDBLOCK;
+                                       break;
+                               }
+                       }
                        INSIST_ERR(errno == EINTR || errno == EWOULDBLOCK);
                        handle_need();
                } else if (evGetNext(ev, &event, EV_WAIT) != -1) {

> For named a better (more substantial) fix would probably to turn all of
> the needs into events so everything could be processed using one mechanism
> (perhaps bind9 has done that).

Yes, and yes.  evDo() can allocate memory, which means it cannot be called
from a signal handler on some systems, which is why we have a "needs" mask.
BIND9 avoids this whole area by having a cleaner design everyplace else --
so, this problem just never arises.

> ps: it looks to be as if CHECK_INSIST gets turned off, then INSIST_ERR()
> turns into a no-op, which would cause evDispatch() to never be called,
> ever, which doesn't look like a sane result.   Rather than
> 
> #define INSIST(cond)            ((void) 0)
> #define INSIST_ERR(cond)        ((void) 0)
> 
> they should probably be ...
> 
> #define INSIST(cond)            ((void) (cond))
> #define INSIST_ERR(cond)        ((void) (cond))
> 
> The same might be true of ENSURE() and INVARIANT() (etc) though I didn't
> check any of their uses (these are all from include/isc/assertions.h)

Done.  Thanks.





More information about the bind-workers mailing list