BIND 8.4.4 assertion failure on Tru64

Mon Feb 2 22:39:34 UTC 2004

Hello,

I'm testing v8.4.4 on Tru64 4.0E, 4.0G, and 5.1A. On all of those,
it compiles (using defaults from port/decunix/Makefile.set) with
no errors, but once started it frequently dies with this log
entry:

insist: critical: ns_main.c:4439:
INSIST(evDo(ev, "handle_needs") != -1)
No such file or directory failed.

This usually happens very quickly after it has loaded all zones
and is listening for requests. It also frequently dies with the
exact same error immediately after "ndc reload", "ndc reload
<domain>", and "ndc reconfig". Occasionally it does keep running
without the INSIST error.

There are no errors in named.conf, and making changes to
named.conf (such as logging) has no effect on the issue. It does
not seem to matter how many zones are defined in named.conf - 2
or over 2000, the behavior is the same.

Production servers are currently running v8.4.3 with no problems
(we don't have ipv6 enabled on these servers yet, so the bug that
caused v8.4.3 to be deprecated isn't really bothering us).

Briefly comparing the 8.4.3 code to 8.4.4, assertions.h has not
changed (the comments have, but not the code), but the calls
wrapped in INSIST_ERR() do seem to have changed - for example,
ns_main.c around line 4439 has:

v8.4.3 -
          if (queued != 0) {
                  INSIST_ERR(evDo(ev, (void *)handle_needs) != -1);
                  return;
          }

v8.4.4 -
          if (queued != 0) {
                  INSIST_ERR(evDo(ev, "handle_needs") != -1);
                  return;
          }
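
One possible reading of that difference, offered only as a guess
since evDo() itself has not been traced here: the second argument
looks like an opaque tag, and "No such file or directory" is the
strerror() text for ENOENT. If tags are matched by pointer value,
then a tag registered under one address (the function pointer used
in 8.4.3, or a separate copy of the same string) would never match
a different address, and C leaves it unspecified whether two
identical string literals share storage. The toy below is NOT BIND
code; toy_wait_for() and toy_do() are hypothetical names that only
model "lookup by pointer identity fails with ENOENT".

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an event library's wait list; not BIND code. */
#define TOY_MAX_WAITS 8

const void *toy_tags[TOY_MAX_WAITS];
size_t toy_count;

/* Register interest in a tag. The tag is kept as an opaque pointer. */
int toy_wait_for(const void *tag)
{
    if (toy_count == TOY_MAX_WAITS) {
        errno = ENOMEM;
        return -1;
    }
    toy_tags[toy_count++] = tag;
    return 0;
}

/* Fire a tag. Matching is by pointer identity, never by string contents. */
int toy_do(const void *tag)
{
    size_t i;

    for (i = 0; i < toy_count; i++) {
        if (toy_tags[i] == tag)
            return 0;
    }
    errno = ENOENT; /* reported when no waiter was registered for this tag */
    return -1;
}
```

In this model, registering one char array holding "handle_needs"
and then firing a second, separate array with the same contents
returns -1 with errno set to ENOENT, even though the two strings
compare equal with strcmp().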

A colleague attempted a workaround by trying to force CHECK_INSIST
to zero. To include/isc/assertions.h he added
#define CHECK_INSIST 0
just above
#if CHECK_INSIST != 0
#define INSIST(cond) \
    ((void) ((cond) || \
       ((__assertion_failed)(__FILE__, __LINE__, assert_insist, \
            #cond, 0), 0)))
That was a mistake. On the test nameserver, which was running
more than 2,000 slave zones (a number of which pointed to bad
master servers), xfer-in seemed to get stuck: "ndc status" always
showed 10 xfers in progress (the default maximum), with hundreds
queued, and zones were simply not getting updated. Apparently
CHECK_INSIST is, um, necessary :)
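
A sketch of why switching the check off can hide a problem rather
than cure it: at line 4439 the macro wraps the evDo() call itself,
so the macro's disabled definition decides what happens to that
call. Which disabled form the 8.4.4 header uses has not been
verified here, so both common variants are shown. All TOY_ and
toy_ names are hypothetical and this is not the BIND header.

```c
int toy_calls;         /* how many times the guarded call actually ran */
int toy_failures_seen; /* how many failures the check reported */

/* Hypothetical guarded call with a side effect; it always fails. */
int toy_step(void)
{
    toy_calls++;
    return -1;
}

/* Checking form: evaluate cond and record a failure if it is false. */
#define TOY_INSIST_ON(cond)   ((void)((cond) || (toy_failures_seen++, 0)))
/* Disabled form A: cond is still evaluated, but its result is discarded. */
#define TOY_INSIST_EVAL(cond) ((void)(cond))
/* Disabled form B: cond is not evaluated at all, so the call never runs. */
#define TOY_INSIST_DROP(cond) ((void)0)

void toy_run_checked(void)   { TOY_INSIST_ON(toy_step() != -1); }
void toy_run_evaluated(void) { TOY_INSIST_EVAL(toy_step() != -1); }
void toy_run_dropped(void)   { TOY_INSIST_DROP(toy_step() != -1); }
```

With form A the failing call still runs but nobody notices the
failure; with form B the call is skipped entirely. Either way the
server keeps going without the work having been dispatched, which
would be consistent with transfers sitting in the queue.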

I'm not familiar enough with BIND's code (yet) to trace it much
further than that. In several years of maintaining ISC BIND
servers, this is the first time a bug has bitten me in the rear, so
I'm not very familiar with debugging it. But I'll be submitting a
bug report as soon as I get full info from running in debug mode.
This is just a heads up.

If you're running v8.4.4 on Tru64 and *not* seeing this problem,
I'd sure like to hear about it.

I'm also testing this on Solaris 8 - so far, no problems there.

Mark A Jones
Systems Administrator
netINS, Inc.   http://netins.net
(515) 830-0698   markjo at netins.net