bind-9.4.0b2 exits unexpected...

Mon Oct 9 08:16:21 UTC 2006

Hello,

JINMEI Tatuya / 神明達哉 wrote:
>>>>>> On Sun, 08 Oct 2006 21:03:23 +0200, 
>>>>>> Marco Schumann <schumann at strato-rz.de> said:
> 
>> another mystic(?) bind-9.4.0b2 behaviour.
>> Last two nights one of our resolvers stopped working:
> 
>> ...
>> 07-Oct-2006 21:03:29.393 general: rbtdb.c:1158: REQUIRE(prev > 0) failed
>> 07-Oct-2006 21:03:29.393 general: exiting (due to assertion failure)
>> ...
>> 08-Oct-2006 19:32:31.666 general: rbtdb.c:1158: REQUIRE(prev > 0) failed
>> 08-Oct-2006 19:32:31.666 general: exiting (due to assertion failure)
>> ...
> 
>> What happens exactly? isc_refcount_decrement returns NULL when... when?
>> And why is it so fatal, that the whole process must die? Was this
>> introduced in this version? In bind-9.3.2-P1 the macro was called only
>> once in that file, bind-9.4.0b2 executes it 7 times. Does this behaviour
>> correlate with the number of worker threads as it seems to be a locking
>> issue? And if so, which way?
> 
> Did named dump core?  If so, showing its backtrace would be helpful.

Here it is (I hope this is what you expected):

#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7b747d0 in raise () from /lib/libc.so.6
#2  0xb7b75ea3 in abort () from /lib/libc.so.6
#3  0x08064b42 in assertion_failed (file=0xb7f3ca11 "rbtdb.c", line=1158, type=isc_assertiontype_require, cond=0xb7f2ee45 "prev > 0") at ./main.c:159
#4  0xb7e87918 in no_references (rbtdb=0xadd16008, node=0x85fff2d8, least_serial=0, lock=isc_rwlocktype_none) at rbtdb.c:1157
#5  0xb7e90367 in detachnode (db=0xadd16008, targetp=0xb4292628) at rbtdb.c:3854
#6  0xb7e4ba6e in dns_db_detachnode (db=0xadd16008, nodep=0xb4292628) at db.c:525
#7  0xb7ee20b0 in cache_message (fctx=0xab7dc7e8, addrinfo=0xa9bd8138, now=1160247809) at resolver.c:3924
#8  0xb7ee6e3d in resquery_response (task=0xaea84620, event=0xa9dab3c8) at resolver.c:5741
#9  0xb7cbae12 in run (uap=0xb7a9e0b0) at task.c:867
#10 0xb7c7134b in start_thread () from /lib/libpthread.so.0
#11 0xb7c0965e in clone () from /lib/libc.so.6

#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7b397d0 in raise () from /lib/libc.so.6
#2  0xb7b3aea3 in abort () from /lib/libc.so.6
#3  0x08064b42 in assertion_failed (file=0xb7f01a11 "rbtdb.c", line=1158, type=isc_assertiontype_require, cond=0xb7ef3e45 "prev > 0") at ./main.c:159
#4  0xb7e4c918 in no_references (rbtdb=0xadcdb008, node=0x89e821e8, least_serial=0, lock=isc_rwlocktype_none) at rbtdb.c:1157
#5  0xb7e55367 in detachnode (db=0xadcdb008, targetp=0xb625c374) at rbtdb.c:3854
#6  0xb7e556e1 in rdataset_disassociate (rdataset=0xa6ef88a8) at rbtdb.c:5636
#7  0xb7e970ac in dns_rdataset_disassociate (rdataset=0xa6ef88a8) at rdataset.c:100
#8  0xb7ea0444 in fctx_destroy (fctx=0xa6ef87e8) at resolver.c:2596
#9  0xb7ead4b4 in fctx_doshutdown (task=0xaea49278, event=0xa6ef8840) at resolver.c:2745
#10 0xb7c7fe12 in run (uap=0xb7a630b0) at task.c:867
#11 0xb7c3634b in start_thread () from /lib/libpthread.so.0
#12 0xb7bce65e in clone () from /lib/libc.so.6

>> But most of all we are interested in how we can avoid that.
> 
> As you speculated, this is most likely a thread-related bug that is
> specific to 9.4.  So, if you need a quick remedy, I'd suggest to
> rebuild named with --disable-threads or --disable-atomic.  (The former
> should be obvious, and the latter disables optimization newly
> introduced in 9.4).

I will try with --disable-atomic and see whether it happens again.
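
For reference, the condition text "prev > 0" in both backtraces suggests
that the check is made on the counter value as it was before a decrement.
Assuming that reading is right, the pattern would look roughly like the
sketch below. This is not the code from rbtdb.c: node_t, node_init and
node_detach are made-up names, C11 atomics stand in for whatever named
uses internally, and a plain assert() stands in for REQUIRE().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted node; not BIND's real node type. */
typedef struct {
    atomic_uint references;
} node_t;

/* Set the starting reference count. */
static void node_init(node_t *node, unsigned int initial) {
    atomic_init(&node->references, initial);
}

/*
 * Drop one reference. atomic_fetch_sub returns the value the counter
 * held BEFORE the subtraction. If that value was already 0, some code
 * path released a reference it did not own (or two threads raced on
 * the release), so the process aborts instead of continuing with a
 * wrapped-around counter.
 * Returns true when this call released the last reference.
 */
static bool node_detach(node_t *node) {
    unsigned int prev = atomic_fetch_sub(&node->references, 1);
    assert(prev > 0); /* stand-in for REQUIRE(prev > 0) */
    return prev == 1;
}
```

With a node initialised to two references, the first node_detach() call
returns false and the second returns true; a third call would trip the
assertion. That would also explain why the failure is treated as fatal:
once the counter has gone below zero, the node may already have been
freed by someone else.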

-- 
_____________________________
[Marco Schumann