bind-9.4.0b2 exits unexpected...

Tue Oct 10 14:46:02 UTC 2006

>>>>> On Tue, 10 Oct 2006 14:00:57 +0200, 
>>>>> Marco Schumann <schumann at strato-rz.de> said:

> no, they don't, as I recompiled on that machine with --disable-atomic...
> sorry, I did not check.
> I copied the core dumps to another machine running the version which
> crashed. Here again the backtrace:

Okay, thanks.  It looks like a valid trace.  And according to that,
the other thread was running in a context that could not affect the
assertion failure.  So I guess the real source of the problem is
somewhere different from the assertion point.
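
To illustrate why the assertion point and the real bug can be far
apart, here is a minimal single-threaded sketch.  All names in it
(toy_node_t, toy_attach, toy_detach, toy_first_failing_detach) are
hypothetical and are not BIND code; it only assumes a counter whose
decrement checks that the previous value was greater than zero, as
the "prev > 0" condition in the trace suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a node's reference counter (not BIND's type). */
typedef struct {
    unsigned int refs;
} toy_node_t;

/* Take one reference on the node. */
static void toy_attach(toy_node_t *n) {
    assert(n != NULL);
    n->refs++;
}

/*
 * Drop one reference.  Returns false in the situation where a
 * "prev > 0" style check would fail, i.e. the counter was already
 * zero before the decrement; the counter is left untouched then.
 */
static bool toy_detach(toy_node_t *n) {
    assert(n != NULL);
    unsigned int prev = n->refs;
    if (prev == 0)
        return false;
    n->refs = prev - 1;
    return true;
}

/*
 * Replay one attach followed by two detaches and return the 1-based
 * index of the detach that trips the check (0 if none does).
 */
static int toy_first_failing_detach(void) {
    toy_node_t n = { 0 };

    toy_attach(&n);          /* legitimate holder takes its reference */
    if (!toy_detach(&n))     /* stray detach from a buggy path */
        return 1;
    if (!toy_detach(&n))     /* legitimate holder releases */
        return 2;
    return 0;
}
```

In this sketch the stray detach passes silently and the check only
trips on the second, legitimate detach, so the caller that aborts is
not the one that caused the imbalance.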

Now I'd like to ask you a couple more things if you don't mind.

First, I'd like to know more details about the query that caused this
error.

> (gdb) thr 1
> [Switching to thread 1 (process 13242)]#0  0xffffe410 in
> __kernel_vsyscall ()
> (gdb) bt
> #0  0xffffe410 in __kernel_vsyscall ()
> #1  0xb7b747d0 in raise () from /lib/libc.so.6
> #2  0xb7b75ea3 in abort () from /lib/libc.so.6
> #3  0x08064b42 in assertion_failed (file=0xb7f3ca11 "rbtdb.c",
> line=1158, type=isc_assertiontype_require, cond=0xb7f2ee45 "prev > 0")
>     at ./main.c:159
> #4  0xb7e87918 in no_references (rbtdb=0xadd16008, node=0x85fff2d8,
> least_serial=0, lock=isc_rwlocktype_none) at rbtdb.c:1157
> #5  0xb7e90367 in detachnode (db=0xadd16008, targetp=0xb4292628) at
> rbtdb.c:3854
> #6  0xb7e4ba6e in dns_db_detachnode (db=0xadd16008, nodep=0xb4292628) at
> db.c:525
> #7  0xb7ee20b0 in cache_message (fctx=0xab7dc7e8, addrinfo=0xa9bd8138,
[...]

Again, if you don't mind, please try the following and show the
results:

(gdb) f 7
(gdb) p *fctx
(gdb) p (unsigned char *)((dns_rbtnode_t *)node + 1)

NOTE: this will disclose the query name, type and other context of
the query, which you may not want to show to others.  Please check
the output first, and paste it only when you are sure it's okay.

Second, if you can run test code, please apply the attached patch to
9.4.0b2, rebuild named (do NOT specify --disable-atomic), and see if
the bug is reproduced.  The patch does not include any fix, only some
stronger assertion checks that may reveal the real point of the bug.
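
The idea behind the stronger checks can be sketched as follows.  This
is only an illustration under stated assumptions: toy_refcount_t,
toy_increment0 and toy_increment are hypothetical names, not the
macros used in rbtdb.c, and the sketch is single-threaded rather than
atomic.  It assumes that the "0" variant accepts a counter that is
currently zero, while the other variant treats that as an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, non-atomic reference counter used only for illustration. */
typedef struct {
    unsigned int refs;
} toy_refcount_t;

/*
 * Lenient variant: incrementing from zero is allowed, so an
 * unexpectedly unreferenced object goes unnoticed here.
 */
static bool toy_increment0(toy_refcount_t *rc, unsigned int *now) {
    assert(rc != NULL && now != NULL);
    rc->refs++;
    *now = rc->refs;
    return true;
}

/*
 * Strict variant: the caller claims the object is already referenced,
 * so a previous value of zero is reported as a failure and the
 * counter is left unchanged.
 */
static bool toy_increment(toy_refcount_t *rc, unsigned int *now) {
    assert(rc != NULL && now != NULL);
    if (rc->refs == 0) {
        *now = 0;
        return false;
    }
    rc->refs++;
    *now = rc->refs;
    return true;
}
```

With the strict variant, a path that believes it already holds a
reference fails at the increment itself instead of at some later
decrement.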

Thanks,

					JINMEI, Tatuya
					Communication Platform Lab.
					Corporate R&D Center, Toshiba Corp.
					jinmei at isl.rdc.toshiba.co.jp

--- rbtdb.c.orig	Tue Oct 10 23:35:40 2006
+++ rbtdb.c	Tue Oct 10 23:36:58 2006
@@ -837,8 +837,8 @@
 	REQUIRE(version->writer);
 
 	if (changed != NULL) {
-		dns_rbtnode_refincrement0(node, &refs);
-		INSIST(refs > 0);
+		dns_rbtnode_refincrement(node, &refs);
+		INSIST(refs != 0);
 		changed->node = node;
 		changed->dirty = ISC_FALSE;
 		ISC_LIST_INITANDAPPEND(version->changed_list, changed, link);
@@ -1125,6 +1125,7 @@
 		isc_refcount_increment0(lockref, &lockrefs);
 		INSIST(lockrefs != 0);
 	}
+	INSIST(isc_refcount_current(&rbtdb->node_locks[node->locknum].references));
 	INSIST(noderefs != 0);
 }
 
@@ -3824,8 +3825,8 @@
 	REQUIRE(targetp != NULL && *targetp == NULL);
 
 	NODE_STRONGLOCK(&rbtdb->node_locks[node->locknum].lock);
-	dns_rbtnode_refincrement0(node, &refs);
-	INSIST(refs > 1);
+	dns_rbtnode_refincrement(node, &refs);
+	INSIST(refs != 0);
 	NODE_STRONGUNLOCK(&rbtdb->node_locks[node->locknum].lock);
 
 	*targetp = source;
@@ -4285,8 +4286,8 @@
 
 	NODE_STRONGLOCK(&rbtdb->node_locks[rbtnode->locknum].lock);
 
-	dns_rbtnode_refincrement0(rbtnode, &refs);
-	INSIST(refs > 0);
+	dns_rbtnode_refincrement(rbtnode, &refs);
+	INSIST(refs != 0);
 
 	iterator->current = NULL;
 
@@ -6332,9 +6333,12 @@
 		 * expirenode() currently always returns success.
 		 */
 		if (expire_result == ISC_R_SUCCESS && node->down == NULL) {
+			unsigned int refs;
+
 			rbtdbiter->deletions[rbtdbiter->delete++] = node;
 			NODE_STRONGLOCK(&rbtdb->node_locks[node->locknum].lock);
-			dns_rbtnode_refincrement0(node, NULL);
+			dns_rbtnode_refincrement(node, &refs);
+			INSIST(refs != 0);
 			NODE_STRONGUNLOCK(&rbtdb->node_locks[node->locknum].lock);
 		}
 	}