weird 8.2.1 crash

Jim Reid jim at mpn.cp.philips.com
Mon Aug 9 16:43:03 UTC 1999


One of our 8.2.1 name servers died with a weird SIGSEGV error when it
was doing its nightly "ndc reload" to cycle all its log files. The
core dump doesn't make a lot of sense to me.

	(gdb) where
	#0  0x805a06e in ns_init (conffile=0x80d0002 "/etc/named.conf")
	    at ns_init.c:172
	#1  0x8060750 in ns_reconfig () at ns_maint.c:1576
	#2  0x805e0df in handle_need () at ns_main.c:2599
	#3  0x805b1fd in main (argc=1, argv=0x8047c98, envp=0x8047c9c) at ns_main.c:517
	#4  0xa0000479 in ?? ()
	(gdb) p nzones
	$8 = 135610368
	(gdb) p *zp
	$9 = {z_origin = 0x80dd3da "89.161.in-addr.arpa", z_time = 0, 
	  z_lastupdate = 0, z_refresh = 0, z_retry = 3600, z_expire = 604800, 
	  z_minimum = 86400, z_serial = 1999080600,
		.....
	 z_fwdtab = 0x0, z_freelink = {prev = 0x0, next = 0x8121f30}, z_reloadlink = {
	    prev = 0x0, next = 0xffffffff}}

Has anyone got an idea how nzones got to over 135 million? There are
only 784 zones in its named.conf.
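
One quick sanity check on that number is to view it in hexadecimal. The
Python sketch below is only an illustration and is not part of the gdb
session; the variable names are invented for it. The value comes out as
0x8154000, which is page-aligned and sits in the same numeric range as the
0x80.../0x81... pointers printed elsewhere in these dumps. That may hint
that nzones was overwritten with a pointer-like value, but this is a guess
rather than a diagnosis.

```python
# Value gdb printed for nzones in the crashed server's core dump.
nzones_from_core = 135610368

# Show it the way gdb prints addresses.
nzones_hex = hex(nzones_from_core)
print(nzones_hex)

# Pointer values copied from the gdb output in this message, for comparison.
lowest_pointer_seen = 0x80d0002   # conffile argument in frame #0
highest_pointer_seen = 0x8197bbc  # freezones.tail on the test server

# True if the bogus zone count falls inside the range of real pointers seen.
looks_like_pointer = lowest_pointer_seen <= nzones_from_core <= highest_pointer_seen

# True if the value is a multiple of 4096 (assumed page size).
page_aligned = nzones_from_core % 4096 == 0
print(looks_like_pointer, page_aligned)
```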

	(gdb) p reloadingzones
	$10 = {head = 0x0, tail = 0x3500}
	(gdb) p freezones
	$11 = {head = 0x0, tail = 0x3780dc31}

More worryingly, how come both reloadingzones and freezones ended up
with null pointers at the head of their lists? Could that have led to
null pointer dereferences and hence the SIGSEGV? The addresses held in
the tail pointers of those lists look wacky too: as well as having
suspicious alignment, they seem to be in the text segment of named's
address space rather than the data segment.
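
The alignment point can be checked mechanically. The sketch below is
illustrative Python, not anything from BIND or gdb; the 4-byte word size is
an assumption based on the 32-bit addresses in the dumps, and the helper
name is made up. It reports whether each tail value from the crashed server
is a multiple of the word size, next to the freezones pointers from the
test server's gcore dump.

```python
WORD_SIZE = 4  # assumption: 32-bit pointers, as the addresses above suggest

def is_misaligned(addr, word=WORD_SIZE):
    """Return True if addr is not a multiple of the word size."""
    return addr % word != 0

# Tail pointers gdb printed for the two lists in the crashed server's core.
crashed_tails = {"reloadingzones": 0x3500, "freezones": 0x3780dc31}

# freezones head and tail from the test server's gcore dump, for comparison.
test_pointers = {"freezones.head": 0x8194184, "freezones.tail": 0x8197bbc}

crashed_report = {name: is_misaligned(a) for name, a in crashed_tails.items()}
test_report = {name: is_misaligned(a) for name, a in test_pointers.items()}

print(crashed_report)
print(test_report)
```

With these inputs only the crashed server's freezones tail fails the check;
0x3500 is word-aligned but far lower than any other address in either dump.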

I see the same weirdness with nzones and these two lists when I gdb a
gcore'd dump of the restarted name server. However, when I take a copy
of the operational server's named.conf to a test system, fire up named
and gcore that, a cursory glance suggests the data structs look OK:

	# gdb named test-server-gcore.dump
	 ...
	Core was generated by `named'.
	#0  0xa0025961 in ?? ()
	(gdb) p nzones 
	$1 = 832
	(gdb) p freezones 
	$2 = {head = 0x8194184, tail = 0x8197bbc}
	(gdb) p reloadingzones 
	$3 = {head = 0x0, tail = 0x0}

Any clues or constructive suggestions are welcome. :-)

FWIW, my test and operational systems run the same OS (BSD/OS 4.0) and
8.2.1 name server executables.

