8.2.2-REL dying

Richard Johnsson johnsson at hamilton.com
Thu Oct 28 13:50:14 UTC 1999


> > My 8.2.2-REL has died twice in the last two days. No core file and no log
> > messages. By looking at what I could find in the log I have found the
> > repeatable circumstances, but have not looked at the code at all.
> > 
> > Environment: Solaris (sparc) 2.5.1; compiled with gcc 2.7.2.3 with
> > default options.
> > 
> > If I do an ndc reload when nothing has changed, the server logs this:
> > 
> > Oct 27 23:12:35 interval named[7468]: reloading nameserver
> > Oct 27 23:12:35 interval named[7468]: Forwarding source address is [0.0.0.0].49582
> > Oct 27 23:12:35 interval named[7468]: Ready to answer queries.
> > 
> > and then disappears. If there's something new, even just touching a zone
> > file, it reloads the zone, sends NOTIFYs and does not die.
> > 
> > Richard
> > 
> > p.s. Just tried the same on SunOS 4.1.2 and Linux RH 6.0 and it works fine
> > there.
> > 
> 	Attach a debugger to the process and try a plain reload again.
> 
> 	e.g.
> 	    Window 1:
> 		gdb/dbx <namedpath> <pid>
> 		cont
> 
> 	    Window 2:
> 		ndc reload

I ran this experiment twice. In both cases I had to do the ndc reload twice.
After the first reload I verified that syslog said it was Ready and that the
process was still running. It crashed on the second reload.


Experiment 1:

Attached to process 7729 at 0xef677790
poll+4: ta      8
(dbx) cont
signal BUS (bus error) in ip_match_addr_or_key at 0x4e980
ip_match_addr_or_key+0x38:      ld      [%i0], %l0
(dbx) where
ip_match_addr_or_key(0x732e636f, 0xc7aa6a25, 0, 0x1427e4, 0x10, 0xc3bac) at 0x4e980
ip_match_address(0x732e636f, 0xeffff4b4, 0x1, 0x201, 0xef6b3164, 0x27) at 0x4eac8
dispatch_message(0xeffff550, 0x27, 0x200, 0, 0xeffff530, 0x18) at 0x353ec
datagram_read(0xeffff7fc, 0x105234, 0x18, 0x1, 0x351c4, 0x351c4) at 0x3537c
__evDispatch(0x109f40, 0xc96ac, 0xc96ac, 0x109f40, 0x2, 0xa4530) at 0x65de8
main(0, 0xeffff9f8, 0x1, 0xc3000, 0xb9380, 0xc3000) at 0x34244


Experiment 2:

Attached to process 21278 at 0xef677790
poll+4: ta      8
(dbx) cont
signal BUS (bus error) in ip_match_addr_or_key at 0x4e980
ip_match_addr_or_key+0x38:      ld      [%i0], %l0
(dbx) where
ip_match_addr_or_key(0x7a6f6e65, 0xc7aa6a60, 0, 0x1427e4, 0x10, 0xc3bac) at 0x4e980
ip_match_address(0x7a6f6e65, 0xeffff4b4, 0x1, 0x201, 0xef6b3164, 0x27) at 0x4eac8
dispatch_message(0xeffff550, 0x27, 0x200, 0, 0xeffff530, 0x18) at 0x353ec
datagram_read(0xeffff7fc, 0x105234, 0x18, 0x1, 0x351c4, 0x351c4) at 0x3537c
__evDispatch(0x109f40, 0xc96ac, 0xc96ac, 0x109f40, 0x2, 0xa4530) at 0x65de8
main(0, 0xeffff9f8, 0x1, 0xc3000, 0xb9380, 0xc3000) at 0x34244
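A quick way to look at the first argument in the two traces, which is the
address the faulting "ld [%i0], %l0" tries to read: the Python sketch below
(not part of the named source; the function name is made up for illustration)
checks whether the value is word-aligned and whether its four bytes are
printable ASCII. It assumes the values shown by dbx are the register contents
at the time of the fault. A SPARC word load needs a 4-byte-aligned address, so
an unaligned value would be consistent with the bus error, and ASCII bytes
would suggest the pointer was overwritten with string data. This is only a
reading of the numbers, not a diagnosis.

```python
import struct


def describe_pointer(value):
    """Report alignment and ASCII content of a 32-bit pointer value."""
    # SPARC is big-endian, so pack the value most significant byte first.
    raw = struct.pack(">I", value)
    # Only decode if every byte is a printable ASCII character.
    printable = all(32 <= b < 127 for b in raw)
    return {
        "aligned": value % 4 == 0,
        "ascii": raw.decode("ascii") if printable else None,
    }


# First argument to ip_match_addr_or_key in experiments 1 and 2.
for arg in (0x732E636F, 0x7A6F6E65):
    print(hex(arg), describe_pointer(arg))
```

Neither value is 4-byte aligned, and both decode to text ("s.co" and "zone").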


Here's the tail of named.run (trace level 5) at the time of failure:

Ready to answer queries.
pselect(26, 0x3f00060, 0x0, 0x0, 3599.496772000)
select() returns 1 (err: none)
Dispatch.File: fd 24, mask 0x1, func 0x351c4, uap 0x105234
datagram from [199.170.106.10].57500, fd 24, len 45

Note that 199.170.106.10 is the machine named is running on. The second
argument to ip_match_addr_or_key above is a random address on 199.170.106.*.
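The reading of the second argument as an address can be reproduced with a few
lines of Python. This is a sketch for checking the hex values from the two
traces, assuming they are IPv4 addresses in network byte order; the helper
name and the /24 network used for the comparison are illustrative choices, not
anything taken from named's configuration.

```python
import ipaddress


def to_dotted_quad(value):
    """Interpret a 32-bit integer as an IPv4 address in network byte order."""
    return str(ipaddress.IPv4Address(value))


# Assumed network, for comparison with the host address 199.170.106.10.
local_net = ipaddress.ip_network("199.170.106.0/24")

# Second argument to ip_match_addr_or_key in experiments 1 and 2.
for arg in (0xC7AA6A25, 0xC7AA6A60):
    addr = ipaddress.IPv4Address(arg)
    print(hex(arg), to_dotted_quad(arg), addr in local_net)
```

The two values come out as 199.170.106.37 and 199.170.106.96.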

