BIND 9.6.1-P1 crashing

Mon Jan 4 22:34:31 UTC 2010

At Wed, 30 Dec 2009 10:23:17 +0100,
Dario Miculinic <dario.miculinic at t-com.hr> wrote:

> I'm administering 4 DNS servers running CentOS release 5.4 and Red Hat Enterprise Linux Server release 5.2 with BIND 
> version 9.6.1-P1. On 3 of them BIND crashed 7 times in the last 10 days. There's nothing in the log files, but we have a core dump 
> file. I found this in the core dump:
> 
> #0  0x080db986 in ttl_sooner (v1=0x0, v2=0x3385b628) at rbtdb.c:752
> 752     ttl_sooner(void *v1, void *v2) {
> (gdb) where
> #0  0x080db986 in ttl_sooner (v1=0x0, v2=0x3385b628) at rbtdb.c:752

What's the result of the following gdb command?

(gdb) thread apply all bt full

We've seen crashes like this one, but we haven't figured out how
they happen.  This is very likely an inter-thread race, and it may be
tricky to track down.  Given the v1/v2 values in your stack trace
(v1 is a NULL pointer), a full backtrace that includes the other
threads may provide more useful hints.
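
If it's easier to capture the output non-interactively, something like
the following sketch may help.  It runs the same gdb command in batch
mode and saves the result to a file.  The paths and the output file
name are assumptions on my side (I don't know where your named binary
and core file live), so adjust them before use.

```shell
#!/bin/sh
# Hypothetical locations: replace with the real named binary and core file.
NAMED_BIN="${NAMED_BIN:-/usr/sbin/named}"
CORE_FILE="${CORE_FILE:-/var/named/core.12345}"
OUT_FILE="${OUT_FILE:-named-backtrace.txt}"

# Run gdb in batch mode and save the full backtrace of every thread.
dump_backtrace() {
    gdb -batch -ex "thread apply all bt full" \
        "$NAMED_BIN" "$CORE_FILE" > "$OUT_FILE" 2>&1
}

# Only invoke gdb when both input files are actually readable.
if [ -r "$NAMED_BIN" ] && [ -r "$CORE_FILE" ]; then
    dump_backtrace
    echo "backtrace written to $OUT_FILE"
else
    echo "named binary or core file not found; set NAMED_BIN and CORE_FILE"
fi
```

The backtrace is most useful when the binary passed to gdb is the same
build that produced the core and still carries its debugging symbols.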

If you need an immediate workaround rather than chasing the bug,
rebuilding named with --disable-atomic may help (we cannot be sure
because we don't yet know how this bug happens in the first place).
This build uses locks in a more conservative way and may avoid the
tricky race condition at the cost of lower performance, so if you
try it you'll also need to watch the server load.
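
A minimal sketch of such a rebuild, assuming you build from the
unpacked source tarball: the source directory below is hypothetical,
and any other configure options used for your current build should be
added alongside --disable-atomic.

```shell
#!/bin/sh
# Hypothetical source directory: point this at your unpacked BIND source tree.
BIND_SRC="${BIND_SRC:-$HOME/src/bind-9.6.1-P1}"

# Configure without atomic operations and build; the subshell keeps the
# caller's working directory unchanged.
rebuild_named() {
    (
        cd "$BIND_SRC" || exit 1
        # Append the configure options of your existing build here as well.
        ./configure --disable-atomic && make
    )
}

# Only start the build when a configure script is present.
if [ -x "$BIND_SRC/configure" ]; then
    rebuild_named
else
    echo "no configure script under $BIND_SRC; set BIND_SRC"
fi
```

The sketch stops after make on purpose, so the new named can be tried
on one server before it replaces the installed binary.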

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.