BIND slow

Aruna msalveru at gmail.com
Tue Jul 15 22:36:50 UTC 2008


BIND 9.4.1-P1
Solaris 10
This is a caching-only DNS server. After 10 to 12 hours of running,
named starts responding very slowly to queries from the network. We
gathered some stack traces from named.

  # pstack 7586
7586: /opt/Tbind/sbin/named
----------------- lwp# 1 / thread# 1 --------------------
fee54077 sigtimedwait (8047d10, 8047d20, 0)
fee49553 sigwait (8047df0) + 1c
fee41d86 __posix_sigwait (8047df0, 8047e90, fee7e5d0) + 2e
082294b1 isc_app_run (0, 0, 0, 0, 0, 0) + 1d1
00000064 ???????? ()

The arguments to isc_app_run all show as NULL, and the frame above it
seems to be stepping on main().

----------------- lwp# 2 / thread# 2 --------------------
fee53d4b lwp_park (0, 0, 0)
fee4e592 cond_wait_queue (82f260c, 82f25d8, 0, 0) + 3b
fee4ea8b _cond_wait (82f260c, 82f25d8) + 66
fee4eacd cond_wait (82f260c, 82f25d8) + 21
fee4eb06 pthread_cond_wait (82f260c, 82f25d8, 0) + 1b
08225dff dispatch (fecf0200) + 9f
fee53cf0 _lwp_start (fecf0200, 0, 0, fee82280, fee82280, fed1bfb8)
----------------- lwp# 3 / thread# 3 --------------------
fee53d4b lwp_park (0, 0, 0)
fee4e592 cond_wait_queue (82f260c, 82f25d8, 0, 0) + 3b
fee4ea8b _cond_wait (82f260c, 82f25d8) + 66
fee4eacd cond_wait (82f260c, 82f25d8) + 21
fee4eb06 pthread_cond_wait (82f260c, 82f25d8, 0) + 1b
08225dff dispatch (fecf0a00) + 9f
fee53cf0 _lwp_start (fecf0a00, 0, 0, fee82280, fee82280, fecedfb8)
----------------- lwp# 4 / thread# 4 --------------------
fee53d4b lwp_park (0, 0, 0)
fee4e592 cond_wait_queue (82f260c, 82f25d8, 0, 0) + 3b
fee4ea8b _cond_wait (82f260c, 82f25d8) + 66
fee4eacd cond_wait (82f260c, 82f25d8) + 21
fee4eb06 pthread_cond_wait (82f260c, 82f25d8, 0) + 1b
08225dff dispatch (fecf1200) + 9f
fee53cf0 _lwp_start (fecf1200, 0, 0, fee82280, fee82280, fecdbfb8)
----------------- lwp# 5 / thread# 5 --------------------
fee53d4b lwp_park (0, 0, 0)
fee4e592 cond_wait_queue (82f260c, 82f25d8, 0, 0) + 3b
fee4ea8b _cond_wait (82f260c, 82f25d8) + 66
fee4eacd cond_wait (82f260c, 82f25d8) + 21
fee4eb06 pthread_cond_wait (82f260c, 82f25d8, 0) + 1b
08225dff dispatch (fecf1a00) + 9f
fee53cf0 _lwp_start (fecf1a00, 0, 0, fee82280, fee82280, fecc9fb8)
----------------- lwp# 6 / thread# 6 --------------------
fee53d4b lwp_park (0, fecb7e90, 0)
fee4e592 cond_wait_queue (82f3608, 82f35d8, fecb7e90, 0) + 3b
fee4e932 cond_wait_common (82f3608, 82f35d8, fecb7e90) + 1df
fee4eb66 _cond_timedwait (82f3608, 82f35d8, fecb7ef4) + 51
fee4ebd1 cond_timedwait (82f3608, 82f35d8, fecb7ef4) + 24
fee4ec0d pthread_cond_timedwait (82f3608, 82f35d8, fecb7ef4) + 1e
0823bffe isc_condition_waituntil (4, 4d580000, 0, 0, 0, 0) + 7e
082e26b8 ???????? ()
----------------- lwp# 7 / thread# 7 --------------------
fee54727 pollsys (feca5af0, 16, 0, 0)
fee02f4e pselect (48, feca5dac, feca5e2c, fee7f310, 0, 0) + 18e
fee03244 select (48, feca5dac, feca5e2c, 0, 0, fffffffe) + 82
08234ef3 watcher () + 253
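
To make it easier to compare traces taken at different times, the pstack output can be grouped by identical call stacks. The following is only a minimal sketch: it assumes the output has been saved to a text file in the same layout as above, and the function names (stacks_by_lwp, group_identical, report) are hypothetical helpers, not part of pstack or BIND.

```python
import re
from collections import defaultdict

# Matches separator lines such as "----- lwp# 2 / thread# 2 -----".
LWP_HEADER = re.compile(r"^-+\s*lwp#\s*(\d+)\s*/\s*thread#\s*\d+\s*-+$")
# Matches frame lines such as "08225dff dispatch (fecf0200) + 9f".
FRAME = re.compile(r"^[0-9a-f]{8,16}\s+(\S+)\s*\(")


def stacks_by_lwp(pstack_text):
    """Return {lwp id: tuple of function names, innermost frame first}."""
    stacks = defaultdict(list)
    current = None
    for raw in pstack_text.splitlines():
        line = raw.strip()
        header = LWP_HEADER.match(line)
        if header:
            current = int(header.group(1))
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            stacks[current].append(frame.group(1))
    return {lwp: tuple(names) for lwp, names in stacks.items()}


def group_identical(stacks):
    """Return {stack tuple: sorted list of lwp ids sharing that stack}."""
    groups = defaultdict(list)
    for lwp, names in sorted(stacks.items()):
        groups[names].append(lwp)
    return dict(groups)


def report(path):
    """Print one line per distinct stack found in a saved pstack file."""
    with open(path) as fh:
        grouped = group_identical(stacks_by_lwp(fh.read()))
    for names, lwps in grouped.items():
        print(f"lwps {lwps}: {' <- '.join(names)}")
```

Applied to the trace above, lwps 2 to 5 fall into a single group, since all four are parked in pthread_cond_wait under dispatch.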


ipFragOKs        =   2788    ipFragFails       =      0
ipFragCreates    =   5758    ipRoutingDiscards =      0
tcpInErrs        =   5636    udpNoPorts        = 326866
udpInCksumErrs   =   1502    udpInOverflows    =  83605
rawipInOverflows =      0    ipsecInSucceeded  =      0
ipsecInFailed    =      0    ipInIPv6          =      0
ipOutIPv6        =      0    ipOutSwitchIPv6   =      0


Tue Jul 8 04:43:29 2008| WARNING: nge0: nge_factotum_stall_check,tx_stall: tx_free: 2784,tx_next: 342,watchdog: 3959964680
Tue Jul 8 04:43:31 2008| NOTICE: nge0: link down (initialised)
Tue Jul 8 04:43:32 2008| NOTICE: nge0: link up 100Mbps Full-Duplex (initialised)
Wed Jul 9 07:22:22 2008|
| panic[cpu1]/thread=ffffffff8225a160:
Wed Jul 9 07:22:22 2008| forced crash dump initiated at user request
Wed Jul 9 07:22:22 2008|
|
Wed Jul 9 07:22:22 2008| fffffe800059de70 genunix:kadmin+4b4 ()
Wed Jul 9 07:22:22 2008| fffffe800059dec0 genunix:uadmin+c7 ()
Wed Jul 9 07:22:22 2008| fffffe800059df10 unix:brand_sys_syscall32+1a3 ()
Wed Jul 9 07:22:22 2008|
Wed Jul 9 07:22:22 2008| syncing file systems...
Wed Jul 9 07:22:23 2008| done
Wed Jul 9 07:22:24 2008| dumping to /dev/dsk/c3t2d0s1, offset 1719074816, content: kernel

I have collected some core files from named along with the stack traces.
There is a significant number of udpInOverflows during the hang.
Once we restart named, everything starts working again.
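
To see how quickly udpInOverflows grows while named is slow, two snapshots of the counters can be compared. This is a rough sketch only: it assumes the counters are saved as text in the "name = value" form shown above, and parse_counters and counter_deltas are hypothetical helper names chosen for illustration.

```python
import re

# One "name = value" pair; several pairs may share a line, and the
# spacing around "=" may vary (e.g. "udpNoPorts =326866").
PAIR = re.compile(r"(\w+)\s*=\s*(\d+)")


def parse_counters(text):
    """Return {counter name: integer value} for every pair in the text."""
    return {name: int(value) for name, value in PAIR.findall(text)}


def counter_deltas(before, after):
    """Return {name: after - before} for counters present in both
    snapshots whose value changed between them."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }
```

A udpInOverflows delta that keeps growing only while named is slow would point at the socket receive queue filling up, rather than at the counter simply being high since boot.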

I have core files and related libraries from the system.


Initially we suspected the nge network interface, but that may be a side
effect of contention in the UDP stack layer. Are there any specific known
issues with ISC BIND?
I am looking for any further ideas or known bugs around this.

Thanks in Advance,
Mani



