named 8.3.4 dies under recursive cache load (with ACL'ed zones?)

Greg A. Woods woods at weird.com
Sat Feb 15 17:39:38 UTC 2003


I've got a caching nameserver running at a client site which has
recently been crashing a lot more often, and at inopportune moments
(i.e. when nobody's nearby to restart it!).

It's running on NetBSD-1.5W on i386.

Feb 12 15:09:09 corporate named[3631]: flushset: out of memory
Feb 12 15:09:09 corporate named[3631]: flushset: out of memory
Feb 12 15:09:09 corporate /netbsd: named: pid 3631 [eid 32769:40, rid 32769:40] sent signal 6: was set-id, core dump not permitted [in /var/named]

Feb 14 07:37:02 corporate named[2928]: flushset: out of memory
Feb 14 07:37:02 corporate named[2928]: flushset: out of memory
Feb 14 07:37:02 corporate /netbsd: named: pid 2928 [eid 32769:40, rid 32769:40] sent signal 6: was set-id, core dump not permitted [in /var/named]

Feb 15 01:58:30 corporate named[8609]: savedata: memget
Feb 15 01:58:30 corporate named[8609]: savedata: memget
Feb 15 01:58:30 corporate /netbsd: named: pid 8609 [eid 32769:40, rid 32769:40] sent signal 6: was set-id, core dump not permitted [in /var/named]

As you can see I have no core dumps to examine ("was set-id" means the
process called set*id(), which of course it did since it was invoked
with '-u') -- I'd have to run the thing under the debugger to get any
more info and I'm not really happy to do that on this server.
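
In the absence of core dumps, the kernel lines above are the only record
of each crash, so they are worth pulling out of syslog mechanically.
What follows is a minimal sketch in Python, not a tool from BIND or
NetBSD: the regular expression is only an assumption based on the three
kernel messages quoted above, and crash_events() is a hypothetical
helper name.  The signal number is translated using the numbering of
whatever machine runs the script, which for signal 6 gives SIGABRT on
both NetBSD and Linux.

```python
import re
import signal

# Kernel message layout assumed from the three samples quoted above.
CRASH_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ /netbsd: "
    r"(?P<prog>\S+): pid (?P<pid>\d+) .*sent signal (?P<sig>\d+):"
)


def crash_events(lines):
    """Yield (timestamp, program, pid, signal name) for each kernel crash line."""
    for line in lines:
        m = CRASH_RE.match(line)
        if m is None:
            continue  # ordinary named log line, not a kernel crash report
        signo = int(m.group("sig"))
        try:
            # Uses the local platform's signal numbering (an assumption
            # when the log came from a different operating system).
            name = signal.Signals(signo).name
        except ValueError:
            name = "signal %d" % signo
        yield m.group("stamp"), m.group("prog"), int(m.group("pid")), name


sample = [
    "Feb 12 15:09:09 corporate named[3631]: flushset: out of memory",
    "Feb 12 15:09:09 corporate /netbsd: named: pid 3631 "
    "[eid 32769:40, rid 32769:40] sent signal 6: was set-id, "
    "core dump not permitted [in /var/named]",
]

events = list(crash_events(sample))
for event in events:
    print(*event)
```

Feeding it the whole of the messages file instead of the two sample
lines would list every abort along with its timestamp, which makes the
interval between crashes easy to see.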

This same config on the same machine previously ran for quite some
time with much longer uptimes, and with only rare and apparently
unrelated kinds of failures (no fatal signals).

The only thing new since these SIGABRT (signal 6) failures started has been that the
nameserver is now slave for a bunch more (~512) zones which are
protected from all but local networks by ACLs, and because some of these
zones are for reverse DNS of private IPs there are a lot of attempted
queries against them from unauthorised clients.

I note also that processing allow-query ACLs in zone statements causes
a lot of slow-down.  I'm guessing BIND-9 would handle this better but
I'm nowhere near ready to use BIND-9 in this scenario.

# /etc/rc.d/named status
USER   PID %CPU %MEM    VSZ    RSS TT STAT STARTED     TIME COMMAND
dns  24945  8.8 19.7 103840 103172 ?? SNs   3:29AM 40:44.39 /usr/sbin/named -u dns -g dns -u dns -g dns 
named 8.3.4-REL-Planix-1 Mon Dec  2 13:15:29 EST 2002  root at starting-out:/work/pkgobj/net/bind8/work/src/bin/named
config (/etc/named.conf) last loaded at age: Thu Nov 28 04:21:45 2002 
number of zones allocated: 1088
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
server is up and running

Note the start time as shown above was 3:29AM today.

This should give you some idea of what kind of load it's under (these
samples are a bit light as they're from this Saturday morning, and even
the totals only cover a less busy time period):

Feb 15 10:28:29 corporate named[24945]: NSTATS 1045322909 1045297755 TYPE0=114 A=1051401 NS=265 CNAME=1275 SOA=6125 PTR=207155 MX=293042 TXT=6130 AAAA=900 SRV=2173 A6=139 ANY=1856
Feb 15 10:28:29 corporate named[24945]: XSTATS 1045322909 1045297755 RR=1189439 RNXD=563436 RFwdR=412042 RDupR=1999 RFail=14418 RFErr=57095 RErr=1978 RAXFR=0 RLame=17508 ROpts=0 SSysQ=603498 SAns=2019822 SFwdQ=260265 SDupQ=276757 SErr=6 RQ=1570577 RIQ=0 RFwdQ=260265 RDupQ=13400 RTCP=2934 SFwdR=412042 SFail=21768 SFErr=0 SNaAns=935268 SNXD=426864 RUQ=6930 RURQ=0 RUXFR=0 RUUpd=0

Feb 15 11:28:29 corporate named[24945]: NSTATS 1045326509 1045297755 TYPE0=123 A=1260582 NS=298 CNAME=1464 SOA=7029 PTR=244224 MX=314963 TXT=6791 AAAA=1047 SRV=2568 A6=165 ANY=2075
Feb 15 11:28:29 corporate named[24945]: XSTATS 1045326509 1045297755 RR=1378624 RNXD=656436 RFwdR=477477 RDupR=2339 RFail=15832 RFErr=63319 RErr=2106 RAXFR=0 RLame=20289 ROpts=0 SSysQ=702316 SAns=2370213 SFwdQ=301321 SDupQ=314682 SErr=7 RQ=1841331 RIQ=0 RFwdQ=301321 RDupQ=14166 RTCP=3406 SFwdR=477477 SFail=24151 SFErr=0 SNaAns=1090439 SNXD=519068 RUQ=8022 RURQ=0 RUXFR=0 RUUpd=0
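
The counters in these lines are cumulative, so the load over the hour is
the difference between the two samples.  Here is a rough Python sketch
of that arithmetic.  It assumes the first number after NSTATS/XSTATS is
the sample time and the second is the time named started (the second is
the same in every line, and it is consistent with the 3:29AM start shown
by ps), and it takes RQ to be the received-query counter.  The names
parse_stats() and rate() are made up for this example, and the sample
lines are abbreviated to two of the counters shown above.

```python
import re

STATS_RE = re.compile(r"(NSTATS|XSTATS) (\d+) (\d+) (.*)$")


def parse_stats(line):
    """Return (kind, sample_time, start_time, counters) for one stats line."""
    m = STATS_RE.search(line)
    if m is None:
        raise ValueError("not an NSTATS/XSTATS line")
    kind, sample_time, start_time, rest = m.groups()
    counters = {}
    for field in rest.split():
        key, _, value = field.partition("=")
        counters[key] = int(value)
    return kind, int(sample_time), int(start_time), counters


def rate(earlier, later, key):
    """Average per-second increase of one counter between two samples."""
    _, t0, _, c0 = earlier
    _, t1, _, c1 = later
    return (c1[key] - c0[key]) / (t1 - t0)


# Abbreviated copies of the two XSTATS lines above (two counters only).
first = parse_stats(
    "Feb 15 10:28:29 corporate named[24945]: "
    "XSTATS 1045322909 1045297755 SAns=2019822 RQ=1570577"
)
second = parse_stats(
    "Feb 15 11:28:29 corporate named[24945]: "
    "XSTATS 1045326509 1045297755 SAns=2370213 RQ=1841331"
)

hourly_rq = rate(first, second, "RQ")
# Average since startup, using the start time recorded in the line itself.
since_start_rq = first[3]["RQ"] / (first[1] - first[2])

print("RQ per second over the last hour: %.1f" % hourly_rq)
print("RQ per second since startup:      %.1f" % since_start_rq)
```

On the two samples above this works out to roughly 75 received queries
per second over the hour, against an average of about 62 per second
since startup.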

-- 
								Greg A. Woods

+1 416 218-0098;            <g.a.woods at ieee.org>;           <woods at robohack.ca>
Planix, Inc. <woods at planix.com>; VE3TCP; Secrets of the Weird <woods at weird.com>


More information about the bind-workers mailing list