bind-9.4.0b2 throws SERVFAIL
schumann at strato-rz.de
Fri Oct 6 07:01:45 UTC 2006
After some trouble using bind-9.3.2, we decided to give bind-9.4.0b2 a
try as a resolver.
Apparently bind-9.3.2 has a problem when the cache hits the
max-cache-size and needs to throw away outdated records, or when the
cache cleaner does its job (is this the same procedure?), while 3000+
recursive queries/s are coming in on a single-threaded bind process
compiled with -DISC_MEM_USE_INTERNAL_MALLOC (Intel(R) Pentium(R) 4 CPU
3.06GHz, 2G RAM, linux 2.4.32 SMP kernel). From then on CPU usage stays
at 100% while the UDP drop rate increases astronomically, independent of
the
max-cache-size (tested 100M, 500M, 1300M, with cleaning-interval 5m,
This behaviour seems to be fixed in 9.4, as mentioned in the release
notes: we have not seen it in the short period we used this version
(9.4.0b2) on that hardware. Nevertheless, we still saw a significant
number of UDP drops.
Now we are running bind-9.4.0b2 with threading enabled and a
max-cache-size of 3072M (4G physical memory) on a Dual Core AMD
Opteron(tm) Processor 185. There are ~3000..4000 q/s and no more UDP
drops, each processor core averages about 50% load, we have 16 worker
threads enabled, and the cleaning-interval is 15m. When the cache size
hits 2.6...2.8G, bind stops recursing and throws SERVFAIL instead. In the
resolver logs we find entries "resolver: error: could not mark server as
lame: out of memory". The problem disappears when named is restarted.
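For readers trying to reproduce this, the settings described above would look roughly like the fragment below. It is a sketch reconstructed from the description, not a copy of the actual named.conf: placing the options in the global options block is an assumption, and the views are omitted. The 16 worker threads would normally be requested on the command line with named's -n option (for example "named -n 16") rather than in the configuration file.

```
// Sketch only: reconstructed from the description, not the real named.conf.
options {
    // Cache limit used on the machine with 4G physical memory.
    max-cache-size 3072M;

    // Interval between cache-cleaning passes, in minutes.
    cleaning-interval 15;

    // No data limit is configured (assumed here to mean that the
    // datasize option is simply left unset).
};
```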
We are using views, and no datalimit is set. Is max-cache-size the size
per view or a global setting for all views? Otherwise, where does the
"out of memory" error come from?