BIND not answering queries while large zone loads

Fri Jan 16 01:41:32 UTC 2004

We have a number of large zones (most of them dnsbls). Some of the zones
are around 80-90 MB in size. I've noticed some problems that *seem* to
correlate with the loading and/or transferring of large zones, where BIND
is very slow or completely unresponsive for a minute or two. Does anyone
else see this problem, and if so, can anything be done about it? I've
finally been able to separate authoritative and caching-only functions
for the most part, and reducing the volume of queries seems to help a
little [1].
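
One way to check whether the stalls really line up with zone loads is to
probe the server at a fixed interval and compare the timestamps of slow or
missing answers against the "loaded serial" / transfer messages in the named
log. The following is only a rough sketch using the Python standard library;
the server address and query name are placeholders, not our real setup.

```python
import random
import socket
import struct
import time


def build_query(name: str, query_id: int) -> bytes:
    """Build a minimal DNS query packet for an A record (class IN)."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server: str, name: str, timeout: float = 2.0):
    """Return the response time in seconds, or None on timeout or mismatch."""
    packet = build_query(name, random.randrange(0x10000))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(packet, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes socket timeouts
            return None
        elapsed = time.monotonic() - start
    # Accept the reply only if its ID matches the query ID.
    return elapsed if reply[:2] == packet[:2] else None


if __name__ == "__main__":
    # Placeholder server and name; replace with a real nameserver and zone.
    while True:
        result = probe("192.0.2.53", "test.dnsbl.example")
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())
        print(stamp, "no answer" if result is None else f"{result:.3f}s")
        time.sleep(5)
```

The probe only looks at whether an answer with the matching ID came back and
how long it took; it does not parse the response.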

We've been having a lot of problems with the nameservers becoming
unresponsive for long periods of time and even occasionally forever.

This problem seems to come up even when BIND is built with threads;
generally I build it without.

We're running Debian Linux 3.0 on x86 hardware with 2.4.20 and 2.4.24
kernels; hardware ranges from single 550 MHz PIII to dual 800 MHz PIII to
single 2 GHz P4; almost all of the machines have at least 1 GB of memory. BIND
version is a mixture of 9.2.2-P3 and 9.2.3.

Also, is it a bad idea to have the refresh interval set to an identical
setting for a lot of zones (~ 60k) on the same server? We started
staggering our refresh interval after hypothesizing that our slaves
might be hitting the master with a whole lot of requests at once. I
assume that this is somewhat limited by the settings of transfers-in,
transfers-out, transfers-per-ns, etc., but is it a good idea to stagger
the refresh interval anyway?
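
For what it's worth, the staggering we do could be sketched roughly like
this: derive a per-zone offset from a hash of the zone name, so that each
zone always gets the same refresh value but the values are spread over a
window around the base interval. The function name, the 3600-second base
and the 25% spread are illustrative assumptions, not values taken from our
configuration.

```python
import hashlib


def staggered_refresh(zone: str, base: int = 3600, spread: float = 0.25) -> int:
    """Return a per-zone SOA refresh value (seconds) spread around `base`.

    The offset is derived from a hash of the zone name, so it is stable
    across runs and independent of the order in which zones are processed.
    """
    digest = hashlib.sha1(zone.lower().encode("utf-8")).digest()
    # Map the first 4 bytes of the hash to a fraction in [0, 1).
    fraction = int.from_bytes(digest[:4], "big") / 2**32
    # Spread values across [base * (1 - spread), base * (1 + spread)).
    low = base * (1 - spread)
    return int(low + fraction * 2 * spread * base)


if __name__ == "__main__":
    # Hypothetical zone names, for illustration only.
    for zone in ("dnsbl.example", "relays.example", "dialups.example"):
        print(zone, staggered_refresh(zone))
```

The result would go into the refresh field of each zone's SOA record.
Whether this actually reduces load on the master is exactly what I'm asking
about.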

[1] I've also been tweaking recursive-clients and tcp-clients in
named.conf to try and make sure we're not bumping up against the
default limits.

Lastly, if there is no good way to avoid this, should we try to keep all
the dnsbls on a separate machine and use forwarders to forward queries
to those machines? Should I give rbldnsd another look?

-- 
No copies, please.
To reply privately, simply reply; don't remove anything.