Alternatives to BIND?

Fri Jun 24 09:17:07 UTC 2005

Auer, Karl James writes:
>> From: John Wobus [mailto:jw354 at cornell.edu]
>> One thing to check is whether you are losing incoming UDP
>> packets.  Some OSs show counts of various kinds of packet
>> losses and have parameters that allow you to enlarge buffers.

The suggestion to look at UDP packet drops (netstat -s is your
friend), and possibly to enlarge the UDP receive buffers for BIND's
sockets, is a good one.

> Thanks, but in our case the losses are very clearly related to cache
> cleaning and downloading zones. Maybe the downloads could cause lost
> UDP packets?

Yes; but that packet loss probably doesn't happen where you expect
it to - you are probably thinking of the download *traffic* filling
up a pipe somewhere, so that incoming request packets are dropped,
right?

What I consider much more likely is that, during the transfer of large
zones or during cache cleanup, BIND is unable to process incoming
requests for an extended period of time, so incoming UDP packets are
queueing up in a kernel buffer until BIND starts processing them
again.  When the rate of incoming requests is high, and BIND's busy
periods are long enough, then that buffer will overflow.  The UDP
input buffer can normally be tuned at the system level (on my system
I'd use "ndd -set /dev/udp udp_recv_hiwat") and per-socket using
setsockopt(s,SOL_SOCKET,SO_RCVBUF,n).
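
The per-socket variant can be sketched in Python, whose socket module
wraps the same setsockopt() call.  This is only an illustration, not
what BIND does internally: the helper name and the 256 KiB request are
assumptions.  The kernel may adjust or clamp the requested size (Linux,
for instance, reports back a doubled value), and some systems reject a
request above the system limit with an error, so the sketch reads the
value back instead of trusting the request.

```python
import socket

def enlarge_udp_rcvbuf(sock, requested_bytes):
    """Ask the kernel for a larger receive buffer; return (before, after)."""
    before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
    # Read the size back: the kernel decides what is actually granted.
    after = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return before, after

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        before, after = enlarge_udp_rcvbuf(s, 256 * 1024)  # hypothetical size
        print("SO_RCVBUF before:", before, "after:", after)
    finally:
        s.close()
```

If the value read back is smaller than requested, the system-wide
limit probably has to be raised first.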

Note that I'm slightly simplifying when I say "BIND is unable to
process incoming requests for an extended period of time".  With a
multi-threaded BIND this should never happen, right? Well, except when
locking or resource constraints block or slow down the thread(s?) that
process requests.  Note that for your problem to occur, it isn't
necessary for BIND to completely stop processing queries, it is
sufficient that the rate at which it processes them drops below the
query rate (long enough and low enough for the buffer to overflow).
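
To make the "long enough and low enough" condition concrete, here is a
deliberately coarse queue model, stepped one second at a time.  All the
numbers in the usage note are made up for illustration, the function
name is hypothetical, and a real kernel buffer is sized in bytes rather
than packets, so treat this as a sketch of the reasoning only.

```python
def simulate_drops(arrival_rate, service_rates, buffer_capacity):
    """Return the total number of packets dropped by a bounded queue.

    arrival_rate:    queries arriving per second (held constant)
    service_rates:   queries the server can process in each successive second
    buffer_capacity: number of packets the receive buffer can hold
    """
    queued = 0
    dropped = 0
    for service_rate in service_rates:
        queued += arrival_rate               # this second's arrivals
        queued -= min(queued, service_rate)  # what the server got through
        if queued > buffer_capacity:         # the excess does not fit
            dropped += queued - buffer_capacity
            queued = buffer_capacity
    return dropped
```

With 1000 queries/s arriving, room for 1500 packets, and a server that
normally handles 5000/s but slows to 200/s, a one-second slowdown
(simulate_drops(1000, [5000, 200, 5000], 1500)) loses nothing, while a
three-second one (simulate_drops(1000, [5000, 200, 200, 200, 5000],
1500)) loses 900 packets, even though the server never stops entirely.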

> We aren't losing the notifies BTW: The notifies cause downloads to
> start and *then* we lose queries to the downloading servers. Very
> briefly, a couple of seconds only.

Check whether "netstat -s" shows the counter for dropped UDP packets
increasing when this happens.  The counter might be called
"udpInOverflows" or some less obvious name - the UDP MIB (RFC 2013,
now RFC 4113) only has a "udpInErrors" counter that includes other
errors as well.
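
If the server happens to run Linux (an assumption; the ndd example
above is from a different system), the same counters that netstat -s
prints can be read from /proc/net/snmp, where a "Udp:" header line is
followed by a "Udp:" line of values.  A rough sketch for comparing
snapshots taken before and after a zone transfer follows; the function
names are hypothetical, and which counter names are present depends on
the kernel version.

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict."""
    # "Udp:" does not match the separate "UdpLite:" lines.
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp header/value line pair found")
    names, values = udp_lines[0][1:], udp_lines[1][1:]
    return {name: int(value) for name, value in zip(names, values)}

def read_udp_counters(path="/proc/net/snmp"):
    """Take a snapshot of the current UDP counters (Linux only)."""
    with open(path) as f:
        return parse_udp_counters(f.read())

def counter_increase(before, after, name):
    """How much a named counter grew between two snapshots."""
    return after[name] - before[name]
```

Take one snapshot with read_udp_counters() before a transfer and one
after, then look at counter_increase(before, after, "RcvbufErrors")
(where the kernel provides that counter) or "InErrors".  A jump that
coincides with the lost queries points at the receive buffer.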

We had similar problems on one of our TLD nameservers: requests were
lost during large zone transfers.  The issue went away when we
replaced that (very old) machine with a faster box.
-- 
Simon.