Newbie: Tuning Bind 8.2.3 for resilient A/CNAME queries...

Thu Apr 12 19:56:01 UTC 2001

I don't think the 10% figure is particularly meaningful unless it
represents only *authoritative* responses. A 10% differential that includes
answers from cache could occur simply because cache entries in the offsite
server happen to expire at different times than they do in yours.
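
One way to restrict the comparison to authoritative answers is to look at
the AA bit in each response header. A minimal sketch in Python, assuming
you have the raw response packets as bytes (for example from a packet
capture); the function name is made up for illustration:

```python
import struct

AA_FLAG = 0x0400      # "authoritative answer" bit in the DNS header flags
RCODE_MASK = 0x000F   # low four bits of the flags carry the response code
NXDOMAIN = 3          # RCODE meaning "no such domain"


def parse_dns_header(packet):
    """Return (query_id, is_authoritative, rcode) from a raw DNS message."""
    if len(packet) < 12:
        raise ValueError("DNS header is 12 bytes; packet is too short")
    # The first two 16-bit fields of the header are the ID and the flags.
    query_id, flags = struct.unpack("!HH", packet[:4])
    return query_id, bool(flags & AA_FLAG), flags & RCODE_MASK
```

Counting only the disagreements where both servers returned a response
with the AA bit set should leave cache-expiry effects out of the figure.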

I'm not aware of any tunables in BIND for this kind of thing. Mostly,
though, BIND just keeps trying to resolve a query, and it is the client
that times out. You might want to do some analysis to see if
BIND actually *is* resolving the "failed" names, but too late for the
client to accept the answer. For this, you'd probably have to turn on
debugging and parse the output (and at the volumes you're talking about,
that could be a *lot* of parsing). If BIND is getting the answer, then you
should perhaps concentrate your efforts on getting the client resolvers to
be more patient.
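
As a rough alternative to parsing debug output, you could re-ask the local
named about each "failed" name with a much longer timeout than the client
uses, and see how long the answer takes. A minimal sketch in Python,
assuming named listens on 127.0.0.1 port 53 and that the client gives up
after 5 seconds (both values are assumptions, as are the function names).
It sends a single query with no retransmission, which a real resolver
would not do:

```python
import socket
import struct
import time


def build_query(name, query_id):
    """Build a minimal DNS query for the A record of `name`."""
    # Flags 0x0100 = recursion desired; one question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in labels
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def timed_lookup(name, server="127.0.0.1", port=53, timeout=60.0, query_id=1):
    """Send one query; return seconds until a matching reply, or None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        start = time.monotonic()
        sock.sendto(build_query(name, query_id), (server, port))
        while True:
            remaining = timeout - (time.monotonic() - start)
            if remaining <= 0:
                return None
            sock.settimeout(remaining)
            reply, _ = sock.recvfrom(512)
            # Ignore stray packets whose ID does not match our query.
            if len(reply) >= 2 and struct.unpack("!H", reply[:2])[0] == query_id:
                return time.monotonic() - start
    except socket.timeout:
        return None
    finally:
        sock.close()


def classify(elapsed, client_timeout=5.0):
    """Label a lookup by whether the answer beat the client's timeout."""
    if elapsed is None:
        return "no answer"
    return "late" if elapsed > client_timeout else "in time"
```

Calling classify(timed_lookup(name)) for each name on the failed list
gives a rough count of how many answers arrive, but only after the client
would have given up. A name that now answers quickly may simply be in
named's cache from the earlier attempt, so this is most informative on
names the server has not been asked about recently.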

- Kevin

jschultz wrote:

> I have the following problem:
>
> I am using a cluster of Linux boxes, each running BIND 8.2.3, to support
> tens to hundreds of DNS lookups (A/CNAME lookups) per second from local
> (on the same machine) client applications. Many of these lookups are for
> "far away" domains, such as domains in China, Korea and Europe (I'm in
> the US).
>
> Naturally, many DNS lookups correctly fail with No Such Host responses,
> while many others fail due to timeouts. When I take a large list of
> failed lookups to an offsite server (though still pretty close on AT&T's
> network), it is able to resolve something like 10% of the lookups that
> failed on the cluster. I would hope (I haven't checked for sure) that
> the ones the offsite DNS server resolves are ones that failed on my
> cluster due to timeouts. Anyway, what gives?
>
> Is a 10% rate of disagreement on failures between DNS servers
> usual/expected?
>
> Is this most likely due to problems with BIND, the clients' library
> (ADNS) or my cluster's network connectivity?
>
> How can I make these DNS lookups more accurate (i.e., fewer timeout
> failures)?
>
> I was thinking that if I could make my BIND daemons use longer timeouts
> for responses from other nameservers, and also re-request more often,
> this could lead to fewer false negatives. Is this approach
> advisable? I looked at the 8.2.3 documentation and I didn't see any
> obvious configuration options to do this. Could I do this at the
> code/recompile level? If so, where should I look in the code?
>
> Also, if I make BIND more reliable but it now uses longer timeouts, do
> most client libraries wait forever for responses from named, or do they
> have their own timeouts and retries, after which they eventually give
> up on the local named? If so, would I also have to change these
> timeouts to get the increased resilience, or is there usually enough
> slack in client libraries' timeouts?
>
> Thanks for any help in advance,
>
> John Schultz
> Research Assistant
> The Center for Networking and Distributed Systems
> The Johns Hopkins University