> A couple of weeks ago, we experienced an outage on our external
> Internet links.  Ideally, this shouldn't affect queries for internal
> resources - we expect those queries to continue to be answered.
> That being said, we saw a bunch of messages in our logs such as:
> client no more recursive clients (1000/0/1000): quota reached
> It's my understanding that by default, BIND limits the number of
> concurrent recursive queries to 1000, so obviously during these
> situations, we need to raise our client limit (recursive-clients) to
> deal with this.
> What I'm curious about is how BIND behaves when it can't finish
> iterative queries: when someone queries for yahoo.com, and the root
> (or .com, yahoo.com) nameservers aren't reachable, does BIND then
> issue a SERVFAIL response (assuming yes)?
> How long will BIND wait before returning SERVFAIL?
> At what point does BIND assume a domain is down altogether?  What's
> the behavior then?
> In other words, how do we keep ourselves from being overwhelmed with
> unanswerable queries during a network outage?

named will still answer from its cache and from the zones it is
configured to serve.  If your external link is down, it doesn't
matter if the recursive clients build up, as they can't get an answer
anyway.  Internal zones should be available to named without needing
to recurse, so slave your internal zones.
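
As a rough sketch of what slaving an internal zone on the recursive
server could look like (the zone name, master address and file path
below are placeholders, not taken from your setup):

```
// Hypothetical example: substitute your own zone name, master
// address and file location.
zone "internal.example.com" {
    type slave;
    masters { 192.0.2.1; };                   // your internal master
    file "slaves/internal.example.com.db";    // local copy of the zone
};
```

With a local copy of the zone loaded, named answers those names from
its own data and does not need a recursive client slot for them.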

recursive-clients needs to be greater than ~10 x the number of
external <qname,qtype> tuples being looked up concurrently, so that
there are spare client slots for internal recursive queries.  named
consolidates identical lookups and rejects additional clients for a
lookup once the number waiting on it exceeds the learnt concurrency
limit (clients-per-query, initially 10), which named logs as it is
adjusted.
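
To make the arithmetic concrete, suppose (purely as an illustration)
that about 150 distinct external <qname,qtype> lookups are stuck at
once during the outage.  At up to 10 clients each, that is roughly
150 x 10 = 1500 slots, so a limit of 2000 would leave about 500 for
internal recursion.  A fragment along these lines, with the numbers
adjusted to your own traffic:

```
options {
    // Illustrative value only: ~10 x the expected number of
    // concurrent external lookups, plus headroom.
    recursive-clients 2000;

    // Shown at their default values; named adjusts the effective
    // per-lookup limit between these bounds and logs the changes.
    clients-per-query 10;
    max-clients-per-query 100;
};
```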

The actual value will depend on what your client population looks
up and how often they query.  Pointing mail servers at their own
recursive server helps, as they look up lots of external names,
whereas humans tend to stop doing external lookups when they know
the external links are down.


