Tuning for lots of SERVFAIL responses

Thu Feb 18 21:48:21 UTC 2016

Ah, so "recursive-clients" is the quota of queries that require named to recurse to get the answer, right? I was going to respond with the same advice -- slave your internal zones -- but then I somehow convinced myself that "recursive-clients" was merely the quota of concurrent RD=1 queries that named would handle, thus slaving wouldn't help in a network-outage situation, since named would still drop any new RD=1 query whenever the quota was full.

I concur with the 10x recommendation, and also the advice about mail servers. My mail servers -- at least, the ones that run on Linux -- are configured with local caching resolvers, due to the high volume and wide variety of lookups they generate. The typical OS-level caching mechanisms (nscd, etc.) don't usually help much, I believe, since many of the lookups are for MX records which, AFAICT, nscd and friends don't cache.

- Kevin

-----Original Message-----
From: bind-users-bounces at lists.isc.org [mailto:bind-users-bounces at lists.isc.org] On Behalf Of Mark Andrews
Sent: Thursday, February 18, 2016 4:08 PM
To: John Miller
Cc: Bind Users Mailing List
Subject: Re: Tuning for lots of SERVFAIL responses

In message <CAGYMsbvXCyWGYpzDM5xj4xJEq9=4=HEvJ2LzhbqB34vuQJLGEw at mail.gmail.com>, John Miller writes:
> A couple of weeks ago, we experienced an outage on our external 
> Internet links.  Ideally, this shouldn't affect queries for internal 
> resources - we expect those queries to continue to be answered.
> 
> That being said, we saw a bunch of messages in our logs such as:
> 
> client 192.168.1.2#56075: no more recursive clients (1000/0/1000): quota reached
> 
> It's my understanding that by default, BIND limits the number of 
> concurrent recursive queries to 1000, so obviously during these 
> situations, we need to raise our client limit (recursive-clients) to 
> deal with this.
> 
> What I'm curious about is how BIND behaves when it can't finish 
> iterative queries: when someone queries for yahoo.com, and the root 
> (or .com, yahoo.com) nameservers aren't reachable, does BIND then 
> issue a SERVFAIL response (assuming yes)?
> How long will BIND wait before returning SERVFAIL?
> At what point does BIND assume a domain is down altogether?  What's 
> the behavior then?
> 
> In other words, how do we keep ourselves from being overwhelmed with 
> unanswerable queries during a network outage?

Named will still answer from the cache and from configured zones.
If your external link is down it doesn't matter if the recursive clients build up as they can't get an answer anyway.  Internal zones should be available to named without needing to recurse.  Slave your internal zones.
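
As a rough sketch of what slaving an internal zone on the recursive server could look like in named.conf: the zone names, master address, and file paths below are placeholders and do not come from this thread, so substitute your own.

```
// Hypothetical example: serve internal zones locally so that queries for
// them never need recursion. All names and addresses are placeholders.
zone "internal.example.com" {
    type slave;
    masters { 192.0.2.53; };            // assumed address of the internal master
    file "slaves/internal.example.com.db";
};

zone "1.168.192.in-addr.arpa" {
    type slave;
    masters { 192.0.2.53; };
    file "slaves/1.168.192.in-addr.arpa.db";
};
```

With zones configured this way, named answers internal names from its local copy of the zone even while the external links are down.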

recursive-clients needs to be greater than ~10 x the number of external <qname,qtype> pairs being concurrently looked up, so that spare client slots remain for internal recursive queries.  Named consolidates identical lookups and rejects additional clients once the number waiting on a single lookup exceeds the learnt concurrency limit (clients-per-query, initially 10); named logs this limit as it adjusts it.
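
To make the sizing rule concrete, here is a sketch with assumed numbers. Suppose roughly 300 distinct external <qname,qtype> lookups are outstanding at once during an outage; that figure is purely illustrative and would need to be measured for a real client population. Then 10 x 300 = 3000 slots could be tied up by external lookups, so recursive-clients would need to sit comfortably above 3000. The options below are standard named.conf options; the values are assumptions, not recommendations.

```
options {
    // Default is 1000. The value 4000 assumes ~300 concurrent external
    // lookups x 10 clients each, plus headroom for internal recursion.
    recursive-clients 4000;

    // Starting per-lookup concurrency limit that named adjusts over time
    // (10 is the default, shown here only for clarity).
    clients-per-query 10;

    // Upper bound for that self-tuned limit (100 is the default).
    max-clients-per-query 100;
};
```

Each recursive client slot consumes memory, so any increase should be checked against what the server has available.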

The actual value will depend on what your client population looks up and how often they query.  Pointing mail servers at their own recursive server helps, as they look up lots of external names, whereas humans tend to stop doing external lookups when they know the external links are down.

Mark

> John
> _______________________________________________
> Please visit https://lists.isc.org/mailman/listinfo/bind-users to 
> unsubscribe  from this list
> 
> bind-users mailing list
> bind-users at lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: marka at isc.org