How to prevent slaves from contacting master for name resolution?

Mon May 17 23:24:18 UTC 2010

On 5/17/2010 5:58 PM, Keith Christian wrote:
> Our redundant DNS configuration is one master and three slaves, spread
> across two colo facilities.
>
> master and slave1 are in colo_ALPHA.
> slave2 and slave3 are in colo_BETA.
>
> During an extended maintenance window, the master DNS was offline.
> Slave2 was trying to contact the master, and lookups failed.  Usually,
> slave2 resolves without contacting the master, but occasionally it
> does.
>
> The IP for the master does not appear in slave2's /etc/resolv.conf,
> and I'm not sure what else to check for on slave machines.  Where else
> would I look?  Would any settings in named.conf account for this
> behavior?
>
> Versions are Linux (CentOS 5) and BIND 9.5.x.
>    

These queries that were failing, were they queries from external 
clients, queries you were generating on the slave nameserver itself, or 
some other queries entirely?

If queries from external clients were failing while the master was down
and the slaves were up, then the most likely cause is that your NS
records are screwed up, such that your slaves aren't being found. Either
you have the wrong names in your delegation NS records or in those at
the apex of the zone, or the glue in the parent zone is missing or
incorrect, something like that.
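
To make the NS comparison concrete, here is a minimal Python sketch. It
assumes you have already collected the two NS sets by hand (from the
parent zone and from the zone apex); the server names are hypothetical
placeholders, not anything from your setup.

```python
def normalize(name):
    """Lower-case a host name and drop the trailing dot so names compare equal."""
    return name.rstrip(".").lower()


def compare_ns_sets(delegation, apex):
    """Return the names that appear in only one of the two NS sets."""
    d = {normalize(n) for n in delegation}
    a = {normalize(n) for n in apex}
    return {
        "only_in_delegation": sorted(d - a),
        "only_in_apex": sorted(a - d),
    }


# Hypothetical data: NS names as published by the parent and at the apex.
parent_ns = ["ns1.example.com.", "ns2.example.com.", "ns3.example.com."]
apex_ns = ["NS1.example.com.", "ns2.example.com.", "ns4.example.com."]

diff = compare_ns_sets(parent_ns, apex_ns)
print(diff)
```

Any name that shows up in only one of the two lists is worth a closer
look.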

If queries you were generating *locally* were failing, the questions are:
a) are you querying the slave directly?
b) does the slave have recursion turned completely off?

If you're querying the slave directly and recursion is off, and the
queries are failing, then this should have nothing whatsoever to do with
the master being down, since the data comes *only* from the slave. You
should look at whether the zone is being loaded correctly on the slave:
a corrupted journal file, an expired zone, something like that.
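
On the expired-zone possibility: as I understand it, a slave stops
answering for a zone once the SOA EXPIRE interval has passed without a
successful refresh from the master, which an extended outage can
trigger. A rough Python sketch of that arithmetic follows; the
timestamps and the EXPIRE value are made-up examples, so substitute the
values from your own SOA record and logs.

```python
from datetime import datetime, timedelta, timezone


def zone_expired(last_refresh, expire_seconds, now):
    """True if more than expire_seconds have passed since the last good refresh."""
    return now - last_refresh > timedelta(seconds=expire_seconds)


# Hypothetical values: last successful refresh, SOA EXPIRE, time of the check.
last_ok = datetime(2010, 5, 16, 0, 0, tzinfo=timezone.utc)
expire = 86400  # one day, in seconds
checked_at = datetime(2010, 5, 17, 12, 0, tzinfo=timezone.utc)

expired = zone_expired(last_ok, expire, checked_at)
print(expired)
```

If the maintenance window was longer than the EXPIRE value in the SOA,
that alone could explain the failures.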

If you're querying the slave directly and recursion is *on* for the
slave, then you should check whether the slave is actually responding
authoritatively for the zone in question under normal circumstances. If
not, then something is misconfigured and the slave isn't really a slave
at all: under normal circumstances it's just recursing to get the
information, so it makes sense that queries fail when the master
(and/or the other slaves) are unavailable.
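
One way to check whether the slave is answering authoritatively is to
send it a query with recursion-desired turned off and look at the AA
flag in the reply ("dig +norecurse" shows the same flags). Below is a
minimal stdlib-only Python sketch of that check, not a full DNS client:
it only builds the question and decodes the header flags, and the
helper names are my own.

```python
import socket
import struct


def build_query(name, qtype=1, query_id=0x1234, recurse=False):
    """Build a minimal DNS query packet (class IN); RD bit set only if recurse."""
    flags = 0x0100 if recurse else 0x0000
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_flags(response):
    """Decode the AA and RA bits and the RCODE from a DNS response header."""
    _, flags = struct.unpack("!HH", response[:4])
    return {
        "aa": bool(flags & 0x0400),  # authoritative answer
        "ra": bool(flags & 0x0080),  # recursion available
        "rcode": flags & 0x000F,     # 0 = NOERROR, 2 = SERVFAIL, ...
    }


def query_flags(server, name, timeout=3.0):
    """Send a non-recursive A query to server over UDP and return the flags."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_query(name), (server, 53))
        data, _ = s.recvfrom(512)
    return parse_flags(data)
```

Calling query_flags() with the slave's address and a name in the zone
should give "aa": True if the slave is really serving the zone from its
own copy. A reply without AA, or a SERVFAIL rcode, points at the
misconfiguration described above.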

If you're querying some other box, then you'd need to look into why the
queries were failing on that other box. At that point, the fact that
you're generating queries from the slave itself has no bearing on the
problem, since the failed queries weren't going to the slave in the
first place.

Another possibility: were you using "nslookup" to test queries, and does 
the master for your forward zone(s) also happen to be authoritative for 
the *reverse* zone which contains the address of your resolver? If so, 
then be aware that "nslookup" has the annoying feature of trying to 
reverse-resolve the name of the resolver it's using. So, maybe the 
queries you're seeing going from slave to master are *reverse* DNS 
queries, and if they're failing because the master is down, "nslookup", 
in its quaint, eccentric way, may be misreporting this as a general 
lookup failure, thus making you think that the forward name is unresolvable.
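
To see which reverse name is involved, you can derive the PTR name that
corresponds to your resolver's address and then check which server is
authoritative for the zone containing it. A small Python sketch, using
a documentation address as a stand-in for the real resolver IP:

```python
import ipaddress


def reverse_name(addr):
    """Return the reverse-lookup (PTR) name for an IPv4 or IPv6 address."""
    return ipaddress.ip_address(addr).reverse_pointer


# 192.0.2.53 is a placeholder; use the address of the resolver you query.
ptr = reverse_name("192.0.2.53")
print(ptr)  # 53.2.0.192.in-addr.arpa
```

If only the master can answer for that name, the reverse lookup will
fail whenever the master is down.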

                                                     - Kevin