How to prevent slaves from contacting master for name resolution?

Tue May 18 01:11:09 UTC 2010

On May 17, 2010, at 7:24 PM, Kevin Darcy wrote:
> On 5/17/2010 5:58 PM, Keith Christian wrote:
>> Our redundant DNS configuration is one master and three slaves, spread
>> across two colo facilities.
>> 
>> master and slave1 are in colo_ALPHA.
>> slave2 and slave3 are in colo_BETA.
>> 
>> During an extended maintenance window, the master DNS was offline.
>> Slave2 was trying to contact the master, and lookups failed.  Usually,
>> slave2 resolves without contacting the master, but occasionally it
>> does contact it.
>> 
>> The IP for the master does not appear in slave2's /etc/resolv.conf,
>> and I'm not sure what else to check for on slave machines.  Where else
>> would I look?  Would any settings in named.conf account for this
>> behavior?
>> 
>> Versions are Linux (CentOS 5) and BIND 9.5.x.
>>   
> 
> These queries that were failing, were they queries from external clients, queries you were generating on the slave nameserver itself, or some other queries entirely?
> 
> If queries from external clients were failing while the master was down and the slaves were up, then the most likely cause is that your NS records are screwed up, such that your slaves aren't being found. Either you have the wrong names in your delegation NS records or in those at the apex of the zone, or the glue in the parent zone is missing or incorrect, something like that.
> 
> If queries you were generating *locally* were failing, the questions are:
> a) are you querying the slave directly?
> b) does the slave have recursion turned completely off?
> 
> If you're querying the slave directly and recursion is off, and the queries are failing, then this should have nothing whatsoever to do with the master being down, since the data comes *only* from the slave. You should look at whether the zone is being loaded correctly on the slave: a corrupted journal file, an expired zone, something like that.
> 
> If you're querying the slave directly and recursion is *on* for the slave, then you should check whether the slave is actually responding authoritatively for the zone in question under normal circumstances. If not, then something is misconfigured: the slave isn't really a slave at all. Under normal circumstances it's just recursing to get the information, so it makes sense that queries will fail when the master (and/or other slaves) are unavailable.
> 
> If you're querying some other box, then you'd need to look at that other box to see why the queries were failing. At that point, the fact that you're generating queries from the slave itself has no bearing on the problem, since the failed queries weren't going to the slave in the first place.
> 
> Another possibility: were you using "nslookup" to test queries, and does the master for your forward zone(s) also happen to be authoritative for the *reverse* zone which contains the address of your resolver? If so, then be aware that "nslookup" has the annoying feature of trying to reverse-resolve the address of the resolver it's using. So, maybe the queries you're seeing going from slave to master are *reverse* DNS queries, and if they're failing because the master is down, "nslookup", in its quaint, eccentric way, may be misreporting this as a general lookup failure, thus making you think that the forward name is unresolvable.
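
The check for whether the slave answers authoritatively can be scripted. Below is a minimal Python sketch, standard library only, that sends a non-recursive SOA query over UDP and reads the AA (authoritative answer) bit from the response header. The function names are illustrative, and the server and zone you pass in are up to you; nothing here comes from the original poster's setup. It does not verify that the response ID matches the query and does not retry over TCP, so treat it as a rough stand-in for running "dig @server zone SOA +norecurse" and looking for "aa" in the flags line.

```python
import socket
import struct

def build_query(name, qtype=6, recursion_desired=False, query_id=0x1234):
    """Build a minimal DNS query packet (qtype 6 = SOA, class IN)."""
    # Only the RD bit is ever set; leaving it clear asks the server
    # not to recurse on our behalf.
    flags = 0x0100 if recursion_desired else 0x0000
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    labels = [l for l in name.rstrip(".").split(".") if l]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels)
    return header + qname + b"\x00" + struct.pack("!HH", qtype, 1)

def parse_flags(response):
    """Return (aa, ra, rcode) taken from the header of a DNS response."""
    _query_id, flags = struct.unpack("!HH", response[:4])
    aa = bool(flags & 0x0400)   # authoritative answer
    ra = bool(flags & 0x0080)   # recursion available
    rcode = flags & 0x000F      # 0 = NOERROR, 2 = SERVFAIL, ...
    return aa, ra, rcode

def is_authoritative(server, zone, timeout=3.0):
    """Send a non-recursive SOA query to server and report the AA bit."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(zone), (server, 53))
        response, _addr = sock.recvfrom(4096)
    aa, _ra, rcode = parse_flags(response)
    return aa and rcode == 0
```

For example, is_authoritative("192.0.2.53", "example.com") (both values made up) would be expected to return True on a slave that has the zone loaded, whether or not the master is reachable. If it returns False while the master is up, the box is probably answering from recursion rather than from its own copy of the zone.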

One other possibility: Have you configured the slaves to forward to the master, using a 'forwarders' statement, anywhere in named.conf?
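
If named.conf is long, a short script can list every 'forwarders' statement for you. This is a rough Python sketch, not a real named.conf parser: it strips comments with regular expressions, does not follow 'include' files, and could be confused by comment characters inside quoted strings. The function names are illustrative.

```python
import re

def strip_comments(text):
    """Remove /* ... */, // ... and # ... comments from named.conf text."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)
    return re.sub(r"(//|#).*", "", text)

def find_forwarders(text):
    """Return one list of addresses per 'forwarders { ... }' block found."""
    blocks = re.findall(r"\bforwarders\s*\{([^}]*)\}", strip_comments(text))
    result = []
    for body in blocks:
        # Entries are separated by ';' and may carry options such as
        # 'port 5353'; keep only the address, which comes first.
        addrs = [entry.split()[0] for entry in body.split(";") if entry.strip()]
        result.append(addrs)
    return result

def report(path):
    """Print every forwarders list found in the file at path."""
    with open(path) as f:
        for addrs in find_forwarders(f.read()):
            print("forwarders:", ", ".join(addrs) or "(empty list)")
```

Call report() with the path to the slave's named.conf (on CentOS 5 with the chroot package installed, that may be under /var/named/chroot/etc/; check your install) and see whether the master's address shows up in any of the lists. An empty list is also worth noting, since that is how forwarding is switched off for a zone.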

Chris Buxton
BlueCat Networks