NXDOMAIN returned on while updating

Thu Dec 21 22:42:17 UTC 2006

For those who are interested: we run BIND 9 with IP sub-interfaces, i.e.
aliases such as eth0:1.

When queries are sent to the addresses on those sub-interfaces the
problem appears; when they are sent to the primary IP address bound to
eth0 there is no problem. Has anyone else seen this behaviour?
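
One way to narrow this down is to send the same query directly to the primary eth0 address and to the eth0:1 alias, bypassing the stub resolver, and compare the response codes from each. The Python sketch that follows does this with a hand-built RFC 1035 query; it is a stand-in for, not a copy of, the Perl Net::DNS script mentioned later in the thread. Function names are hypothetical, and the only DNS logic it relies on is reading the RCODE field (3 = NXDOMAIN) from the reply header.

```python
import random
import socket
import struct
from collections import Counter

# RCODE values from RFC 1035 section 4.1.1, used only for readable tallies.
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
               3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name, qid, qtype=1):
    """Build a minimal DNS query: 12-byte header plus one question (QCLASS IN)."""
    # Flags 0x0100: standard query with recursion desired; QDCOUNT=1, others 0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_rcode(response, qid):
    """Return the RCODE of a reply after checking its ID and QR bit."""
    if len(response) < 12:
        raise ValueError("short DNS response")
    rid, flags = struct.unpack("!HH", response[:4])
    if rid != qid or not flags & 0x8000:
        raise ValueError("not a reply to our query")
    return flags & 0x000F


def query_rcode(server, name, timeout=2.0):
    """Send one UDP query straight to `server` on port 53 and return the RCODE."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid), (server, 53))
        data, _ = sock.recvfrom(4096)
    return parse_rcode(data, qid)


def compare_addresses(servers, name, rounds=100):
    """Query every address `rounds` times and tally the outcomes per address."""
    tallies = {server: Counter() for server in servers}
    for _ in range(rounds):
        for server in servers:
            try:
                rcode = query_rcode(server, name)
                tallies[server][RCODE_NAMES.get(rcode, str(rcode))] += 1
            except socket.timeout:
                tallies[server]["timeout"] += 1
            except ValueError:
                tallies[server]["bad reply"] += 1
    return tallies
```

For example, `compare_addresses(["192.0.2.10", "192.0.2.11"], "www.example.org")` would compare a (hypothetical) primary address with a (hypothetical) alias; a consistently higher NXDOMAIN or timeout count on the alias would support the observation above.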

> -----Original Message-----
> From: bind-users-bounce at isc.org 
> [mailto:bind-users-bounce at isc.org] On Behalf Of Nick Garfield
> Sent: Thursday, December 21, 2006 12:27 AM
> To: Kevin Darcy
> Cc: bind-users at isc.org
> Subject: RE: NXDOMAIN returned on while updating
> 
> FYI, I think this bug is somehow related to caching. I watched the CPU
> load on the server grow over the past two weeks, although there was no
> swapping to virtual memory. However, as the server became more loaded,
> the timeouts became worse and the NXDOMAINs more frequent. Stopping and
> restarting the daemon seems to have fixed the problem (for now). I have
> dumped the cache to a file; it's about 110,000,000 lines / 33 Mbytes.
> That does not seem unreasonable, and to the best of my knowledge the
> contents are not corrupted. Does anyone have experience of poor
> performance from the caching system, and of how to fix it?
> 
> _Nick 
> 
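A quick way to sanity-check a cache dump like the one described above (for example, a file written by `rndc dumpdb`) is to count its lines by kind and its records by type. The Python sketch that follows assumes the usual BIND 9 dump layout of `;` comments, `$`-prefixed directives such as `$DATE`, and one resource record per line. The set of type mnemonics it looks for is an assumption, and its type detection is only a heuristic.

```python
from collections import Counter

# Assumed set of common RR type mnemonics; extend as needed.
KNOWN_TYPES = {"A", "AAAA", "NS", "CNAME", "SOA", "PTR", "MX", "TXT", "SRV",
               "DS", "DNSKEY", "RRSIG", "NSEC"}


def summarize_dump(lines):
    """Count blank, comment, directive and record lines, plus records per type."""
    stats = Counter()
    rrtypes = Counter()
    for raw in lines:
        line = raw.strip()
        stats["lines"] += 1
        if not line:
            stats["blank"] += 1
        elif line.startswith(";"):
            stats["comment"] += 1
        elif line.startswith("$"):
            stats["directive"] += 1
        else:
            stats["record"] += 1
            # Heuristic: the first token that is a known type mnemonic. This
            # also works for continuation lines that omit the owner name.
            rrtype = next((tok for tok in line.split() if tok in KNOWN_TYPES), None)
            if rrtype is not None:
                rrtypes[rrtype] += 1
    return stats, rrtypes
```

Usage might look like `with open("named_dump.db", errors="replace") as fh: stats, rrtypes = summarize_dump(fh)`, where the file name is an assumption (use whatever the `dump-file` option points to). `rrtypes.most_common(10)` then shows which record types dominate the cache.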
> -----Original Message-----
> From: Kevin Darcy [mailto:kcd at daimlerchrysler.com] 
> Sent: Wednesday, December 20, 2006 10:24 PM
> To: Nick Garfield
> Cc: bind-users at isc.org
> Subject: Re: NXDOMAIN returned on while updating
> 
> Nick Garfield wrote:
> > Hi Kevin, many thanks for your posting.
> >
> > Some comments below, to get a picture of your system.
> >   
> >> I've never seen the behavior you described, even though we have a
> >> similar environment, i.e. many Dynamically-updated zones, a few big
> >> ones that take a long time to transfer (e.g. an 87,000-record zone
> >> that we transfer over the Atlantic).
> >>     
> > I presume you mean that, as at CERN, the large zones are not DDNS and
> > are transferred by AXFR (not IXFR); is that correct?
> >
> >   
> >> I think we would have noticed this problem a long time ago, since, as
> >> you point out, most apps will simply *fail* when an erroneous NXDOMAIN
> >> is given for a name. Admittedly, as a general rule, we don't have
> >> ordinary end-user clients querying our master nameserver (it's pretty
> >> much dedicated to handling Dynamic Updates and doing zone transfers)
> >>     
> > Exactly, the same setup as we are using. Our clients query the slaves;
> > it is the slaves that are showing the symptoms I described in my
> > first email.
> >
> > Normal end-user applications don't seem to be too concerned, although
> > SMTP lookups can fail, leading to undeliverable email.
> >
> > There are some CERN-specific applications which suffer the worst;
> > unfortunately these apps query the DNS 30 times per second (please
> > don't comment on this, because there is nothing I can do except ask
> > them to install a local caching server).
> >
> > However, you have given me an idea: see if the same behaviour is seen
> > on the master :-)
> >
> >   
> >> , but we do have various clients and processes querying 
> that box and 
> >> I'm sure we would have noticed spurious NXDOMAINs by now...
> >>     
> > I had to write a script using Perl's Net::DNS to find it, because that
> > avoids the complexity of the local resolver.
> >
> > A further question: what operating system/file system are you using?
> >   
> Solaris/UFS.
> 
> - Kevin
> 
> 
>
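The CERN-specific applications mentioned above query the DNS 30 times per second, and the remedy suggested in the thread is a local caching server. The effect of caching on query volume can be sketched inside an application with a small in-process cache. `TTLCache`, its injectable `resolve` function and its clock parameter are hypothetical names, not part of BIND or any library. The sketch uses one fixed TTL rather than the records' own TTLs, so it is a simplification and not a substitute for a caching nameserver.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Reuse lookup results for a fixed number of seconds before asking again."""

    def __init__(self, resolve: Callable[[str], List[str]], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._resolve = resolve  # the real lookup, e.g. a stub-resolver call
        self._ttl = ttl          # fixed lifetime, not the record's own TTL
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}
        self.misses = 0          # how many times the real lookup actually ran

    def lookup(self, name: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[0]:
            return entry[1]      # still fresh: no DNS traffic
        self.misses += 1
        # A failed lookup raises here and is deliberately not cached.
        result = self._resolve(name)
        self._entries[name] = (now + self._ttl, result)
        return result
```

With `resolve=lambda n: socket.gethostbyname_ex(n)[2]`, repeated lookups of the same name within the TTL reach the resolver only once. Because failures propagate instead of being stored, a single spurious NXDOMAIN is not pinned in the cache for the whole TTL.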