peculiar lookup timeouts

Wed Sep 13 23:48:30 UTC 2006

> > > Hey guys,
> > >
> > > I tried searching the list for the answer to my question, and I saw some
> > > similar questions, but I wasn't sure that it was exactly the same.
> > >
> > > So here is the scenario:
> > >
> > > Running Bind-9.3.2P1, three servers: ns1 (master), ns2 and ns3 (slaves).
> > > Every once in a while a customer will call in saying that they cannot
> > > resolve a particular domain.
> > >
> > > So, I attempt to look up the domain via "ns1" and the lookup times out.
> > > I look it up via ns2 or ns3 and it works (sometimes).
> > >
> > > As soon as I restart bind, everything works again.
> > >
> > > Also, we have an internal copy of bind running, which forwards queries
> > > to "ns1". At the time when ns1 does not get an answer (while it is
> > > timing out), that system answers with the right data.
> > >
> > > If there is any more detailed information I can provide, please let me
> > > know.
> > >
> > Is ns1 the master for the zones that are having the problem? Or, are you
> > calling it a "master" even though, for purposes of troubleshooting this
> > problem, it's basically just a resolver? If it's just functioning as a
> > resolver, are the domains you're having problems with Internet domains
> > or domains that are strictly internal to your environment? If Internet
> > domains, then please enumerate them so we can take a look and see if we
> > can spot any obvious problems.
> > 
> 
> Hi Kevin,
> 
> 'ns1' is a master for a number of zones, however not for the ones it is
> having problems resolving.  Other than the zones it is master for, it is
> used as a resolver.
> 
> The domains it is/was having problems with were, for example, godaddy.com.
> That was the most recent culprit.  I would receive "SERVFAIL" on 'ns1', then
> I restarted the bind daemon, and it then returned the proper information.
> 
> When I did a 'dig godaddy.com @ns1 +trace' it would get to the nameservers
> for the domain, but not be able to resolve the domain itself.  Again, after
> restart of the bind daemon, it would be able to resolve the domain.

	godaddy.com's delegation looks good, as does secureserver.net's
	delegation.  I would be looking for a problem with how the
	firewall manages its state tables.

	You should be able to see the problem by looking at packet
	traces.  You should see packets going out to the servers
	but not coming back.  When you restart named you get a new
	source port and you then see the reply traffic.
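
	As a rough illustration of what to look for in such a trace,
	here is a minimal Python sketch.  It assumes the capture has
	already been reduced (by hand or by a script) to simple
	records; the Packet fields and the unanswered_queries() helper
	are hypothetical names made up for this example, not part of
	named or of any capture tool.

```python
from collections import namedtuple

# Hypothetical, simplified record for one captured DNS packet:
#   direction  - "out" for a query we sent, "in" for a reply we received
#   server     - address of the remote nameserver
#   local_port - UDP port on our side (source port of the query)
#   query_id   - 16-bit DNS message ID
Packet = namedtuple("Packet", "direction server local_port query_id")


def unanswered_queries(packets):
    """Return outbound queries that never saw a matching reply."""
    pending = {}
    for p in packets:
        key = (p.server, p.local_port, p.query_id)
        if p.direction == "out":
            pending[key] = p          # remember the query until a reply shows up
        elif p.direction == "in":
            pending.pop(key, None)    # reply seen: the query is no longer pending
    return list(pending.values())
```

	If every query left over shares one local port, and queries
	sent from a different port after a restart are answered, that
	is consistent with a state-table problem in a middle box
	rather than with the remote servers.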

	In my experience, 99.9% of the time it is either a problem
	with the delegation or a middle box that stops lookups
	working.  You use external delegation checkers for the first
	and packet tracers for the second.
	
> I'm not sure if this has anything to do with negative caching, or caching in
> general, but I would assume it does since a restart of the daemon fixes the
> problem.  I do limit bind to 3GB of cache as of my latest revision of
> named.conf.
> 
> Thanks for the help,
> 
> p.s. I'm not sure what the proper etiquette for this list is. When replying,
> should I include the sender, or just reply to the list itself, since you
> subscribe?
> 
> -
> Adam Young
> Systems Support Technologist
> Mountain Cablevision Ltd.
> (905)667-7436
--
ISC Training!  October 16-20, 2006, in the San Francisco Bay Area,
covering topics from DNS to DHCP.  Email training at isc.org.
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: Mark_Andrews at isc.org