Local Lookups Fail When the Net is down.

Lawrence K. Chen, P.Eng. lkchen at ksu.edu
Sun Dec 23 05:22:14 UTC 2012

----- Original Message -----
> In message <201212202013.qBKKDKsi002407 at x.it.okstate.edu>, Martin
> McCormick writes:
> > 	We are using BIND 9.7.7 with recursion. Our border
> > router temporarily failed, completely isolating our campus from
> > the rest of the internet.
> > 
> > 	During that time, it was impossible to do local lookups.
> > We were showing 997 out of 1000 recursive clients which is no
> > surprise but the loss of local resolution affected our telephone
> > system which is migrating over to VOIP + any other lookups a
> > client might do that at least in theory should still work
> > because they are making queries for hosts in our master zones.
> > 
> > 	I have been here for a bit over 20 years and we have
> > lost all connectivity only a very few times, but I had actually
> > begun to think that newer versions of bind would still provide
> > local resolution. The systems running the master and slave DNS's
> > continued to run as they have plenty of resources, but there was
> > no local resolution.
> > 
> > 	Is there anything short of internal and external-facing
> > DNS's that we can do to be sure that local resolution stays up?
> You need to look at search lists and make sure there are no external
> dependencies.
> If you have partially qualified names being used you may be depending
> upon an NXDOMAIN from the root.  A local copy of the root zone will
> help here.
> If you do recursion internally you will need to increase the number
> of recursive clients.
> If you are validating you will want to distribute trust anchors
> for internal namespace.
> If you are using DLV you will want an internal copy of the dlv zone.
> > Thank you very much.
> > 
> > Martin McCormick Stillwater, OK
> > Systems Engineer
> > OSU Information Technology Department Telecommunications Services
> > Group
> --
> Mark Andrews, ISC
> 1 Seymour St., Dundas Valley, NSW 2117, Australia
> PHONE: +61 2 9871 4742                 INTERNET: marka at isc.org

I've been puzzling over this very problem as well.  I had thought it was because our recursive caching servers are also authoritative for some internal domains.  The past DNS administrator said that was done to speed up local resolution on the recursive caching servers, though it did not hold up during our last Internet outage (which was actually caused by the log volume filling up on IT Security's network policy enforcement appliance, which won't pass any traffic if it can't log).

So, when our Internet connection went away, local resolution was impacted, including for the domains that the caching servers are authoritative for.  When I asked others online why we had this problem, it was strongly suggested that I move the authoritative stuff off of our caching servers.  That is something I've tried to do at various times in the 5+ years since I became the DNS guy.  I intend to make a stronger effort in the new year, since I realized that it's probably bad that I don't have any internal secondary authoritative nameservers, and we've been running a split DNS for a few years now.

I know the recursive client queue got full, though I don't remember seeing it completely full.  Our servers are set to allow 10,000 recursive clients.
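For reference, the limit discussed here is the recursive-clients option in named.conf. A minimal fragment, assuming the 10,000 figure above (BIND's default is 1000, which would match the "997 out of 1000" in the original report):

```
options {
    // Upper bound on simultaneous recursive lookups done on behalf of
    // clients. The default is 1000; each active client consumes memory,
    // so the value should be sized against what the server has available.
    recursive-clients 10000;
};
```

Raising the limit only buys headroom. During an outage, every lookup that needs the outside world still holds a slot until the resolver gives up on it.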

When I started, our BIND versions were a mix of 9.4 and 9.5.  I later unified everything to 9.6, and since then all the caching and authoritative-only servers on campus (I'm not sure what our off-campus secondaries are running) have stayed close in version, though security patches might have the caching servers a bit newer at times.  They have tracked up through the various 9.6 and 9.7 releases, with a recent upgrade from 9.7.6-P4 to 9.9.2-P1 on December 7th.

OTOH, now I'm intrigued by the response.  So it's DNSSEC that is causing the problems?  How do I do DNSSEC validation of local domains when the parents (.edu or DLV) are out of contact?  How would I go about creating my own DLV zone?
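One way to read the "distribute trust anchors for internal namespace" suggestion quoted above is to give each validating resolver a static trust anchor for the local signed zone, so that validating it does not depend on reaching .edu or a DLV registry. A minimal sketch, assuming a hypothetical zone named internal.example signed with an algorithm 5 (RSASHA1) key-signing key; the key string is a placeholder, not real key data:

```
trusted-keys {
    // Fields: zone name, flags (257 = key-signing key), protocol (always 3),
    // algorithm number, base64 public key taken from the zone's DNSKEY record.
    // "internal.example." and the key text below are placeholders.
    "internal.example." 257 3 5 "REPLACE-WITH-BASE64-PUBLIC-KEY";
};
```

A static anchor like this has to be updated by hand on every resolver whenever the zone's key-signing key is rolled.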

I guess now that I've upgraded from 9.7 to 9.9, the recursive client request queue will fill up more slowly during an outage, or be less likely to fill at all?  9.8 changed the default resolver query timeout from 30 seconds to 10 seconds, right?
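As far as I can tell from the 9.9 documentation, the same timeout is also exposed as the resolver-query-timeout option, so it can be written out explicitly. A minimal fragment, assuming the 10 second default mentioned above is simply being restated:

```
options {
    // Seconds the resolver spends trying to resolve a recursive query
    // before failing it. In this release series the documented range
    // is 10 to 30, with 10 as the default.
    resolver-query-timeout 10;
};
```

As rough arithmetic, the number of slots held by doomed lookups is about the arrival rate times the timeout. With an assumed 100 unresolvable queries per second, a 30 second timeout ties up roughly 3,000 recursive clients at steady state and a 10 second timeout roughly 1,000, so the queue fills more slowly but can still fill.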

Though I suppose the intent of separating authoritative and caching is that the caching servers will cache responses for local domains as well during an outage.  And now that zone data changes are made on a different server and then applied to the primary nameserver, along with various other behind-the-scenes operations (including flushing zones from certain caching servers), the issue of instantaneous updates is less of a problem.  Hopefully nobody will trigger the process during an Internet outage :)

So, I'm going to work on getting some internal secondaries in the new year, and this private overlapping DLV is something I'll have to research.

Along with other things I'm wanting to do....wonder when the slides from the DNSSEC presentation at LISA are going to be made available....oh look, they're out now...

Who: Lawrence K. Chen, P.Eng. - W0LKC - Senior Unix Systems Administrator
For: Enterprise Server Technologies (EST) -- & SafeZone Ally
Snail: Computing and Telecommunications Services (CTS)
Kansas State University, 109 East Stadium, Manhattan, KS 66506-3102
Phone: (785) 532-4916 - Fax: (785) 532-3515 - Email: lkchen at ksu.edu
Web: http://www-personal.ksu.edu/~lkchen - Where: 11 Hale Library

More information about the bind-users mailing list