Root zone timeout and workarounds?

Wed Feb 21 06:45:50 UTC 2001

At 12:36 AM 2/21/2001 -0500, you wrote:

>denon wrote:
>
> > At 11:02 PM 2/20/2001 -0500, you wrote:
> >
> > >Okay, so you're talking about other nameservers on the Internet, not stub
> > >resolvers, timing out trying to resolve names in your domain when 4 out of
> > >your 5 registered nameservers are unavailable.
> >
> > Right.
> >
> > >I suppose this isn't that surprising; 4 out of 5 is a serious outage. Of
> > >course, if the remote nameservers are running BIND or something like it,
> > >they should quickly adapt to the outage. Have you tried *successive*
> > >queries to those remote nameservers? Do they eventually stop timing out?
> >
> > Yes, but after way too many attempts ... I almost gave up trying myself,
> > before I realized, "oh, it worked that time .. "
>
>Perhaps those remote nameservers don't have an adaptive algorithm (???) Do you
>know for sure that they are running BIND?

No clue, but I can't bank on anything. They're Internet users from all over 
the world .. using every nameserver imaginable.

> > >If this temporary effect is unacceptable, then you may be able to increase
> > >your availability by, paradoxically, reducing the number of registered
> > >nameservers. If, for example, you reduced down to 3 nameservers in 2
> > >different locations, then if the larger location goes down -- thus making 2
> > >of the nameservers unavailable -- convergence should be faster with 1/3 of
> > >your nameservers available than with only 1/5.
> >
> > Tacky, I know .. but one of the reasons we have 4 nameservers on-site, and
> > one off-site, is due to the fact that we want a majority of the requests to
> > come to our network. Namely, because there will be a lot of them, and we
> > don't want to soak the remote link's bandwidth with dns requests.
>
>If the remote nameserver is answering significantly more slowly than the
>others, then other nameservers on the Net should adapt to that fact and send
>it fewer queries. Of course, this assumes, yet again, that those other
>nameservers are BIND or have an adaptive algorithm like BIND's.

But each nameserver would have to 'fail' first before it learns, right?
That's pretty unacceptable, considering thousands would have to fail before
things stabilized for all the users. Or are you talking about the root servers?
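
To make the "fail first before it learns" point concrete, here is a minimal sketch of one remote caching nameserver choosing among the registered nameservers. It is a simplified model and not BIND's actual algorithm: the scoring rule, the TIMEOUT_MS and HEALTHY_RTT_MS values, and the limit of two tries per client query are all assumptions made for illustration.

```python
TIMEOUT_MS = 5000      # assumed penalty for a server that does not answer
HEALTHY_RTT_MS = 50.0  # assumed round-trip time of a working server

def simulate(alive, queries=10, tries_per_query=2):
    """Model one remote caching nameserver resolving names in the zone.

    alive: list of bools, one per registered nameserver, in listed order.
    Returns one bool per client query (True = resolved before giving up).
    """
    # Lower score is better; 0 means "never tried", so untried servers
    # are preferred and each dead server gets probed at least once.
    score = [0.0] * len(alive)
    outcomes = []
    for _ in range(queries):
        resolved = False
        for _ in range(tries_per_query):
            i = min(range(len(alive)), key=lambda k: score[k])
            if alive[i]:
                score[i] = HEALTHY_RTT_MS
                resolved = True
                break
            score[i] += TIMEOUT_MS  # penalise the server that timed out
        outcomes.append(resolved)
    return outcomes

# 4 of 5 down (the off-site server is listed last) vs. 2 of 3 down
five = simulate([False, False, False, False, True])
three = simulate([False, False, True])
print(five.count(False), three.count(False))  # prints "2 1"
```

In this model the resolver loses two client queries before it settles on the one live server out of five, and only one when the live server is one of three. Every remote nameserver has to go through that learning phase on its own, and one with no adaptive behaviour never leaves it.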

> > >Ultimately, of course, your best availability would be achieved by having
> > >*every* registered nameserver be in a different location and/or on a
> > >different network link. But that can be difficult to achieve economically
> > >and logistically.
> >
> > Exactly, I wish we could pull it off economically .. but this project just
> > doesn't merit it unfortunately.
> >
> > Thanks again for your time on this ... any ideas where I should head from
> > here? Or any better way to weight requests with the root servers, so I can
> > have fewer NSs listed?
>
>Not really. In a perfect world, this should all be adaptive, so that wouldn't
>be necessary.
>
>You could accomplish a certain degree of "weighting" by having the NS records
>in your zone be a superset of those in the parent's delegations. Nameservers
>querying your domain immediately after a restart/reload, or when your domain's
>NS records expire from their caches, will only know about the delegated
>nameservers, therefore the delegated nameservers would tend to get more
>traffic (assuming all other things are equal, particularly, assuming that they
>all answer equally quickly). But having an NS-set mismatch like that can
>sometimes cause glue-record problems, and, besides, I don't see that it would
>help in your situation, since leaving the remote nameserver out of your
>delegations would leave your domain unresolvable if the network link to the
>other nameservers was unavailable.
>

Nod, if the internal network were to go down entirely, we'd still be dead 
in the water ..

>- Kevin
>
> > >denon wrote:
> > >
> > > > At 09:00 PM 2/19/2001 -0500, you wrote:
> > > >
> > > > >When you say the "resolvers" are timing out, do you mean caching
> > > > >nameservers doing recursive lookups, or do you mean stub resolvers?
> > > >
> > > > Excuse my lack of terminology .. but here's what's happening, hopefully
> > > > I'm answering your question:
> > > >
> > > > Say I have foo.com registered with NSI. I've also registered hosts ns,
> > > > ns2, ns3, ns4, ns5.foo.com.
> > > >
> > > > They're listed on foo.com, at NSI, in that order. NS5 being the
> > > > off-site, ns1-4 being the ones on our network.
> > > >
> > > > When I take ns1-4 down, I pick a random remote nameserver (say,
> > > > ns.yahoo.com), one that I know doesn't have it cached/etc. Then I try
> > > > to resolve SomeRandomArecord.foo.com off it. These resolves are what
> > > > are timing out. It doesn't matter what remote NS I pick, I have similar
> > > > results .. occasionally it'll resolve, usually it times out ..
> > > >
> > > > Am I making sense? I hope so ..
> > > >
> > > > >Perhaps you should consider putting the remote server second or third
> > > > >in the list to reduce the possibility of timeout.
> > > >
> > > > You're probably right, I guess I was under the impression that the root
> > > > servers picked the nameservers at random (random, weighted by uptime
> > > > past success, I guess).
> > > >
> > > > >In some versions of BIND 8 there was a "rotate" resolver option which
> > > > >would cause the stub resolver to rotate the nameserver list for each
> > > > >query. But that option appears to be gone as of BIND 9, so I wouldn't
> > > > >rely on it.
> > > >
> > > > Is this an issue with the root servers? Surely they're not running
> > > > generic bind8 .. :)
> > > >
> > > > Thanks for your ideas Kevin. I hope I've clarified things a little.
> > > >
> > > > >denon wrote:
> > > > >
> > > > > > I've been digging through the archives, usenet as well as a variety
> > > > > > of other tech docs in search of the answer for my question.  I
> > > > > > haven't come up with any results, but if this is a "frequently asked
> > > > > > question", please don't be afraid to throw me to a url.
> > > > > >
> > > > > > Here's the situation we've got: I need a relatively highly
> > > > > > redundant dns system (who doesn't? :). On an Internet domain, as a
> > > > > > test, I've listed 5 nameservers. One of the nameservers is at a
> > > > > > remote location, and the other 4 are at various places within our
> > > > > > internal network. Because the internal network is all
> > > > > > geographically in the same area, there's a "good chance" all 4 here
> > > > > > would go down at the same time. We don't presently have the
> > > > > > facilities for more than one off-site, but I think it's safe to rely
> > > > > > on just one.
> > > > > >
> > > > > > The problem is this: When I take down the 4 internal nameservers
> > > > > > (when I say take down, I mean ndc stop, not just drop the zone), the
> > > > > > 5th nameserver outside responds just fine. However, I think most
> > > > > > resolvers are timing out before it does. Shouldn't the root servers
> > > > > > respond faster than the resolver times out? While the 4 are down, if
> > > > > > you resolve something 10 times in a row, maybe 6 times it'll time
> > > > > > out, and 4 times it'll resolve (assuming you resolve something
> > > > > > different from the same zone each time .. not caching/etc.).
> > > > > >
> > > > > > Is this a common problem? If all 4 of the internal nameservers go
> > > > > > down, will the 5th be of any use?
> > > > > >
> > > > > > I'd appreciate any insight you can give me, TIA.
> > > > > >
> > > > > > Best Regards.
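
The failure rate described in the original question can be put in rough numbers. The sketch below assumes a remote nameserver with no prior knowledge of the zone, which picks distinct servers uniformly at random and gives up (or the client does) after a fixed number of tries. The function name p_timeout and the random-choice model are illustrative assumptions, not a description of how any particular nameserver behaves.

```python
from fractions import Fraction
from math import comb

def p_timeout(total, dead, tries):
    """Probability that `tries` distinct servers, picked uniformly at
    random from `total` registered servers, are all among the `dead` ones."""
    return Fraction(comb(dead, tries), comb(total, tries))

# 5 registered nameservers with 4 down, vs. 3 registered with 2 down
for tries in (1, 2, 3):
    print(tries, p_timeout(5, 4, tries), p_timeout(3, 2, tries))
# 1 4/5 2/3
# 2 3/5 1/3
# 3 2/5 0
```

Under these assumptions, a resolver that gets two tries fails 3 times in 5 with one live server out of five, but only 1 time in 3 with one live server out of three. The first figure is in the same range as the "maybe 6 times out of 10" reported above, though that agreement alone says nothing about what the remote servers were actually doing.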