BIND HANGUP ON INTERNET OUTAGE

Fri Mar 23 19:51:18 UTC 2001

I suspect that BIND 8's single-threaded architecture may have exacerbated this
problem. Maybe BIND 9 would have fared better in this scenario, since it's
more likely that it could continue to serve up authoritative data even as it
was having trouble getting responses to its own queries.

I know it's a little late now, but in hindsight it might have been prudent to
have some sort of "disaster recovery" plan in place to deal with this kind of
outage. One option would be the ability to switch all of your servers over to
an internal root zone for the duration of the outage.
Even kludgier would be to temporarily NAT the Internet root-nameserver
addresses to one or more internal nameservers hosting a root zone...
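
A minimal sketch of what the internal-root fallback could look like in
named.conf. The file names, server name, and address are placeholders I made
up for illustration (192.0.2.53 is a documentation address), not anything
from Neal's actual setup:

```
// Normal operation: "." is a hint zone pointing at the real root servers.
// zone "." { type hint; file "named.ca"; };    // assumed hints file name

// Outage fallback: serve "." authoritatively from a local file instead.
zone "." {
    type master;
    file "db.root.internal";                    // hypothetical file name
};
```

And a correspondingly minimal db.root.internal, again with made-up names:

```
$TTL 300
.             IN SOA  ns1.psu.edu. hostmaster.psu.edu. (
                      2001032301  ; serial
                      3600        ; refresh
                      600         ; retry
                      604800      ; expire
                      300 )       ; negative-caching TTL
.             IN NS   ns1.psu.edu.
psu.edu.      IN NS   ns1.psu.edu.   ; delegate the local domain
ns1.psu.edu.  IN A    192.0.2.53     ; glue for the placeholder server
```

With something like this loaded, a lookup for an outside name should get an
immediate NXDOMAIN rather than sitting in the queue waiting on unreachable
root servers. The TTLs are kept short on purpose, so that the negative
answers age out of client caches soon after the real hints are restored.
Reverse zones would presumably need their own delegations as well.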

- Kevin

Todd, Neal wrote:

> I am the DNS hostmaster at Penn State University (PSU.EDU).  We have a
> condition which arises for our primary DNS servers when we lose our
> connection to the Internet.  We have both Internet I and Internet II
> connections.  This morning we lost both connections (a fiber hit
> somewhere).
>
> When that happens, the BIND nameserver (BIND 8.2.3 running on Solaris 8)
> begins to choke, apparently on unresolved queries to the outside world.  It
> will then begin to time out local PSU.EDU DNS queries as well and
> applications at University Park will begin to break.
>
> I am wondering whether this is a condition which other DNS admins have
> encountered and if so, what they have done to correct it.
>
> Neal K. Todd
> Penn State University
> Office of Telecommunications
> (814) 865-2404
> hostmaster at otc1.psu.edu