DNS Redundancy

Thu Oct 21 15:26:22 UTC 2010

We have been very successful using anycast, whereby multiple,
equivalently configured DNS servers are placed throughout the network,
all providing DNS service on the same virtual addresses, and these
virtual addresses are host-routed (i.e. routed with a /32 netmask).

The keys to this working well are:
  1. Host routes are dynamically asserted or withdrawn based on the
     health of the DNS service on each server.
  2. Packet flow paths are stable across the network (for TCP-based
     queries).
  3. Two anycast resolver addresses are published.
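
To illustrate point 1, here is a minimal Python sketch of a health
check that decides whether a server's host route should be asserted or
withdrawn.  It is only a sketch under assumptions: the probe name, the
thresholds, and all function names are hypothetical, and the step that
actually announces or withdraws the route (routing daemon, load
balancer, or IP SLA) is deliberately left out.

```python
import socket
import struct

def build_query(name, qid):
    """Build a minimal DNS query for an A record (RD bit set)."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)

def is_healthy_reply(data, qid):
    """True if data is a response to qid with RCODE 0 (NOERROR)."""
    if len(data) < 12:
        return False
    rid, flags = struct.unpack("!HH", data[:4])
    return rid == qid and bool(flags & 0x8000) and (flags & 0x000F) == 0

def probe(server, name="example.com", qid=0x1234, timeout=1.0):
    """Send one UDP query to server; any socket error counts as a failure."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)
            s.sendto(build_query(name, qid), (server, 53))
            data, _ = s.recvfrom(512)
        return is_healthy_reply(data, qid)
    except OSError:
        return False

def next_route_state(announced, recent, fail_limit=3, ok_limit=2):
    """Decide whether the /32 should be announced.

    recent is a list of probe results, oldest first.  The route is
    withdrawn after fail_limit consecutive failures and re-asserted
    after ok_limit consecutive successes, to avoid flapping.
    """
    if announced and len(recent) >= fail_limit and not any(recent[-fail_limit:]):
        return False
    if not announced and len(recent) >= ok_limit and all(recent[-ok_limit:]):
        return True
    return announced
```

A cron job or small daemon could call probe() against the local DNS
service, keep the last few results, and act only when
next_route_state() changes.  Probing the local service address rather
than the anycast address is an assumption of this sketch.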

I have seen people run dynamic routing protocols on the servers (e.g.
RIPv2 or OSPF) combined with cron-driven health check scripts that
control the dynamic routing of the virtual address.  We have also used
load balancers to handle the server health monitoring and the dynamic
routing -- only because the load balancers happened to be convenient
-- I would not use a load balancer otherwise.  My preference would be
to use Cisco IP SLA to both monitor the server health and control the
host routes (although I have not tested this).

The stable path requirement is easy with Cisco CEF as long as you do
not use per-packet load sharing.
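
To show why per-packet load sharing breaks TCP-based queries, here is a
toy Python sketch contrasting per-flow and per-packet next-hop
selection.  This is not how CEF computes its hash; the function names,
the hash choice, and the addresses are assumptions made up for
illustration.

```python
import hashlib

def pick_next_hop(src_ip, dst_ip, src_port, dst_port, proto, next_hops):
    """Per-flow selection: hash the 5-tuple so that every packet of a
    given flow maps to the same next hop (and so the same DNS server)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]

def pick_next_hop_per_packet(packet_seq, next_hops):
    """Per-packet round robin: consecutive packets of one TCP session
    alternate between next hops, so they can land on different servers."""
    return next_hops[packet_seq % len(next_hops)]
```

With per-flow selection, a TCP query's SYN, data, and FIN all reach the
same anycast server.  With per-packet selection they may be split
across servers that share the virtual address but not the TCP state.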

It is actually counter-productive to have two resolvers configured
with this architecture, but to circumvent human nature, we publish two.

There is absolutely no functional difference between the two, and
there is no redundancy value for the second one -- they are both
hosted on each and every one of the anycast servers.  The only
reason for the second resolver is to deter people from making
up their own second resolver -- people expect two resolvers, and
if you give them only one, they will go ahead and put something in
as the second resolver -- even if you tell them not to.  This is a
very important aspect of having the architecture succeed in our
environment.

--
Gordon A. Lang

----- Original Message ----- 
From: "Martin McCormick" <martin at dc.cis.okstate.edu>
To: <bind-users at isc.org>
Sent: Thursday, October 21, 2010 7:32 AM
Subject: DNS Redundancy

> The normal procedure on internet-connected systems is to
> set the resolv.conf file to include at least 2 domain name
> servers. Example:
>
> nameserver 139.78.100.1
> nameserver 139.78.200.1
>
> Last night, I had to take down our primary DNS for
> maintenance and lots of FreeBSD and Linux systems began having
> trouble of various kinds.
>
> While I expected the FreeBSD system I was on to hang for
> a couple of seconds and then start using the second DNS, it
> basically froze while some Linux boxes also began exhibiting
> similar behavior.
>
> I finally manually changed the resolv.conf on the system
> I was using to force the slave DNS to be first in the list and
> that helped, but losing the primary DNS was not the slight
> slowdown one might expect. It was a full-blown outage.
>
> Are we missing some other configuration directive for Unix systems
> that would make the systems use the redundancy a little
> more gracefully than what happened? Otherwise, why have it if
> somebody has to manually intervene? The only thing we should
> have lost was dynamic updates. The outage lasted for 25 minutes
> or so but didn't resolve until the primary came back on line.
>
> This is my week for asking novice questions, but I don't
> get to see what happens when the master goes away all that often
> and what I saw wasn't pretty.
>
> Martin McCormick WB5AGZ  Stillwater, OK
> Systems Engineer
> OSU Information Technology Department Telecommunications Services Group