Failover w/o load balancing

Kevin Darcy kcd at
Fri Mar 31 02:06:03 UTC 2000

glwillia at wrote:

> Hi,
> I want to set up my DNS server so that if one host goes down, the DNS
> entry points to the second system, but ONLY IF the first one goes down:
> I don't want load balancing. (I'm running BIND 8 on Linux.)
> Is there any easy way to do this? If so, could someone please explain
> it/point me in the direction of the relevant HOWTO.

(You didn't say what protocol was involved. If these are mail servers,
then the preference fields of the MX records should take care of this
automatically for you. I'll assume that they aren't mail servers.)
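For the record, the MX case looks something like this in the zone file (hypothetical names; mailers try the lowest preference value first and fall back to the next only when that host is unreachable):

```
example.com.    IN  MX  10  mail1.example.com.
example.com.    IN  MX  20  mail2.example.com.
```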

There are 2 basic DNS-based approaches to this problem, each with serious
drawbacks:

1) Define the name with multiple A records and set a "fixed" rrset-order on
the master and slaves.
DRAWBACKS: A) unless you can configure this on all of the slaves, and all
servers which may potentially cache the name -- if the name is an Internet
name, then forget it -- you're going to get a certain amount of
"leakage" to your backup server, since caching servers will usually
round-robin answers from cache. You can minimize the effect of the caching
servers by lowering the TTL values on the records, but only at the cost of
increasing DNS traffic, B) each client needs to be smart enough to
failover to the second IP in the list it gets from the nameserver. Not all
clients -- especially older clients -- are this smart.
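As a sketch of option #1, assuming BIND 8.2 or later and hypothetical names/addresses, the fixed order goes in named.conf and the records are then handed out in zone-file order:

```
// named.conf (on the master and every slave you control):
options {
        rrset-order {
                class IN type A name "www.example.com" order fixed;
        };
};
```

```
; zone file -- the preferred address is listed first
www.example.com.        300     IN      A       192.0.2.1       ; primary
www.example.com.        300     IN      A       192.0.2.2       ; backup
```

Note the deliberately low TTL (300 seconds), per the caching caveat above.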

2) Define the name with a single A record and then change it -- using
Dynamic Update or some other mechanism -- when that host fails. DRAWBACK:
as with option #1, caching is going to get in your way here, not to
mention the fact that the slaves may take a while to get the change, even
if they are NOTIFY-aware. Again, you can minimize the effect of caching by
lowering TTL values and putting up with the increased traffic, but unlike
option #1, where round-robin'ing caching servers will at least give out a
working address first in the list 50% of the time, even during an outage,
with option #2 when the "primary" is down, clients will get the
non-working address 100% of the time until their local caching server
times out the cache entry and fetches the changed A record. Depending on
the protocol and the client software, a 50% connection failure rate may
still allow the users to continue working -- although probably with
degraded performance -- and may therefore be preferable to a temporary
100% failure rate.
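For option #2, the swap can be scripted with BIND's nsupdate utility -- a sketch, with hypothetical names/addresses and a zone that permits Dynamic Update (allow-update):

```
nsupdate <<EOF
update delete www.example.com. A
update add www.example.com. 60 A 192.0.2.2

EOF
```

(The blank line tells nsupdate to submit the batched update; newer versions also accept an explicit "send" command.)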

Of course, there are non-DNS-based approaches to this problem also,
usually involving router or router-like hardware or software. In this
case, you typically have an invariant IP address which is presented to the
rest of the world, and then the packets are re-routed "behind" that
IP address in case of failure. Most of these products can also do real
Dynamic Load Balancing. They're generally pretty expensive, though.

Last but not least, SRV records, which provide a "service
location" mechanism, also have "priority" and "weight" fields, which in
theory allow one to implement load balancing and/or redundancy without all
of the caching complications. Unfortunately, the client software needs to
be SRV-aware in order for this to work, and to date there aren't any
SRV-aware clients for popular protocols like HTTP and FTP. In fact,
I think the only SRV-aware client is the Win2000 client, and it only
uses SRVs for Active Directory-related stuff.
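For what it's worth, an SRV-based failover setup would look something like this (RFC 2782 syntax, hypothetical names; an SRV-aware client tries the lowest priority value first and touches the backup only when the primary is unreachable, and with weight 0 no load balancing occurs):

```
; fields after SRV: priority weight port target
_http._tcp.example.com.  IN  SRV  10  0  80  primary.example.com.
_http._tcp.example.com.  IN  SRV  20  0  80  backup.example.com.
```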

In my opinion, there really ought to be a record type the sole purpose of
which is for servers to communicate to each other how to order RRsets.

- Kevin

More information about the bind-users mailing list