Site Failover
Kevin Darcy
kcd at daimlerchrysler.com
Sat Jul 9 02:19:06 UTC 2005
Greg Zill wrote:
>I am just putting pencil to paper for the first time on planning a
>remote failover site for our co-lo production facility. As I just took
>over authoritative DNS and setup of a master and slave, I am wondering
>what would be the preferred configuration for another site with
>equivalent services to tide us over in the case of tornado or
>significant natural or unnatural disaster.
>
>At first I thought two more slaves back to master to keep everything up
>to date, but I do not know the impact of repeated slaving errors on
>performance once the current master falls off the face of the earth.
>
Zone transfer failures you mean? They're bad, but not terribly so. How
many zones are we talking about here? Hundreds? Thousands? Tens of
thousands? More? BIND 9 seems to do a fairly good job of controlling
this type of workload, although I've never had occasion to watch a
"disconnected slave's" behavior with more than a thousand or so zones. It
should be fairly easy to test this scenario, of course: copy your
production slave-server config to some spare box, let it transfer the
zones, then yank the network cable and watch it (on the console,
presumably :-) to see how well or how badly it deals with the cascade of
zone-transfer failures.
>Do I
>assume the manual task of switching one of the slaves to a temporary
>master in the event of failover?
>
You're going to have to do something like that anyway if you want to
change anything in your DNS data during the outage.
I'm not sure why you say "manually", though: you can automate the
slave-to-temporary-master switchover as much as you wish and your
scripting/programming skills allow.
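
As a rough illustration of automating that switchover, here is a minimal
Python sketch that renders named.conf zone stanzas for either role, so a
promotion script could regenerate the zone section and then reload the
server. The function names, file paths and addresses are hypothetical
placeholders, not anything taken from a real setup.

```python
def zone_stanza(zone, role, zone_file, masters=()):
    """Render one named.conf zone stanza for the given role.

    All names here are illustrative; adapt paths and options to your site.
    """
    if role not in ("master", "slave"):
        raise ValueError("role must be 'master' or 'slave'")
    lines = [
        f'zone "{zone}" {{',
        f"    type {role};",
        f'    file "{zone_file}";',
    ]
    if role == "slave":
        if not masters:
            raise ValueError("a slave zone needs at least one master")
        addr_list = " ".join(f"{ip};" for ip in masters)
        lines.append(f"    masters {{ {addr_list} }};")
    lines.append("};")
    return "\n".join(lines) + "\n"


def render_zones(zones, role, masters=()):
    """Render stanzas for a {zone name: zone file} mapping, sorted by name."""
    return "\n".join(
        zone_stanza(name, role, path, masters)
        for name, path in sorted(zones.items())
    )


if __name__ == "__main__":
    zones = {"example.com": "zones/example.com.db"}  # hypothetical zone list
    # Normal operation: this box slaves from the primary at 192.0.2.1.
    print(render_zones(zones, "slave", ["192.0.2.1"]))
    # Promotion: the same zones, served as master from the last transferred copy.
    print(render_zones(zones, "master"))
```

A real script would write the output to an included config file, check it
with named-checkconf, and then run "rndc reconfig". It would also need to
confirm that the transferred zone files are in text format before serving
them as master files, since newer BIND releases may store slave zones in
raw format.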
Note that you can configure all of your slaves to pull zones from
each other, in addition to pulling from the primary master -- this way,
you won't have to reconfigure any of them if you decide to "promote" one
of the slaves to primary master temporarily, or when you "demote" it
again. It also has the benefit of ensuring that all of your slaves will
automatically synchronize to the latest-available version of the zone,
if the primary master stays down for an extended period of time. The
downside, however, is that it will increase your serial-checking volume,
which could be a problem if you have a huge number of zones and/or small
(i.e. rabid) REFRESH settings. Or, you can trade off the serial-checking
volume against the synchronization time by choosing an inter-slave
topology of ring (e.g. slave A pulls from the primary master and slave
B, slave B pulls from the primary master and slave C, and slave C pulls
from the primary master and slave A) or any other topology less
connected than any-to-any, e.g. tree, star, daisy-chain, hybrid.
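
To make the tradeoff concrete, the following Python sketch builds the
masters list each slave would get under a ring and under any-to-any, and
estimates the serial-checking volume. The estimate assumes one SOA query
per zone, per configured master, per REFRESH interval, and ignores NOTIFY
and retries, so treat it as a rough upper bound rather than a description
of what BIND actually does. Server names and numbers are made up.

```python
def ring_masters(primary, slaves):
    """Each slave pulls from the primary and from the next slave in the ring."""
    n = len(slaves)
    if n < 2:
        return {s: [primary] for s in slaves}
    return {s: [primary, slaves[(i + 1) % n]] for i, s in enumerate(slaves)}


def mesh_masters(primary, slaves):
    """Any-to-any: each slave pulls from the primary and every other slave."""
    return {s: [primary] + [o for o in slaves if o != s] for s in slaves}


def soa_checks_per_hour(masters_by_slave, zone_count, refresh_seconds):
    """Assumed model: one SOA query per zone per listed master per REFRESH."""
    per_cycle = sum(len(m) for m in masters_by_slave.values()) * zone_count
    return per_cycle * 3600 / refresh_seconds


if __name__ == "__main__":
    slaves = ["A", "B", "C", "D", "E"]  # hypothetical server names
    for name, topo in (("ring", ring_masters("P", slaves)),
                       ("mesh", mesh_masters("P", slaves))):
        rate = soa_checks_per_hour(topo, zone_count=1000, refresh_seconds=3600)
        print(name, topo, rate)
```

With three slaves the ring reproduces the example above (A pulls from the
primary and B, B from the primary and C, C from the primary and A). The
gap between ring and any-to-any widens as slaves are added, because the
ring keeps two masters per slave while any-to-any grows with the number
of slaves.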
- Kevin