Two site failover / load distribution
kcd at daimlerchrysler.com
Tue Mar 22 00:24:51 UTC 2005
>I'm trying to put together a low cost HA solution based around two sites
>each with xDSL connections. At each site I have a web server & DNS server
>running Bind 9. My goal is to provide a solution that distributes users
>across the two sites and, as seamlessly as possible, cope with either site
>failing.
>I had originally hoped to achieve this using Round Robin, but searches
>on Usenet indicate that this will satisfy the load distribution requirement
>but not the failure requirement.
It _technically_ meets the failure requirement, but some browsers take
so ridiculously long to do address failover that in practical terms
there is no failover. The browser user gives up before the failover
occurs.
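Why the answer depends so heavily on the client can be shown with a minimal Python sketch of client-side address failover. It assumes the client walks the list of round-robin A-record addresses in order and gives each one a short connect timeout. `connect_first`, `resolve_all` and the 2-second default are hypothetical choices for illustration, not how any particular browser behaves.

```python
import socket


def connect_first(candidates, timeout=2.0):
    """Try each (host, port) in order and return the first connected socket.

    Hypothetical helper: `timeout` bounds each individual connect attempt,
    so the worst-case delay is about (number of dead addresses) * timeout.
    """
    errors = []
    for host, port in candidates:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:  # refused, unreachable or timed out
            errors.append(f"{host}:{port}: {exc}")
    raise ConnectionError("all addresses failed: " + "; ".join(errors))


def resolve_all(name, port):
    """Return every (address, port) the resolver gives for `name`,
    e.g. all of the round-robin A records."""
    infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    return [(info[4][0], info[4][1]) for info in infos]
```

In use, `connect_first(resolve_all("www.example.com", 80))` would walk the round-robin addresses in turn. The failover itself is simple; the user-visible problem is the per-address timeout, and a client that waits much longer than a couple of seconds on a dead address produces exactly the stall described above.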
>An alternate approach would be to make the DNS
>servers at both sites masters and hold A records only for the relevant site,
>combined with a low TTL (e.g. the A record on the DNS server at site one
>points only at the web server at site one; similar for site two). This
>addresses failures, but not load distribution.
In this model, load distribution will occur as a rough function of how
quickly the respective *nameservers* respond (or whether they respond
at all, hence the implicit failover capability in case of total site
failure). But this probably has little or no bearing on how quickly the
*webservers* or other application-level components respond, so you may
find that even under normal situations, your traffic is heavily skewed
to one site or the other.
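A toy Python model of the dual-master setup illustrates the skew. It assumes, hypothetically, that each lookup is answered by whichever site's nameserver appears faster on that query (a noisy stand-in for a resolver's RTT-based server preference; real resolvers such as BIND's are more involved), and that a site whose nameserver is down never answers. The function name, latencies and jitter are all illustrative.

```python
import random


def simulate_site_share(ns_latency_ms, queries=10_000, jitter_ms=5.0, seed=1):
    """Toy model of dual-master DNS: return the fraction of lookups per site.

    ns_latency_ms maps site -> mean nameserver response time in ms,
    or None if that site's nameserver is down (it never answers).
    Each site's nameserver only returns its own site's A record, so
    the site whose nameserver "wins" a lookup gets that client.
    """
    rng = random.Random(seed)
    wins = {site: 0 for site in ns_latency_ms}
    for _ in range(queries):
        # Per-query response time: mean plus Gaussian jitter (an assumption).
        answers = {
            site: rng.gauss(mean, jitter_ms)
            for site, mean in ns_latency_ms.items()
            if mean is not None
        }
        if not answers:
            continue  # total outage: nobody can resolve the name
        wins[min(answers, key=answers.get)] += 1
    return {site: count / queries for site, count in wins.items()}
```

For example, `simulate_site_share({"site1": 20, "site2": 40})` sends nearly all lookups to site one no matter how busy its webserver is, while `{"site1": 20, "site2": None}` sends everything there: the implicit failover on total site failure, and nothing more.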
Also, you can have a situation where the nameserver at one of the sites
is up and running fine, there is network connectivity to the site, but
the webserver or some other component(s) at the site is down. This
dual-master model can be refined to have an automatic process which
monitors the infrastructure and changes the relevant A record --
possibly using the Dynamic Update protocol -- if one site or another
becomes non-functional. Of course, at that point one is starting to
re-invent commercial load-balancing technology...
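The monitoring refinement might look something like the following Python sketch, which assumes an HTTP health-check URL per site and BIND's `nsupdate` tool for the Dynamic Update step. The helper names, health-check URL, TTL and addresses (taken from the RFC 5737 documentation ranges) are hypothetical, and TSIG key handling is left out.

```python
import urllib.request


def site_is_healthy(url, timeout=3.0):
    """Hypothetical health check: the site counts as up if `url` answers HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection errors, HTTP error statuses and timeouts all derive from OSError.
        return False


def build_nsupdate_script(server, zone, name, ttl, site_addresses, healthy):
    """Build input for BIND's `nsupdate` that publishes only healthy sites.

    site_addresses: site -> IPv4 address; healthy: site -> bool.
    Returns None (leave DNS alone) if every site looks down.
    """
    live = [addr for site, addr in site_addresses.items() if healthy.get(site)]
    if not live:
        return None
    lines = [f"server {server}", f"zone {zone}", f"update delete {name} A"]
    lines += [f"update add {name} {ttl} A {addr}" for addr in live]
    lines.append("send")
    return "\n".join(lines) + "\n"
```

The resulting script could be fed to `nsupdate -k <keyfile>` on standard input, once per site's master in the dual-master arrangement. Refusing to publish an empty record set when every check fails is a deliberate choice: from the monitor's vantage point, "both sites down" may really mean the monitor's own link is down.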
>Having searched further it sounds as though something like lbnamed may be
>the solution, but I wondered what experiences others had on the NG?
Never used lbnamed. We use commercially-available load-balancing
devices. However, even with those we end up having to reduce our TTLs to
anti-social levels in order to get the load-balancing and/or failover
granularity we require. A-record-based load-balancing/failover is always
going to be quite imperfect. SRV-record-based load-balancing/failover
shows more promise, but client-software (e.g. browser)
developers/providers are taking a long time to adopt it.
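For comparison, here is a minimal Python sketch of the client-side selection procedure that RFC 2782 specifies for SRV records: targets are tried in increasing priority order, and within one priority a target is chosen at random in proportion to its weight. The record tuple layout, the function name and the example host names are assumptions for illustration.

```python
import random


def order_srv_targets(records, rng=random):
    """Order SRV records following the RFC 2782 selection procedure.

    records: list of (priority, weight, port, target) tuples.
    Lower priority values are tried first; within one priority,
    targets are picked at random in proportion to their weight.
    """
    ordered = []
    for prio in sorted({r[0] for r in records}):
        # RFC 2782: zero-weight records go at the start of the list.
        group = sorted((r for r in records if r[0] == prio), key=lambda r: r[1] != 0)
        while group:
            total = sum(r[1] for r in group)
            pick = rng.randint(0, total)  # uniform over 0..total inclusive
            running = 0
            for i, r in enumerate(group):
                running += r[1]
                # Select the first record whose running weight sum >= pick.
                if running >= pick:
                    ordered.append(group.pop(i))
                    break
    return ordered
```

With records such as `(10, 60, 80, "www1.example.com.")`, `(10, 20, 80, "www2.example.com.")` and `(20, 0, 80, "backup.example.com.")`, about three quarters of clients would try www1 first (61 of the 81 equally likely draws), and the backup is only tried after both priority-10 targets. The distribution and the fallback order both travel with the records, so failover does not require rewriting an A record under a short TTL; the catch, as noted above, is that the client has to implement the procedure.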