kcd at daimlerchrysler.com
Sat Nov 20 01:17:02 UTC 2004
>im thinking about a failover setup of webservices at different locations via
>I got some questions about the possibilities of this:
>1. how is "IN NS" cached and used by other bind nameservers if one of the NS
>is down? f.e. the TLD server has two "IN NS" records for my zone, now a
>nameserver is looking up this zone and will get this 2 records. first i
>think its trying to resolv via the first nameserver of the replyorder, but
>what would be if this one is down and not reachable, will the resolving
>nameserver try to query via the second one a second time? what would be if
>the first nameserver can succesfully answer, then will be cached by the
>resolving nameserver, but then in the future of the life of the cached "IN
>NS" record the nameserver will be down, is the second nameserver still in
>the cache and the failover will work if this will happen?
Nameservers will generally keep track of how fast other nameservers
respond to queries and prefer faster nameservers over slower ones. If a
nameserver stops responding, it'll get heavily penalized as being "slow"
but eventually used again in case it has recovered. Overall this
mechanism makes nameserver-to-nameserver traffic rather adaptive and
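
A rough sketch of the selection behaviour described above, in Python. It is only loosely modelled on what resolvers do, not BIND's actual algorithm: the class name `ServerStats`, the 1000 ms timeout penalty, the 0.98 decay factor and the 0.3 smoothing factor are all assumptions chosen for illustration.

```python
class ServerStats:
    """Tracks a smoothed round-trip time (SRTT, in ms) per nameserver."""

    def __init__(self, servers, timeout_penalty_ms=1000.0, decay=0.98, alpha=0.3):
        # All constants here are illustrative assumptions, not BIND's values.
        self.srtt = {server: 0.0 for server in servers}
        self.timeout_penalty_ms = timeout_penalty_ms
        self.decay = decay
        self.alpha = alpha

    def pick(self):
        """Prefer the nameserver with the lowest smoothed RTT."""
        return min(self.srtt, key=self.srtt.get)

    def _age_others(self, used):
        # Shrink the SRTT of servers that were not used, so that a
        # penalized server eventually becomes the cheapest choice again.
        for server in self.srtt:
            if server != used:
                self.srtt[server] *= self.decay

    def record_reply(self, server, rtt_ms):
        """Fold a measured RTT into the running average for this server."""
        old = self.srtt[server]
        self.srtt[server] = (1 - self.alpha) * old + self.alpha * rtt_ms
        self._age_others(server)

    def record_timeout(self, server):
        """Penalize a server that did not answer as being very slow."""
        self.srtt[server] += self.timeout_penalty_ms
        self._age_others(server)
```

With two servers where the first never answers, nearly all queries go to the second one, and the dead server is only retried once its penalty has decayed below the live server's RTT.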
>2. is the only solution to get a global dns failover without the use of
>routing protocols like BGP to use two or more nameservers at different
>locations(AS or something else) which will then answer queries f.e. of
>webservers with its own specific A-records? f.e. if nameserver A is down in
>cause of a routing problem, then a resolver will query nameserver B(located
>at a different provider) which then will answer a query for www.domain.tld
>with a specific A-record which will be reachable, because its in the same
That'll give you very basic, crude failover capabilities, but it won't
give you actual load-balancing (since the speed at which the nameserver
responds may have nothing to do with the load on the webservers or
whatever other servers you're trying to load-balance). In fact, if one
nameserver happens to be significantly closer to the Internet backbone
and/or the source of the majority of your potential clients, you may
find that this approach causes severe load skewing.
Another drawback of this approach, of course, is that you have to
maintain two different versions of every A record you want to make
redundant, one on each nameserver.
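
To make the skew concrete, here is a small Python sketch. It assumes each resolver simply queries the nameserver it sees as fastest and that each nameserver hands out only its own site's A record; the function name `site_share`, the nameserver labels and the documentation addresses are hypothetical.

```python
from collections import Counter


def site_share(resolver_rtts, site_address):
    """Count how many resolvers end up with each site's address.

    resolver_rtts: one dict per resolver, mapping nameserver -> RTT in ms.
    site_address:  mapping nameserver -> the A record that nameserver serves.
    """
    counts = Counter()
    for rtts in resolver_rtts:
        nearest = min(rtts, key=rtts.get)   # resolver prefers the fastest NS
        counts[site_address[nearest]] += 1  # and gets that NS's own A record
    return counts


# Hypothetical setup: three of four resolvers are closer to ns-a.
site_address = {"ns-a": "192.0.2.10", "ns-b": "198.51.100.10"}
resolver_rtts = [
    {"ns-a": 20, "ns-b": 90},
    {"ns-a": 25, "ns-b": 80},
    {"ns-a": 30, "ns-b": 70},
    {"ns-a": 95, "ns-b": 15},
]
shares = site_share(resolver_rtts, site_address)
```

The split follows nameserver proximity only; nothing in it reflects how busy either webserver is.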
>3. if the "IN NS" failover is possible, whats about caching nameservers
>which are caching A-records? are them also failover possible, if yes would
>it be possible to return the A-records for the webserver of both locations
>so that a client will try webserver A first and when not reachable webserver
>B (i think its a implementation thing and too much risk)? or is the only
>solution to create a zone with a TTL of zero?
Any DNS-based solution is going to require low TTL values, since
otherwise it won't be very dynamic. Lowering your TTL values like that
is rather anti-social since it not only makes your nameservers work
harder handling more queries, but it makes everyone *else*'s nameservers
work harder querying your nameservers. As for the approach of using
multiple A records, you can do that, but you'll get a certain amount of
randomness depending on what your TTL values are set to, since most
resolvers will "round-robin" their answers when replying from cache.
Also, be aware that some clients (in particular, some web browsers)
take a long time to do address failover, longer than may be acceptable
to your customers.
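
For illustration, this is roughly what client-side address failover looks like when a name resolves to several A records. It is a minimal Python sketch, not how any particular browser does it; the function name `connect_first_reachable`, the 2-second timeout and the injectable `connector` parameter are assumptions.

```python
import socket


def connect_first_reachable(addresses, port, timeout=2.0,
                            connector=socket.create_connection):
    """Try each address in order and return (address, connection)
    for the first one that accepts a connection."""
    errors = []
    for addr in addresses:
        try:
            conn = connector((addr, port), timeout=timeout)
            return addr, conn
        except OSError as exc:
            # Covers refused connections and timeouts; move on to the next.
            errors.append((addr, exc))
    raise ConnectionError(f"no address reachable: {errors}")
```

In the worst case the client waits one full timeout per dead address before it reaches a live one, which is where the long delays come from.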
>thanks for any hints and explanations to get this fully understanding :)
The *right* way to use DNS for resource failover is to use SRV records.
Unfortunately, very few client software authors have adopted SRV
support. So in the meantime, most folks are implementing dedicated
and/or hardware-based load-balancing solutions instead, which provide
load balancing as well as failover. The better ones can be
pretty pricey though.
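
As a sketch of why SRV records fit this job: each record carries a priority and a weight, and a client is expected to try lower priority values first and spread load by weight within one priority. The Python below approximates the selection procedure from RFC 2782; the `SrvRecord` tuple, the function name `order_srv` and the example hostnames are hypothetical.

```python
import random
from collections import namedtuple

SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def order_srv(records, rng=random):
    """Return records in the order a client should try them:
    lowest priority value first, weighted random within a priority."""
    ordered = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        # Zero-weight records go first so they are rarely selected.
        group.sort(key=lambda r: r.weight != 0)
        while group:
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)
            running = 0
            for i, record in enumerate(group):
                running += record.weight
                if running >= pick:
                    ordered.append(group.pop(i))
                    break
    return ordered


# Hypothetical records for _http._tcp.example.com
records = [
    SrvRecord(20, 0, 80, "backup.example.com"),
    SrvRecord(10, 60, 80, "www-a.example.com"),
    SrvRecord(10, 40, 80, "www-b.example.com"),
]
attempt_order = order_srv(records)
```

Here the two priority-10 hosts share the load roughly 60/40, and the priority-20 host is only tried once both of them have failed.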
As you mention above, it's also possible to play stupid BGP tricks and
the like to squeeze out some failover and/or load-balancing
functionality. I don't have any direct experience with that, but I am
led to believe that there are serious drawbacks to that approach in
terms of reliability and convergence time.