Multi-master (HA)

Wed May 7 22:06:56 UTC 2014

On 05/06/14 13:39, Evan Hunt wrote:
> On Tue, May 06, 2014 at 06:20:11PM +0000, Baird, Josh wrote:
>> Hi,
>>
>> For those of you who operate at multiple sites or datacenters, are you
>> doing any HA for your BIND masters?  Ideally, we would have a master in
>> each datacenter; maybe not an active one, but one that is standing by in
>> case your primary master becomes unavailable.  
>>
>> Do you have multiple "active" masters and list them as master in each of
>> your slave's zone definitions?  This seems like it could get rather
>> messy.  One thought is to use a technology like VMWare SRM which will
>> spin up a master/virtual machine automatically in a second datacenter if
>> your primary master goes down.  This coupled with Layer2 connectivity
>> between your sites could make things fairly simple.  The
>> standby/secondary master would retain the same IP address as your
>> primary, so everything should just *work*.  
>>
>> What are others doing?  Any thoughts, ideas or advice is much
>> appreciated.
> 
> Thank you for bringing this up.  As it happens, high-availability/
> multi-master support in BIND is something we've been seriously considering
> for a future release.  There's been a lot of internal discussion of use
> cases, requirements, and possible design approaches.
> 
> I don't want to influence the conversation here by saying too much about
> the ideas we've had so far, but I wanted to say: if anyone has specific
> thoughts on how to make this sort of thing easier in BIND -- even just at
> the level of "boy, it irritates me that I can't make BIND do <X>" --
> such comments will fall on welcoming ears.
> 

I hadn't thought of doing multi-master, but the issue of promoting a slave to
master for DR had come up.  At the time the problem was DNSSEC.  It's one thing
for the slave to become master; it's another when it needs to change entries in
the zone file to redirect key web services to DR instances.  (At the time, the
plan was to create two signed zone files each time and securely transfer the
second one out of band.  But no DR web servers were ever set up, so both files
were identical and the scheme eventually got scrapped.  The issue of raw vs.
text zone files on secondaries came up after it was abandoned.)  DR still comes
up now and then; recently the talk is of using DNS appliances and cloud.

OTOH, the idea of multi-master is intriguing.  The only downside I see is
that I have one really powerful server for my current master (Sun Fire
X4170), and my other servers are weak leftovers that just passed EOL last
year.  And having all the servers doing full DNSSEC signing could be
interesting.

It also raises the question of how the outside world copes with all the
servers having identical zones signed at slightly different times, etc.
(Especially since I'm using the Unix timestamp for the zone serial, which
avoids issues of multiple admins incrementing the serial without noticing
each other and/or collisions with DNSSEC's incrementing of serials.)
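
To make the serial concern concrete, here is a rough sketch (not anything BIND
itself ships) of a timestamp-based serial plus the RFC 1982 comparison that
secondaries use to decide which serial is newer.  The function names and the
example timestamps are made up for illustration.

```python
import time

SERIAL_MOD = 2 ** 32  # the SOA serial is an unsigned 32-bit value


def timestamp_serial(now=None):
    """Return a zone serial derived from the Unix timestamp (hypothetical helper)."""
    if now is None:
        now = time.time()
    return int(now) % SERIAL_MOD


def serial_gt(a, b):
    """True if serial a is newer than serial b under RFC 1982 serial arithmetic."""
    return a != b and ((a - b) % SERIAL_MOD) < 2 ** 31


# Two masters signing the same zone data a few seconds apart (made-up times)
# end up with different serials, and the later one looks "newer".
serial_a = timestamp_serial(1399500416)
serial_b = timestamp_serial(1399500420)
print(serial_gt(serial_b, serial_a))
```

So with independent signers, a secondary that can reach both masters would
treat whichever one signed last as the newer copy, even if the zone data is
otherwise the same.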

But it shouldn't be too hard to implement, since our nameservers are managed
by CFEngine.  And it makes it possible for all my name servers to have both
internal and external views, instead of having to have separate external
slaves and internal slaves.  (There are other issues that I'm still working
through with that setup, namely my recursive caching servers hitting external
slaves instead of internal slaves.)

Things have gotten more complicated since we started allowing vanity internal
names.  Before, there was one subdomain that only existed on internal, and
everybody had to put their host in there, as <dept>-host.<subdomain>.ksu.edu.
But then certain VIPs wanted host.<dept>.ksu.edu to work even though it's a
10.x.x.x address.
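
As a sketch of the rule this implies (an assumption about how one might sort
records, not how our zones are actually generated): a record pointing at an
RFC 1918 address belongs only in the internal view, whatever name it has.
The function name and view labels below are hypothetical.

```python
import ipaddress

# RFC 1918 private ranges; 10.x.x.x is the one mentioned above.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def views_for(address):
    """Return the views an address record could be published in."""
    ip = ipaddress.ip_address(address)
    if any(ip in net for net in RFC1918):
        return ["internal"]
    return ["internal", "external"]


print(views_for("10.1.2.3"))    # private address: internal view only
print(views_for("192.0.2.10"))  # documentation address standing in for a public one
```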

It would also mean one of our satellite campuses that refuses to use our
caching servers (and even sent back the server that was providing the service
for their campus, which they had firewalled their users from using while it
was there) can have their own caching servers work without needing to
understand that our whois record doesn't list our stealth/internal
nameservers.  That's why they can't resolve any internal services and need
to track down somebody to give them the 10.x.x.x IP and have their users use
that, etc.

Wonder if they know about the change in forwarding on my caching resolvers to AD?

-- 
Who: Lawrence K. Chen, P.Eng. - W0LKC - Sr. Unix Systems Administrator
For: Enterprise Server Technologies (EST) -- & SafeZone Ally