Questions After a DNS Server Crash

Martin McCormick martin at dc.cis.okstate.edu
Mon Jun 27 14:19:29 UTC 2005


	Last week, our primary bind 9.2.3 server crashed because of a
massive hardware failure, probably a disk controller; it was definitely
not a failure of bind or FreeBSD.  Once before, when a slave on a
different subnet from the master failed, we rather uneventfully brought
up the master's second Ethernet card on the slave's address, and all
was well until we got a new box that didn't smell of smoke and were
running normally again. :-)

	This time it was the master, and things went well, if a bit
rough around the edges, so to speak.  In the first place, the
replacement server normally lives on the same subnet as the cooked
master.  There appears to be an issue with FreeBSD, and probably many
other UNIXen, that won't let you bring up a secondary interface on the
same network with the same subnet mask.  Even worse, if one adds the
address as an alias, for example

ifconfig fxp0 inet 192.168.1.1 netmask 255.255.252.0 alias

you end up with a 255.255.255.0 subnet mask, which won't work here
since the mask we asked for was 255.255.252.0.  Someone suggested
using the all-ones 32-bit mask, 255.255.255.255, and that still
produced a Class C mask.
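	To make the mask arithmetic concrete, here is a small sketch
using Python's standard ipaddress module.  It prints which addresses
each of the three masks mentioned above treats as "on the local
network" for the alias address in the example.  The neighbor address
192.168.2.10 is a hypothetical host picked purely for illustration.

```python
import ipaddress


def on_link_range(addr: str, mask: str) -> tuple[str, str, int]:
    """Return (first address, last address, prefix length) of addr/mask's network."""
    net = ipaddress.ip_interface(f"{addr}/{mask}").network
    return str(net.network_address), str(net.broadcast_address), net.prefixlen


ADDR = "192.168.1.1"  # alias address from the example above

# The mask we asked for, the Class C mask we got, and the all-ones mask.
for mask in ("255.255.252.0", "255.255.255.0", "255.255.255.255"):
    first, last, plen = on_link_range(ADDR, mask)
    print(f"{mask:>15} (/{plen}): {first} - {last}")

# Hypothetical neighbor: on-link under the /22 mask, but not under /24.
neighbor = ipaddress.ip_address("192.168.2.10")
print(neighbor in ipaddress.ip_interface(f"{ADDR}/255.255.252.0").network)  # True
print(neighbor in ipaddress.ip_interface(f"{ADDR}/255.255.255.0").network)  # False
```

A host like 192.168.2.10 sits inside the /22 but outside the /24, so a
box that ends up with the Class C mask would send traffic for it to its
default route instead of delivering it directly on the local wire.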
	
	We ended up drafting a different slave on a different network so
that its secondary Ethernet card could be given our master's address,
and things were beautiful once again until . . .

	I came in today and noticed that none of the remaining slaves
were doing zone transfers any more except when refresh time came
around.  I modified named.conf on each slave to point at the primary
interface address of the new master, and now all those systems are
happy, at least until we switch back to our normal configuration after
the hardware transplant.

	The interface questions are appropriate for the FreeBSD group,
but I am describing them here as a warning to all who have great plans
for what they are going to do when this or that happens.  There are
pitfalls out there and that is one of them.

	Now, for the slave update issue.

	All our slave zone statements look pretty much like this:

zone "hardknocks.edu" {
	type slave;
	file "hardknocks.zone";
	masters {
		192.168.50.1;
	};
	notify-source 192.168.50.1;

	notify yes;
	allow-query { any; };
};

	While I can see that it is possible to list multiple masters,
is there a safe way to have multiple notify-source addresses?  What
seems to be happening is that the box sends its NOTIFY messages from
its primary interface; the slaves see them but probably take them for
the ordinary notifies a slave sends after it transfers a zone.

	Thanks for any good ideas.

	One other little alligator that will bite you when you need to
promote a slave to a master is a sort of common-sense problem, but one
that at least nipped at my heels.  Use a script or some other
mechanical method to make absolutely sure that all your slave zone
files have exactly the same names as they do on the master, or the
master's named.conf, once installed on the promoted slave, won't be
able to find them.  We have 184 zones, and 6 or 8 of them had slightly
different names, just enough to produce a bunch of sickening "file not
found" messages, meaning that those customers aren't getting service
until you reconcile the names.
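	Here is one possible way to do that check mechanically: a rough
Python sketch, not a tested tool, that pulls each zone's file name out
of two named.conf files (the master's and a slave's) and lists every
zone whose file names differ or that appears in only one of them.  It
assumes plain zone statements with a quoted file name; its comment
stripping and brace matching are naive, so comment markers or braces
inside quoted strings will confuse it.  The function names are
hypothetical.

```python
import re
import sys

# A zone header such as: zone "hardknocks.edu" {   or   zone "x" in {
ZONE_RE = re.compile(r'(?<![\w-])zone\s+"([^"]+)"[^{;]*\{')
# The zone's own file option (the lookbehind skips e.g. ixfr-tmp-file).
FILE_RE = re.compile(r'(?<![\w-])file\s+"([^"]+)"\s*;')


def strip_comments(text: str) -> str:
    """Drop /* */, // and # comments (naive: ignores quoting)."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)
    return re.sub(r"(//|#).*", "", text)


def zone_files(conf_text: str) -> dict[str, str]:
    """Map each zone name to the file named inside its zone statement."""
    text = strip_comments(conf_text)
    result = {}
    for m in ZONE_RE.finditer(text):
        depth, i = 1, m.end()
        # Walk forward to the brace that closes this zone statement, so a
        # file option in a later statement (e.g. logging) is not picked up.
        while i < len(text) and depth:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        found = FILE_RE.search(text, m.end(), i)
        if found:
            result[m.group(1)] = found.group(1)
    return result


def mismatches(master_conf: str, slave_conf: str) -> list[str]:
    """List zones whose file names differ, or exist on only one side."""
    master, slave = zone_files(master_conf), zone_files(slave_conf)
    problems = []
    for zone in sorted(master.keys() | slave.keys()):
        if master.get(zone) != slave.get(zone):
            problems.append(
                f"{zone}: master={master.get(zone)!r} slave={slave.get(zone)!r}")
    return problems


# Usage: python3 check_zone_files.py master-named.conf slave-named.conf
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as m_file, open(sys.argv[2]) as s_file:
        for line in mismatches(m_file.read(), s_file.read()):
            print(line)
```

The script name in the usage line is only an example.  An empty report
means every zone the master knows about would find a file of the same
name on the slave.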

	Ah, for the day when we can have a massive cluster of boxes
that all run one instance of bind, so that when one bites the dust the
rest just slow down a little and only we network folks notice.

	Until the crash, bind had been running without a restart since
November 4, 2004, and the box itself had been up for 471 days.  That
speaks pretty well for FreeBSD and bind.

Martin McCormick WB5AGZ  Stillwater, OK 
OSU Information Technology Division Network Operations Group


