Questions After a DNS Server Crash

Mon Jun 27 14:47:58 UTC 2005

At 10:19 AM 6/27/2005, Martin McCormick wrote:
>         Last week, we had our primary 9.2.3 bind server crash due to
>massive hardware failure, probably a disk controller, definitely not
>failure of bind or FreeBSD.  Once before, when a slave failed on a
>different subnet than the master, we rather uneventfully brought up
>the master's second Ethernet card on the slave's address and all was
>well until we were able to get a new box that didn't smell of smoke
>and ran again. :-)
>
>         This time, it was the master and things went well, but a bit
>rough around the edges so to speak.  In the first place, the
>replacement server normally lives on the same subnet as the cooked
>master.  There appears to be an issue with FreeBSD and probably many
>other UNIXen that won't let you bring up a secondary interface on the
>same network with the same subnet mask.

No sane operating system or network device I've seen allows you to 
do this. Which interface should it use to send a packet to a device 
on that subnet, and why?

>   Even worse, if one uses the
>alias command as in
>
>alias fxp0 inet 192.168.1.1 netmask 255.255.252.0  for example,
>
>you get packets that have a 255.255.255.0 subnet mask which won't work
>here.  Someone suggested using a 32-bit mask of 255.255.255.255 and
>that still got a Class C mask.

Someone other than me set up aliases on a FreeBSD box for web 
hosting purposes and seems to have used a /32 mask on the aliases 
even though the main IP is part of a /28. It seems to work: ifconfig 
shows /32s on all the aliases and a /28 on the main IP. I don't know 
whether this is correct. I would have put them on the same /28 they 
are inherently part of, but perhaps that's not right. I'm sure 
someone with more knowledge about aliases on FreeBSD will speak up, 
or you could ask in the FreeBSD users group.
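
To make the mask arithmetic concrete, here is a minimal sketch using 
Python's standard ipaddress module. It only illustrates the address 
math, not FreeBSD's actual behavior. The 192.168.1.1 address and the 
255.255.252.0 mask come from the example above; the 192.168.1.2 
address is a hypothetical one chosen for illustration.

```python
import ipaddress

# Address and mask from the example in the thread.
primary = ipaddress.ip_interface("192.168.1.1/255.255.252.0")
# Hypothetical second address on the same subnet, for illustration only.
second = ipaddress.ip_interface("192.168.1.2/255.255.252.0")

# Both addresses claim the same connected network, so a routing table
# would hold two equal candidates for traffic to that subnet.
print(primary.network)                    # 192.168.0.0/22
print(primary.network == second.network)  # True

# With an all-ones mask, the second address covers only itself.
alias = ipaddress.ip_interface("192.168.1.2/255.255.255.255")
print(alias.network)                      # 192.168.1.2/32
print(alias.network.num_addresses)        # 1
```

If I understand it correctly, that is why a /32 on the alias avoids 
the conflict: the alias no longer claims the whole subnet a second 
time.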

>We ended up drafting a different slave on a different network so its
>secondary Ethernet card could be successfully set to equal our master
>and things were beautiful once again until . . .
>
>         I came in today and noticed that none of the remaining slaves
>were doing zone transfers any more except when refresh time came
>around.  I modified named.conf on each slave to reflect the primary
>interface of the new master and now all those systems are happy until
>we switch back to our normal configuration after obtaining a new
>hardware transplant.
>
>         The interface questions are appropriate for the FreeBSD group,
>but I am describing them here as a warning to all who have great plans
>for what they are going to do when this or that happens.  There are
>pitfalls out there and that is one of them.
>
>         Now, for the slave update issue.
>
>         All our slave files pretty much look like:
>
>zone "hardknocks.edu" {
>         type slave;
>         file "hardknocks.zone";
>         masters {
>                 192.168.50.1;
>
>         };
>         notify-source 192.168.50.1;
>
>         notify yes;
>         allow-query { any; };
>};
>
>         While I can see that it is possible to have multiple masters,
>is there a safe way to have multiple notify-source addresses?  What is
>happening is that the box appears to send notifies on its primary
>interface and the slaves are seeing them but probably think it is just
>the slave notifies that occur after a slave transfers a zone.
>
>         Thanks for any good ideas.

Although I've never tried it, see if you can specify notify-source 
per zone. named-checkconf doesn't complain about it, but I'm unsure 
whether it will actually do anything. Worth a shot...

Also, turn off notifies on your slaves unless you need them for 
something specific.

Good luck! :)

>         One other little alligator that will bite you when you need to
>promote a slave to a master is a sort of common-sense problem, but one
>that at least nipped at my heels.  Use a script or some mechanical
>method to make absolutely sure that all your slave zone files have
>exactly the same name as they do on the master or named.conf from the
>master won't know how to find them.  We have 184 zones and about 6 or 8
>had slightly different names, just enough to see a bunch of sickening
>messages about "file not found," etc, meaning that the customers
>aren't getting service until you reconcile those names.
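
A rough sketch of such a check, in Python, might look like the 
following. It is an assumption-laden illustration, not a tested tool: 
the helper names zone_files and mismatches are hypothetical, the 
sample configs are made up apart from the hardknocks.edu zone above, 
and the regex parsing ignores comments, include files, and zones 
without a file statement.

```python
import re

# Matches either  zone "name"  or  file "name"  in named.conf text.
TOKEN = re.compile(r'\b(zone|file)\s+"([^"]+)"')

def zone_files(conf_text):
    """Map each zone name to the first file named after its zone line."""
    files = {}
    current = None
    for keyword, value in TOKEN.findall(conf_text):
        if keyword == "zone":
            current = value
        elif current is not None:
            # Keep only the first file statement seen for this zone.
            files.setdefault(current, value)
    return files

def mismatches(master_text, slave_text):
    """Return zones whose file name on the slave differs from the master."""
    master = zone_files(master_text)
    slave = zone_files(slave_text)
    return {
        zone: (master[zone], slave.get(zone))
        for zone in master
        if slave.get(zone) != master[zone]
    }

# Made-up sample configs; example.edu is a hypothetical second zone.
master_conf = '''
zone "hardknocks.edu" { type master; file "hardknocks.zone"; };
zone "example.edu" { type master; file "example.zone"; };
'''
slave_conf = '''
zone "hardknocks.edu" {
        type slave;
        file "hardknocks.zone";
        masters { 192.168.50.1; };
};
zone "example.edu" {
        type slave;
        file "example.edu.zone";
        masters { 192.168.50.1; };
};
'''

for zone, (on_master, on_slave) in sorted(mismatches(master_conf, slave_conf).items()):
    print(f"{zone}: master uses {on_master!r}, slave uses {on_slave!r}")
```

In practice you would read the two named.conf files from disk and 
run this before you need it, not during the outage.
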
>
>         Ah, for the day when we can have a massive cluster of boxes
>that all run one instance of bind so that when one bites the dust, the
>rest just slow down a little and only us network folks notice.
>
>         Until the crash, bind had been running without a restart since
>November 4 of 2004 and the box itself had been up 471 days.  It
>speaks pretty well for FreeBSD and bind.
>
>Martin McCormick WB5AGZ  Stillwater, OK
>OSU Information Technology Division Network Operations Group

Vinny Abello
Network Engineer
Server Management
vinny at tellurian.com
(973)300-9211 x 125
(973)940-6125 (Direct)
PGP Key Fingerprint: 3BC5 9A48 FC78 03D3 82E0  E935 5325 FBCB 0100 977A

Tellurian Networks - The Ultimate Internet Connection
http://www.tellurian.com (888)TELLURIAN

"Courage is resistance to fear, mastery of fear - not absence of 
fear" -- Mark Twain