Non-responsive name servers when started during boot on OS X Mavericks 10.9

Larry Stone lstone19 at stonejongleux.com
Tue Jan 21 00:32:24 UTC 2014


On Jan 20, 2014, at 1:22 PM, Chris Buxton <clists at buxtonfamily.us> wrote:

>> Problem: This morning, by happenstance, both were rebooted a few minutes apart and suddenly, nobody could access anything. Finally figured out that named on both was not responding (queries timed out). Killed named (which was immediately restarted by Apple’s launchd) and all was well. Rebooted the secondary to see if it was repeatable and same thing. Nothing of interest in the log - both the initial startup at boot time and restart log identically (and it does log the RFC 1918 empty zones warning so it gets that far). I’m guessing there’s some resource not available at boot time that’s causing named to hang but that’s really just a wild guess.
> 
> I remember fixing this problem way back when Apple first switched to launchd (10.4 or so). Basically, Apple patches (or used to patch) named to make it register with the system to be told when a network interface is added. Their patch allowed named to start up before the network is up, and then essentially get a SIGHUP or something like it every time a network interface comes up or goes down.
> 
> The problem is that launchd starts named before the network is up. The solution is to have it wait a few seconds before starting. The way we did it back then was to have launchd start a script instead of starting named directly. The script would simply sleep 3 seconds (or something like that) before starting named. It would then stay open.
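
A minimal sketch of such a wrapper, written in Python rather than as the shell script described above. The path /usr/local/sbin/named, the script name, and passing the delay as a command-line argument are assumptions for illustration, not details from this thread. It also differs in one respect: instead of staying open as the parent, it uses exec so that the process launchd supervises becomes named itself.

```python
#!/usr/bin/env python3
"""Hypothetical launchd wrapper: wait a while, then run named in the foreground."""
import os
import sys
import time

# Assumed install path for a built-from-source named; adjust to the real one.
NAMED_PATH = "/usr/local/sbin/named"


def build_command(named_path=NAMED_PATH):
    # -f keeps named in the foreground so launchd can track the process.
    return [named_path, "-f"]


def main(delay, named_path=NAMED_PATH, sleep=time.sleep, execv=os.execv):
    # Give the network interfaces time to come up before named starts.
    sleep(delay)
    cmd = build_command(named_path)
    # Replace this process with named; on success this call does not return.
    execv(cmd[0], cmd)


# launchd is assumed to pass the delay in seconds as the only argument,
# e.g. "delayed_named.py 30"; without an argument the script does nothing.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(delay=float(sys.argv[1]))
```

The launchd job would then point at this script instead of at named directly. The right delay depends on the machine, so it is left as an argument rather than hard-coded.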

Thanks Chris. As I mentioned in a follow-up, I did reach that conclusion after finding it was responsive on 127.0.0.1 but not on the machine’s external address. And I have worked around it in exactly the way you mention except I have the sleep at 30 seconds (I tried 15 and it was too short - but that machine is slow; OTOH, I tested on my new MBP with an SSD system disk and it boots so fast that named seems to come up OK). For my needs, the script delay as a work-around is “good enough”.
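
The loopback-versus-external check mentioned above can be sketched roughly as follows. This is an illustration only, not the test I actually ran: the query name example.com and the helper names are made up, and it only checks that some reply with a matching ID comes back over UDP before the timeout.

```python
import socket
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query packet (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A) and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def responds(server, name="example.com", timeout=3.0, port=53):
    """Return True if the server answers a UDP query before the timeout."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as "port unreachable" errors.
            return False
    # A real reply echoes the query ID in its first two bytes.
    return len(reply) >= 2 and reply[:2] == query[:2]
```

Calling responds("127.0.0.1") and then responds() with the machine's external address (say 192.0.2.53, a placeholder) shows the symptom: with the hung named, the first would return True and the second False.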

> I’d bet that the package from Men & Mice includes this script or an equivalent workaround. When I wrote the original script I wrote about above, I worked at Men & Mice.

The problem I have with it is there’s no documentation I can find. If they have patched it, I’d like to know about it.

One reason I’ve moved away from Apple-provided versions (besides them suddenly removing it) and am now going with all “built from source” for my server software is Apple’s tendency to make undocumented changes to open source software. It’s been a problem in the support environments of some other software I use (not that this issue is unique to Apple).

I used a package inspector to look at the Men & Mice package and there’s no launchd plist in there so it’s not clear to me how they get it started. But inspecting packages is new to me so there may be other things I’m not seeing.

In any event, as I said, I have a “good enough” solution for my needs so anything further on this will be mostly of intellectual interest.

-- 
Larry Stone
lstone19 at stonejongleux.com
http://www.stonejongleux.com/