nanny (was Re: bind-9.8.1: INSIST(! dns_rdataset_isassociated(sigrdataset)) failed)

Mon Dec 5 17:11:12 UTC 2011

On Nov 18, 2011, at 4:44 AM, G.W. Haywood wrote:

> Never in several machine decades have I had to do anything like that
> for BIND.  The fact that people are even talking about it is of some
> concern to me.  Twice in approximately the last month I have had one
> particular server go down for no apparent reason.  This machine runs
> BIND.  I keep its copy of BIND fairly well up to date.  It has been
> running 24/7 for well over a decade with typically a couple of years
> between reboots.  I have no evidence that BIND was the culprit, but in
> view of recent events elsewhere it seems just a little suspicious.

Speaking as one of the original BIND 9 authors, I am happy to hear you have never needed a nanny script.  I don't think the need has significantly increased on average, but BIND is complex software, and there are bugs.

I use "monit" on my servers, for everything they do, because it will restart a service and notify me if it crashes.  I've triggered the BIND restart many times, usually because I killed it or told it to stop, or because I was running pre-release code on one of the two name servers I run.  BIND rarely crashes, but it can.  The same can be said for Apache, sshd, the tens of helper scripts I run for various web sites, database engines (although on my system those just report failure and stay down) -- even the mail server itself can crash.

That said, while it is not necessary to use a nanny script, I do find your statement sort of backwards from what I think of as best practice.  If an attacker can remotely crash a daemon while attacking it, that seems scary to me.  At the very least I'd start it on some alternate port if I could, so I always had a back door, and if it runs as root I'd get that crash fixed.  Often enough, a crash is only one step away from a code execution exploit.  BIND 9 takes extra steps to turn this type of flaw into an intentional crash rather than a random, unexpected, and undiagnosed failure.

No software is perfect.  I'd run all services I consider critical with monit or some other advanced "keep the daemon running and report when it fails" system.  God and Monit, and probably others I've not tried, keep things going so I don't have to.
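
For anyone who has never seen one, the "keep the daemon running and report when it fails" idea boils down to something like the sketch below.  This is only an illustration and not how monit or God work internally; the function name supervise, its parameters, and the notify callback are all made up for the example.

```python
import subprocess
import time


def supervise(cmd, max_restarts=3, delay=1.0, notify=print):
    """Run cmd in the foreground and restart it each time it exits.

    Hypothetical helper for illustration only.  Returns the list of
    exit statuses observed, one per run (max_restarts + 1 runs total).
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        proc = subprocess.Popen(cmd)
        code = proc.wait()  # block until the supervised process exits
        exit_codes.append(code)
        # Report every exit so a crash never goes unnoticed.
        notify(f"{cmd[0]} exited with status {code} (run {attempt + 1})")
        if attempt < max_restarts:
            time.sleep(delay)  # short pause so a crash loop does not spin
    return exit_codes
```

Assuming named is on the PATH, something like supervise(["named", "-f"], max_restarts=5) would keep a foreground named going through five restarts and print a line each time it exits.  A real tool does far more (pid files, health checks, mail alerts), which is why I'd use one rather than roll my own.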

BTW, I also run sshd on a second port, with a second instance of the daemon, just in case the primary fails, or I have to firewall it off quickly.  :)
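
A quick way to see which of the two listeners is answering is a plain TCP connect test, roughly as sketched here.  The helper name first_reachable is invented for this example, and the port numbers in the usage note are only placeholders.

```python
import socket


def first_reachable(host, ports, timeout=2.0):
    """Return the first port in ports that accepts a TCP connection.

    Hypothetical helper for illustration only.  Returns None when no
    port in the list accepts a connection within the timeout.
    """
    for port in ports:
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            continue
    return None
```

For example, first_reachable("ns1.example.org", [22, 2222]) would return 22 if the primary sshd answers, 2222 if only the second instance does, and None if neither is reachable.  The host name and port 2222 are assumptions for the example, not my actual setup.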

--Michael
