All Bind servers crashed

Wed Nov 16 13:59:10 UTC 2011

On Wed, 16 Nov 2011, Bill Owens wrote:

> On Wed, Nov 16, 2011 at 09:57:18AM +0100, Stephane Bortzmeyer wrote:
>> On Wed, Nov 16, 2011 at 09:47:48AM +0100,
>>  Magnus Schmidt <ms at bisping.de> wrote
>>  a message of 49 lines which said:
>>
>>> Nov 16 05:30:41 xxx named[1326]: critical: query.c:1781: INSIST(!
>>> dns_rdataset_isassociated(sigrdataset)) failed, back trace
>
> This behavior makes me bet that the trigger is a name in an incoming 
> email message, being resolved by an anti-spam filter. That appeared to 
> trigger a site-wide resolver crash back in May, when the oversigned .gov 
> zone was mentioned on a list (this particular list, I think). That 
> suggests looking in the inbound mail spool to see what might have been 
> received at the time of the crash might be productive.
>
> Regardless of how the query was started, if this theory of propagation 
> is correct I'd suggest that posting the triggering name unobscured in an 
> email message would be A Bad Thing, even if one is emailing it to ISC as 
> they've suggested. Perhaps *especially* in that case, unless they've 
> taken care to have one production recursor running Unbound ;)
>
> Bill (who is downloading Unbound right now)

We had the same thing happen across multiple, geographically diverse 
servers overnight, at about the same time as the OP.  An email seems an 
odd trigger for that, as it would have to reach a myriad of 
destinations all at once.

While that's possible, I find it lacking as the sole explanation for 
the crashes.

We are also running 9.7.3-P3, built from ISC sources.