ZSK rollover weirdness

Lawrence K. Chen, P.Eng. lkchen at ksu.edu
Mon Sep 9 21:43:28 UTC 2013


----- Original Message -----

> On Fri, Sep 6, 2013 at 1:32 PM, Lawrence K. Chen, P.Eng. <
> lkchen at ksu.edu > wrote:

> > > So, can I just remove the Revoke line (is there an option in
> > > dnssec-settime to do this?) and have things fixed...
> > 
> 

> > guess dnssec-settime -A none -R none will remove it....but guessing
> > there's more to fixing my current mess?
> 
> Adding the revoke bit was not useful, but wasn't in and of itself
> harmful. The harmful part, and what likely was the cause of
> validation errors, was that you began exclusively signing your zone
> contents before it had been pre-published long enough for versions
> of the DNSKEY RRset without the key to expire in cache. Here's what
> I see:

> 2013-09-04 19:15 UTC
> only ZSK with id 14565 exists and is signing zone
> http://dnsviz.net/d/ksu.edu/UieG7w/dnssec/

> 2013-09-05 01:38 UTC

> new ZSK with id 44538 is signing, as is now revoked key 14565 (now
> with id 14693)
> http://dnsviz.net/d/ksu.edu/UifggA/dnssec/

> Somewhere between that roughly six-hour period, the new ZSK was
> introduced and the RRSIGs made by the new ZSK became the only useful
> ones since the old key had been marked as revoked. Now consider a
> validating resolver that retrieved the DNSKEY RRset at 2013-09-04
> 19:15 UTC. The TTL suggests it can be cached for 24 hours--that is,
> 18 hours after DNSViz first notes the presence of the new ZSK and
> RRSIGs that can only be validated by that new ZSK. This example
> validating resolver will now have issues validating names in ksu.edu
> until the cache expires 24 hours after new ZSK was introduced. Such
> is the window for failure.

> Regards,
> Casey
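
To put numbers on the window described in the quoted text, here is a rough Python sketch. It only does the date arithmetic: the two timestamps and the 24-hour TTL come from the quoted message, while the function name `stale_window` is made up for illustration. It assumes a resolver that cached the DNSKEY RRset at the moment of the first DNSViz observation.

```python
from datetime import datetime, timedelta, timezone

# Timestamps from the DNSViz observations quoted above.
old_only_seen = datetime(2013, 9, 4, 19, 15, tzinfo=timezone.utc)
new_zsk_seen = datetime(2013, 9, 5, 1, 38, tzinfo=timezone.utc)
dnskey_ttl = timedelta(hours=24)  # DNSKEY RRset TTL, per the quoted text


def stale_window(cached_at, new_key_signing_at, ttl):
    """Hypothetical helper: how long a resolver that cached the DNSKEY RRset
    at cached_at keeps a copy lacking the new key after new_key_signing_at."""
    expires = cached_at + ttl
    return max(expires - new_key_signing_at, timedelta(0))


# Resolver that cached the old DNSKEY RRset at the first observation.
window = stale_window(old_only_seen, new_zsk_seen, dnskey_ttl)

# Worst case: the old RRset was cached just as the new key went live.
worst_case = stale_window(new_zsk_seen, new_zsk_seen, dnskey_ttl)

print(window)
print(worst_case)
```

The first value is the "roughly 18 hours" in the quoted explanation. The second is the full TTL, which is why pre-publishing the new key for at least one DNSKEY TTL before signing with it closes the window.
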
Yeah, there were two problems at play here... I mentioned that the activation of the new key and revocation of the old key ended up on the same day (made worse because -A was also added; fortunately the 'd' was omitted, or it would've been a more widespread and noticeable disruption)... and that it got introduced in a quick mod in late March... with no testing. This is not the first problem I've had to fix (though my fix also broke something else, which I didn't notice because it didn't break until I deployed the script into production. Probably could've been avoided if the PHONY targets in the Makefile had been declared as .PHONY...). And the '-R' value was derived by subtracting from what I had for '-I', with some adjustments.

This was back when the idea was that I shouldn't be the only person who knows everything about our DNS; that was before I found myself to be the only one left. They used to joke that if I left, they'd have to close the University, because I'm the only one who knows about the obscure stuff that others dislike... like nagios, cacti, cfengine, NTP, DNS, email.... Guess they were right, we're still open now that it's just me....

I had rather arbitrarily... set -D to +120d, then subtracted 15 days to get -I of +105d... even though I knew 3 months is usually greater than 90d. But -I would still come over a week after the new ZSK. Though it did occur to me that 90d was bad for -R... (there were many commits to subversion as it was tweaked...)
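
As a rough check on those offsets, the sketch below works through the dates in Python. The activation date is hypothetical, the offsets are assumed to be counted from that same day, and the successor ZSK is assumed to become active exactly three calendar months later; `add_months` is an illustrative helper, not part of any DNSSEC tool.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Shift d by whole calendar months, clamping the day if needed."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


activated = date(2013, 6, 5)                 # hypothetical activation of the old ZSK
inactive = activated + timedelta(days=105)   # -I +105d
deleted = activated + timedelta(days=120)    # -D +120d
successor_active = add_months(activated, 3)  # assumed: new ZSK every 3 calendar months

three_months_in_days = (successor_active - activated).days
overlap = inactive - successor_active        # both ZSKs active during this span

print(three_months_in_days)
print(overlap.days)
print(inactive, deleted)
```

With this start date, three months comes to 92 days, so a 90-day offset lands before the successor is active, while the 105-day -I leaves 13 days in which both keys are active.
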

The second problem was that last October/November was when we started feeling the pain of DDoS attacks on our nameservers. Guess it was my fault for having upgraded the servers to faster hardware and gigabit NICs. At that time, due to licensing for a security appliance, our 10gig pipe was capped at 2gig. Though our F5 is only capable of 1gig, and two of my authoritative-only nameservers are in the datacenter behind it (which wasn't too bad, as until recently the datacenter was only on a 2gig link to our 10gig core). So the maximum traffic that could hit my nameservers is 2gig... which was also the maximum for our campus.... By spring this was happening quite regularly... and starting to cause noticeable problems. They have since upgraded the license to allow up to 4gig in and out of campus... No word on whether a new F5 will happen; twice I was asked to get quotes, but then the meetings were cancelled.... I also don't know what has become of the datacenter network audit, which was to reorganize vlans in the datacenter... (there are 41 vlans tagged to the F5, and probably more than that in additional vlans, though some seem kind of silly, like cluster interconnects). The current F5 can do up to 2gig, though we would have to switch from fiber links to bonded copper... and I'm not sure if the packet capture box in front of the F5 can handle that.

So, during the summer, the IT Security group decided to block port 53 at the border, and then allow only known (outside-facing) authoritative servers to get connections on port 53 (at least they seem to have understood that DNS is both tcp and udp....) However, they didn't know about the unknown authoritative-only nameserver... the one that our off-campus secondaries receive notifies from and are supposed to do zone transfers with.

That was one of the first things I noticed when the Comcast DNS problem was reported on the evening of September 4th... which I didn't see until I was checking email before leaving for work on the 5th.

The firewall problem was corrected around 2013-09-05 13:19 UTC 

Making for quite the storm of problems causing this.... 

I should try to find time to sit down and conduct a more thorough review of things... or find time to find a different set of tools to manage dnssec. Or maybe we should get a bunch of appliances instead of the next max replacement of systems. But for a while the talk has been that things that don't have to be on Oracle hardware won't be on Oracle hardware anymore... to the point where we might cease to have any Solaris... probably because we've had numerous failed searches for new Solaris admins (needing X+ years of Solaris 10+ experience... for senior we wanted 5+, forget what it was for non-senior)... to the point where I'm the only one left with >5 years of Solaris 10+ experience.

At one time I was pushing we all go with FreeBSD....but I don't know now.... 

-- 

Who: Lawrence K. Chen, P.Eng. - W0LKC - Senior Unix Systems Administrator 
For: Enterprise Server Technologies (EST) -- & SafeZone Ally 
Snail: Computing and Telecommunications Services (CTS) 
Kansas State University, 109 East Stadium, Manhattan, KS 66506-3102 
Phone: (785) 532-4916 - Fax: (785) 532-3515 - Email: lkchen at ksu.edu 
Web: http://www-personal.ksu.edu/~lkchen - Where: 11 Hale Library 