"rndc sign", "auto-dnssec maintain" and TYPE65534 record "stickyness"?

Mon Nov 26 14:47:48 UTC 2012

All,

Up front, I should note that this was on a hidden master server which 
was running 9.7.0 (since updated). So it may not work this way on 
current versions of bind.

We (well, I) had a little accident recently when rolling a ZSK. We use 
"auto-dnssec maintain" like so:

zone "blah" {
   file "zones/blah/zone";
   auto-dnssec maintain;
   key-directory "zones/blah";
   allow-update { ... };
   type master;
};

The zones are initially signed offline with "dnssec-signzone", using the
"-j" option so that incremental re-signing is evenly spread over time.

My normal system for doing a ZSK rollover is as follows:

cd /var/named/data/zones/blah
dnssec-keygen ...
dnssec-settime -P now -A none K<newid>

...then wait until the new DNSKEY has propagated, and:

dnssec-settime -A now K<newid> && dnssec-settime -I now K<oldid>

...then wait 30 days, until bind has incrementally re-signed the entire
zone and the old key is no longer in use, plus TTLs, then:

dnssec-settime -D now K<oldid>

Unfortunately this time, I made a mistake. After swapping the active 
keys, I foolishly ran:

rndc sign THEZONE

Somehow I had got the notion that this would make it load the new
keys, despite knowing that this is not the case. Instead, it immediately
signed every record in the zone with the new key, which doubled the size
of the zone and blew away any signing jitter I had.

[Obviously what I wanted was "rndc loadkeys"]

After recovering by reverting the active/inactive keys and running "rndc 
sign" again, two problems emerged:

  1. Despite the old key having a create/publish/active time in the 
past, and no other times, bind stopped incremental re-signing, which I 
only noticed close to the disaster point. It would sign new records 
added via DDNS, but not regenerate signatures. The daemon had been 
completely restarted, so it wasn't a stuck internal state - it must have 
been an attribute of the zone. I assume it had stopped signing because 
it was waiting on the next item...

  2. When I tried to resume the process a few days later using the same 
"new" key, the "full resign" started again, straight away, despite me 
not having made the error of doing "rndc sign".

I assume both problems are related, and were caused by the TYPE65534 
records, plural, which persisted in the zone after the original mistake:

   TYPE65534 \# 5 ( 0512870001 ) => old key id# 4743
   TYPE65534 \# 5 ( 05B98E0000 ) => new key id# 47502
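
For anyone wanting to read those records, here is a minimal Python
sketch that decodes them. It assumes the private-type layout as I
understand it from the BIND ARM: one octet of algorithm, two octets of
key ID in network order, one octet "removal" flag and one octet
"complete" flag. The function name decode_private_record is made up for
illustration and is not part of any bind tool.

```python
def decode_private_record(hex_rdata):
    """Decode the 5-octet RDATA of a TYPE65534 signing-state record.

    Assumed layout: algorithm (1 octet), key ID (2 octets, big-endian),
    removal flag (1 octet), complete flag (1 octet).
    """
    raw = bytes.fromhex(hex_rdata)
    if len(raw) != 5:
        raise ValueError("expected 5 octets, got %d" % len(raw))
    return {
        "algorithm": raw[0],  # 5 is RSASHA1 in the DNSSEC algorithm registry
        "key_id": int.from_bytes(raw[1:3], "big"),
        "removal": bool(raw[3]),
        "complete": bool(raw[4]),
    }


# The two records quoted above.
for rdata in ("0512870001", "05B98E0000"):
    print(rdata, decode_private_record(rdata))
```

If that layout is right, the record for key 4743 has the "complete" flag
set, while the one for key 47502 has it clear, i.e. signing with the new
key was still marked as unfinished. That would at least be consistent
with the full re-sign kicking off again in problem 2.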

My question is this: how could I have avoided the two problems that 
occurred the *second* time I tried this?

Presumably I needed to make the zone completely forget about the new key 
ID, which would have removed the relevant TYPE65534 record - but would 
that have re-started the incremental re-signing with the old key?

Cheers,
Phil