RRSIG and TTL

Thu Sep 17 22:42:10 UTC 2020

I was just about to post an update on this. The auth server on our end is
Infoblox, with few knobs for timing (it's not awful but could definitely be
better). The caching resolver is BIND. I wasn't initially aware of the
transparent cache between them. That must be the thing with the
implementation bug.

It's not my system to open a case against, but I plan to provide my own
test results eventually. I'll add your method to the list.

Thank you.

Scott

On Thu, Sep 17, 2020, 6:26 PM Tony Finch <dot at dotat.at> wrote:

> Scott Nicholas <scott.nicholas at scottn.us> wrote:
> >
> > The primary nameserver is behind a cache/proxy on the enterprise network,
> > such that all external traffic hits it. Zone went bogus. I blame policy
> > but on further inspection 2 of the 3 proxies had differing TTLs between
> > the DNSKEY and its RRSIG.
>
> Hmm, that's suspicious. In the DNS, an RRset is an atomic unit and every
> record in it must have the same TTL. In DNSSEC the RRSIG is part of the RRset,
> so if there is a difference between the DNSKEY TTL and the RRSIG(DNSKEY)
> TTL there is a bug, and it might be bad enough to cause validation
> failures.
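
A mismatch like this is easy to check for mechanically. Below is a minimal
sketch, assuming dig's default answer layout (owner, TTL, class, type,
rdata) and that the first rdata field of an RRSIG is the type it covers.
The function name is made up for illustration.

```python
from collections import defaultdict


def rrsig_ttl_mismatches(dig_answer):
    """Return (owner, covered_type, record_ttls, rrsig_ttls) for every RRset
    whose RRSIG TTL differs from the TTL of the records it covers."""
    data_ttls = defaultdict(set)  # (owner, type) -> TTLs seen on the records
    sig_ttls = defaultdict(set)   # (owner, covered type) -> TTLs seen on RRSIGs
    for line in dig_answer.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and dig's comment lines
        fields = line.split()
        if len(fields) < 5 or not fields[1].isdigit():
            continue  # not a "name ttl class type rdata" line
        owner = fields[0].lower()
        ttl = int(fields[1])
        rtype = fields[3].upper()
        if rtype == "RRSIG":
            sig_ttls[(owner, fields[4].upper())].add(ttl)
        else:
            data_ttls[(owner, rtype)].add(ttl)
    mismatches = []
    for key, sigs in sorted(sig_ttls.items()):
        data = data_ttls.get(key)
        # Only report RRsets where both records and signature were seen.
        if data and len(data | sigs) > 1:
            mismatches.append((key[0], key[1], sorted(data), sorted(sigs)))
    return mismatches
```

Feeding it the answer section of "dig +dnssec DNSKEY <zone>" from each proxy
should return an empty list when the cache is behaving.
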
>
> It sounds like you have a good idea of what the bug might be, and my guess
> is probably the same. If we're right you will be able to provoke
> validation failures like this:
>
>   * query a (sacrificial!) record via the proxy with DO=0 (dig +nodnssec)
>     to populate its cache with an RRset that may lack RRSIGs
>     (that's the guess / bug)
>
>   * change the sacrificial record on the primary
>
>   * query again via the proxy with DO=1 (dig +dnssec) before the old TTL
>     expires
>
> If our guess is right, you'll get the old record with the new RRSIG and
> validation will fail.
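
The steps above can be sketched as a small script. This is only a sketch
under assumptions: the proxy address (192.0.2.53), the record name and the
TXT type are placeholders, the function names are hypothetical, and the
change on the primary is still done by hand in between.

```python
import subprocess


def probe_commands(proxy, name, rtype="TXT"):
    """Build the two dig invocations: first without DO, then with DO set."""
    plain = ["dig", "@" + proxy, name, rtype, "+nodnssec"]
    signed = ["dig", "@" + proxy, name, rtype, "+dnssec"]
    return plain, signed


def run(cmd):
    """Run one dig command and return its text output."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def reproduce(proxy, name, rtype="TXT"):
    """Walk through the three steps; needs dig and network access."""
    plain, signed = probe_commands(proxy, name, rtype)
    print(run(plain))  # step 1: populate the proxy cache with DO=0
    input("Change the sacrificial record on the primary, then press Enter: ")
    print(run(signed))  # step 3: query with DO=1 before the old TTL expires


# Example (not run automatically):
# reproduce("192.0.2.53", "sacrificial.example.com")
```

If the guess is right, the second answer shows the old record data next to
an RRSIG made over the new data.
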
>
> > I suspect that the signature hit its absolute expiry time, a fresh copy
> > was fetched, and the DNSKEY stuck around another 2 days (1 week TTL). Now
> > if the system wasn't security aware, I'm not sure how the TTLs became
> > mismatched but I can see that it could happen. I guess?
>
> Yes.
>
> But there's another issue that can make this bug worse: I think the 7 day
> TTL on your DNSKEY records is too long.
>
> BIND's default sig-validity-interval is 30 days, and signatures are
> regenerated when 1/4 of the interval remains before expiry, i.e. 7.5 days.
>
> If you want to avoid serving bogus signatures, you need to add together
> the zone's SOA expire interval, the propagation delay between your primary
> server and your public authoritative servers, and the maximum TTL of any
> record in your zone. This sum must be less than the signature regeneration
> interval (7.5 days by default).
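
To make the arithmetic concrete, here is a minimal sketch of that sum. The
30 day validity and the 1/4 ratio are the BIND defaults quoted above; the
function name and the sample values are assumptions for illustration.

```python
from datetime import timedelta


def signature_slack(soa_expire, propagation_delay, max_ttl,
                    sig_validity=timedelta(days=30)):
    """Return how much margin is left before a cache could hold an expired
    signature. A negative result means the budget is blown."""
    # Signatures are regenerated when 1/4 of their validity remains.
    regeneration_margin = sig_validity / 4
    worst_case_age = soa_expire + propagation_delay + max_ttl
    return regeneration_margin - worst_case_age


# 3 day expire, 1 hour propagation, 1 day maximum TTL: fits.
ok = signature_slack(timedelta(days=3), timedelta(hours=1), timedelta(days=1))

# 7 day expire plus a 7 day DNSKEY TTL: does not fit.
bad = signature_slack(timedelta(days=7), timedelta(0), timedelta(days=7))

print("slack with 1 day TTL:", ok)
print("slack with 7 day TTL:", bad)
```

With a 7 day TTL the sum only fits if expire plus propagation stays under
12 hours, which is why the DNSKEY TTL matters so much here.
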
>
> In practice you will never get anywhere near the expiry interval unless
> things are broken, and NOTIFY means the propagation delay is negligible.
> So in the real world the important number is how good you are at
> monitoring zone propagation delays and fixing things if they become
> non-negligible. To allow for SNAFUs this is about the same as the
> traditional zone expiry time of about a week...
>
> The logistics are a bit different if you have a reverse proxy in your
> authoritative server setup, but I hope you get the idea of how to think
> about making sure your DNSSEC signatures are fresh enough.
>
> The other interesting number is the TTL. When choosing TTLs there are
> roughly two kinds of records, which I call infrastructure records and,
> uuuh, I don't have a word for the others - user records? application
> records? Anyway, infrastructure records are the irrelevant crap a resolver
> needs in order to get the answers that users actually care about, and of
> course this irrelevant crap is the tricky stuff that DNS admins have to
> work with: NS records, A and AAAA records of DNS servers, DNSKEY records,
> DS records.
>
> The TTL for infrastructure records should be relatively long, to minimize
> the amount of irrelevant crap that resolvers have to deal with, i.e. to
> reduce the tail latency experienced by end users while resolvers go off to
> look at the infrastructure. You start hitting diminishing returns for
> infrastructure TTLs after about 24 hours - delegation records in TLDs
> typically have TTLs of 24h or 48h, and that's a reasonable length for your
> in-zone infrastructure records too.
>
> Any longer than that and you are creating pain for yourself any time you
> have to do a nameserver migration or a DNSSEC rollover. With 24h TTLs
> you'll need to allow a week for a significant move; for a 7 day TTL you
> might be looking at a month of faff to deal with something that's often
> tricky and perhaps unexpectedly urgent.
>
> For other records, I find an hour is a reasonable balance between decent
> cache performance and not-too-annoying update delays. I don't have records
> with enough churn to justify shorter TTLs but your mileage may vary.
>
> (There are scientific measurements of DNS TTL vs latency that agree
> reasonably well with my suggestions, so there's a bit more to them than
> convenient round numbers!)
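
As a rough illustration of that rule of thumb, here is a sketch that picks
a TTL per record. The 24 hour and 1 hour values are the suggestions above;
the function name and the idea of passing in the set of nameserver
hostnames are my own assumptions.

```python
INFRA_TYPES = {"NS", "DNSKEY", "DS"}
INFRA_TTL = 24 * 3600  # 24 hours for infrastructure records
OTHER_TTL = 3600       # 1 hour for everything else


def _normalize(name):
    """Lower-case a domain name and drop any trailing dot."""
    return name.lower().rstrip(".")


def suggested_ttl(owner, rtype, nameserver_names=()):
    """Suggest a TTL in seconds for one record.

    Address records count as infrastructure only when their owner name is
    one of the zone's nameservers."""
    rtype = rtype.upper()
    if rtype in INFRA_TYPES:
        return INFRA_TTL
    servers = {_normalize(n) for n in nameserver_names}
    if rtype in {"A", "AAAA"} and _normalize(owner) in servers:
        return INFRA_TTL
    return OTHER_TTL
```

For example, suggested_ttl("www.example.com.", "A") gives 3600, while the
same call for a nameserver's own address gives 86400.
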
>
> > A low TTL would minimize it but appliance doesn't allow direct
> > configuration for DNSKEY TTL.
>
> GOOD GRIEF :-(
>
> Tony.
> --
> f.anthony.n.finch  <dot at dotat.at>  http://dotat.at/
> Biscay, Fitzroy, Sole: East or northeast 4 to 6, occasionally 7 later, but
> cyclonic 3 to 5 in south Fitzroy and south Biscay. Moderate or rough, but
> slight in southeast Biscay, becoming rough later in Sole. Thundery showers
> in Biscay and Fitzroy. Good, occasionally poor in Biscay and Fitzroy.
>