DNSSEC Bogus NXDOMAIN survives authenticating RR

Wed Dec 9 23:59:52 UTC 2009

[I finally gave up trying to get Thunderbird *not* to wrap long
lines. Prefixing them with ">" seems to be the only way, even if it's confusing.]

Niobos wrote:

>>> dig +dnssec removed.dnssec.dest-unreach.be
>> Even though I have added your DNSKEY as trusted key, I get SERVFAIL on
>> the first query and NXDOMAIN on the second, without BIND doing any
>> additional outgoing queries.
> This is the same behavior I'm observing.

I think I see it more clearly now.

The inner workings of the NSEC/NSEC3 mechanisms are a bit of a mystery to
me, so the following is mostly guesswork.

Maybe I broke my test zone in a different way, and that's why we don't
see the same results. Your SOA record validates; mine doesn't:

> validating @0xb91c7968: fnord.dnstest.hauke-lampe.de SOA: no valid signature found

And therein lies the problem.
The signatures on your SOA and NSEC3 records in the NXDOMAIN response
are all valid. It's their meaning, the proof of nonexistence for the
removed record, that cannot be established:

> validating @0xb4e01470: removed.dnssec.dest-unreach.be A: attempting negative response validation
>   validating @0xb4e01ee0: dnssec.dest-unreach.be SOA: verify rdataset (keyid=33827): success
>   validating @0xb8e98b60: 67152CME7SOELFT0OOTFB03FQ968LOM1.dnssec.dest-unreach.be NSEC3: verify rdataset (keyid=33827): success
>   validating @0xb8e98b60: OKIU30OTQ4ETK8K4VP0L3MM20HUNI5R2.dnssec.dest-unreach.be NSEC3: verify rdataset (keyid=33827): success
> validating @0xb4e01470: removed.dnssec.dest-unreach.be A: NSEC3 proves name exists (owner) data=1
> validating @0xb4e01470: removed.dnssec.dest-unreach.be A: nonexistence proof(s) not found
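Why "NSEC3 proves name exists (owner)" sinks the proof: per RFC 5155, an NSEC3 owner label is the base32hex-encoded, iterated SHA-1 hash of a name. If one of the returned NSEC3 records has an owner equal to the hash of removed.dnssec.dest-unreach.be, the name provably exists and an NXDOMAIN for it cannot be correct. If only the A record and its RRSIG were deleted from the signed zone, that NSEC3 record is still there. Below is a minimal Python sketch of the hash and of the "matches"/"covers" test, assuming hash algorithm 1 (SHA-1); the salt and iteration count are parameters rather than values taken from either test zone, and the helper names are made up for illustration.

```python
import hashlib

# RFC 4648 "base32hex" alphabet, used for NSEC3 owner labels. Zone files
# usually show them lowercase; DNS names are case-insensitive.
B32HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"


def b32hex(data: bytes) -> str:
    """Encode bytes as unpadded base32hex (order-preserving, like the raw hash)."""
    n = int.from_bytes(data, "big")
    nbits = len(data) * 8
    pad = (-nbits) % 5              # zero-fill up to a multiple of 5 bits
    n <<= pad
    nbits += pad
    return "".join(B32HEX[(n >> s) & 0x1F] for s in range(nbits - 5, -1, -5))


def wire_name(name: str) -> bytes:
    """Canonical (lowercase, uncompressed) wire format of a domain name."""
    out = bytearray()
    for label in name.rstrip(".").lower().split("."):
        if label:
            raw = label.encode("ascii")
            out.append(len(raw))
            out += raw
    out.append(0)                   # the root label terminates the name
    return bytes(out)


def nsec3_hash(name: str, salt: bytes = b"", iterations: int = 0) -> str:
    """RFC 5155 hash, SHA-1: IH(0) = H(name|salt), IH(k) = H(IH(k-1)|salt)."""
    digest = hashlib.sha1(wire_name(name) + salt).digest()
    for _ in range(iterations):
        digest = hashlib.sha1(digest + salt).digest()
    return b32hex(digest)


def classify(qhash: str, owner: str, next_hash: str) -> str:
    """Does an NSEC3 RR (owner -> next hashed owner) match or cover qhash?"""
    qhash, owner, next_hash = qhash.upper(), owner.upper(), next_hash.upper()
    if qhash == owner:
        return "matches"            # the hashed name exists: no NXDOMAIN proof
    if owner < next_hash:
        covered = owner < qhash < next_hash
    else:                           # the last NSEC3 in the chain wraps around
        covered = qhash > owner or qhash < next_hash
    return "covers" if covered else "unrelated"
```

Fed with the salt and iterations from the zone's NSEC3PARAM record, `nsec3_hash("removed.dnssec.dest-unreach.be", salt, iterations)` should, if this reading of the log is right, equal one of the two NSEC3 owner labels shown above, and `classify` would then report "matches" instead of "covers".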

BIND seems to cache the validation state of the individual signatures,
not the result of the failed nonexistence proof. At least it doesn't
re-validate the cached answer:
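That hypothesis can be put into a toy Python model. This is hypothetical and says nothing about how BIND's cache is actually organised; the class and field names are invented. The only difference between the two modes is whether the failed proof is remembered with the cached answer: with `remember_proof=False` the model reproduces the SERVFAIL-then-AD pattern described below, while `remember_proof=True` keeps returning SERVFAIL, like Unbound does.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class NegativeAnswer:
    """An NXDOMAIN response as seen by a validator (toy model)."""
    signatures_ok: bool   # the SOA and NSEC3 RRSIGs all verify
    proof_ok: bool        # the NSEC3 records actually prove nonexistence


class ToyResolver:
    """Hypothetical caching validator, NOT BIND's implementation.

    remember_proof=False mimics the suspected bug: only the signature
    status survives in the cache, the failed proof is forgotten.
    """

    def __init__(self, remember_proof: bool) -> None:
        self.remember_proof = remember_proof
        self.cache: Dict[str, NegativeAnswer] = {}

    @staticmethod
    def _status(ans: NegativeAnswer, check_proof: bool) -> Tuple[str, bool]:
        """Return (rcode, AD flag) for an answer."""
        secure = ans.signatures_ok and (ans.proof_ok or not check_proof)
        return ("NXDOMAIN", True) if secure else ("SERVFAIL", False)

    def resolve(self, qname: str,
                fetch: Callable[[str], NegativeAnswer]) -> Tuple[str, bool]:
        if qname in self.cache:
            # Cache hit: no re-validation; the proof only counts if remembered.
            return self._status(self.cache[qname], check_proof=self.remember_proof)
        ans = fetch(qname)          # first lookup: full validation
        self.cache[qname] = ans     # cached anyway, as the log above suggests
        return self._status(ans, check_proof=True)
```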

> client 127.0.0.1#47401: UDP request
> client 127.0.0.1#47401: using view '_default'
> client 127.0.0.1#47401: request is not signed
> client 127.0.0.1#47401: recursion available
> client 127.0.0.1#47401: query
> client 127.0.0.1#47401: query (cache) 'removed.dnssec.dest-unreach.be/A/IN' approved
> client 127.0.0.1#47401: send
> client 127.0.0.1#47401: sendto
> client 127.0.0.1#47401: senddone
> client 127.0.0.1#47401: next
> client 127.0.0.1#47401: endrequest

So, while the first query returns SERVFAIL as expected, subsequent
responses from the cache even have the AD flag set. This is the one
thing that *really* puzzled me (otherwise I probably wouldn't have begun
looking at long debug logs ;)

> hauke at pope:~$ dig +dnssec removed.dnssec.dest-unreach.be 
[...]
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 46781
> ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 1
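To check this without reading dig output by eye, a stdlib-only Python sketch can send the same query twice and report RCODE and the AD bit. It builds the query by hand (RD set, an EDNS0 OPT record with the DO bit, roughly what `dig +dnssec` sends). The resolver address is an assumption taken from the query log above, and the helper names are made up.

```python
import random
import socket
import struct
from typing import Tuple

RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}
FLAG_AD = 0x0020


def build_query(qname: str, qtype: int = 1, qid: int = 0) -> bytes:
    """DNS query with RD set and an EDNS0 OPT RR carrying the DO bit."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 1)  # RD; QD=1, AR=1
    qn = b"".join(bytes([len(label)]) + label.encode("ascii")
                  for label in qname.rstrip(".").split(".") if label) + b"\x00"
    question = qn + struct.pack("!HH", qtype, 1)              # QTYPE, QCLASS=IN
    # OPT RR: root owner, TYPE=41, CLASS=UDP size 4096, TTL holds DO (0x8000)
    opt = b"\x00" + struct.pack("!HHIH", 41, 4096, 0x00008000, 0)
    return header + question + opt


def parse_header(msg: bytes) -> Tuple[int, str, bool]:
    """Return (id, rcode name, AD flag); extended RCODE bits are ignored."""
    qid, flags = struct.unpack("!HH", msg[:4])
    rcode = flags & 0x000F
    return qid, RCODES.get(rcode, str(rcode)), bool(flags & FLAG_AD)


def ask(server: str, qname: str, timeout: float = 3.0) -> Tuple[str, bool]:
    """Send one UDP query and return (rcode, AD flag) from the reply."""
    qid = random.randrange(65536)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_query(qname, qid=qid), (server, 53))
        reply, _ = s.recvfrom(4096)
    rid, rcode, ad = parse_header(reply)
    if rid != qid:
        raise ValueError("reply ID does not match query ID")
    return rcode, ad
```

Calling `ask("127.0.0.1", "removed.dnssec.dest-unreach.be")` twice against an affected resolver should, going by the output above, give `("SERVFAIL", False)` and then `("NXDOMAIN", True)`; a correct validator would return SERVFAIL both times.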

The response doesn't validate:

> hauke at pope:~$ dig +sigchase +trusted-key=./dnskey-dnssec.dest-unreach.be +dnssec removed.dnssec.dest-unreach.be 
[...]
> ;; Impossible to verify the Non-existence, the NSEC RRset can't be validated: FAILED

I think this is a bug in BIND's resolver. You should forward a bug
report to bind9-bugs at isc.org.

Unbound returns SERVFAIL to all queries for
removed.dnssec.dest-unreach.be and keeps logging the failed NSEC3 test:

> unbound: [968:0] debug: Validating a nxdomain response
> unbound: [968:0] debug: nsec3: keysize 1024 bits, max iterations 150
> unbound: [968:0] info: start nsec3 nameerror proof, zone <dnssec.dest-unreach.be. TYPE0 CLASS0>
> unbound: [968:0] info: ce candidate <removed.dnssec.dest-unreach.be. TYPE0 CLASS0>
> unbound: [968:0] debug: nsec3 proveClosestEncloser: proved that qname existed, bad
> unbound: [968:0] debug: nsec3 nameerror proof: failed to prove a closest encloser
> unbound: [968:0] debug: NameError response failed nsec, nsec3 proof was sec_status_bogus
> unbound: [968:0] info: validate(nxdomain): sec_status_bogus
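Unbound's log follows the RFC 5155 name-error proof: find the closest encloser (the longest existing ancestor of the qname, shown by a matching NSEC3), then require NSEC3 records that cover the "next closer" name and the wildcard at the closest encloser. If the very first candidate, the qname itself, is matched, the name exists and the NXDOMAIN is bogus, which is what "proved that qname existed, bad" reports. A simplified Python sketch, assuming `hash_fn` stands in for the zone's NSEC3 hash (salt and iterations baked in); it omits opt-out, delegation and DNAME checks, and the function names are made up.

```python
from typing import Callable, Iterable, List, Tuple

NSEC3 = Tuple[str, str]  # (owner hash, next hashed owner), same encoding as hash_fn


def covers(rr: NSEC3, h: str) -> bool:
    """True if hash h falls strictly between owner and next (with wrap-around)."""
    owner, nxt = rr
    if owner < nxt:
        return owner < h < nxt
    return h > owner or h < nxt      # last NSEC3 in the chain


def nxdomain_proof(qname: str, zone: str, nsec3s: Iterable[NSEC3],
                   hash_fn: Callable[[str], str]) -> str:
    """Simplified name-error check: closest encloser, next closer, wildcard."""
    rrs: List[NSEC3] = list(nsec3s)
    owners = {owner for owner, _ in rrs}
    labels = qname.rstrip(".").lower().split(".")
    depth = len(labels) - len(zone.rstrip(".").split("."))
    # Candidates from the qname itself up to the zone apex.
    ancestors = [".".join(labels[i:]) for i in range(depth + 1)]
    for i, candidate in enumerate(ancestors):
        if hash_fn(candidate) not in owners:
            continue
        if i == 0:
            return "bogus: an NSEC3 matches qname, so the name exists"
        next_closer = ancestors[i - 1]
        if not any(covers(rr, hash_fn(next_closer)) for rr in rrs):
            return "bogus: next closer name not covered"
        if not any(covers(rr, hash_fn("*." + candidate)) for rr in rrs):
            return "bogus: wildcard at closest encloser not covered"
        return "secure: closest encloser is " + candidate
    return "bogus: no closest encloser found"
```

Passing the hash in as a function keeps the proof logic separate from the hashing, so it can be exercised with made-up hashes; for a real zone it would be the iterated SHA-1 from the NSEC3PARAM parameters.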

> Do I understand the error correctly like this: BIND failed to prove
> the domain to be insecure, hence, the NXDOMAIN response should have a
> correct signature, hence, the response it got is bogus?

Yes, domains below a trust anchor (configured manually or through DLV)
must either be signed or proven to be insecure at the delegation point.
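That rule can be written as a walk over the zone cuts between the trust anchor and the answer. This is a simplified Python sketch of the statuses described in RFC 4035 section 4.3 (it leaves out Indeterminate and unsupported algorithms); the `Delegation` type and its fields are hypothetical. Assuming the trusted key is the DNSKEY of dnssec.dest-unreach.be itself, there is no zone cut to cross, so the NXDOMAIN response itself has to validate, and it doesn't.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    SECURE = "secure"
    INSECURE = "insecure"
    BOGUS = "bogus"


@dataclass
class Delegation:
    """One zone cut between the trust anchor and the answer (toy model)."""
    child: str
    ds_validated: bool   # the parent served a DS RRset that validated
    no_ds_proven: bool   # the parent served a validated NSEC/NSEC3 denying DS


def status_below_anchor(cuts: List[Delegation], answer_validates: bool) -> Status:
    """Walk the zone cuts from the trust anchor down to the answer's zone."""
    for cut in cuts:
        if cut.ds_validated:
            continue                 # signed delegation: the child must validate too
        if cut.no_ds_proven:
            return Status.INSECURE   # provably unsigned: validation stops here
        return Status.BOGUS          # neither a valid DS nor a proof of its absence
    # Every cut on the path is signed, so the answer itself has to validate.
    return Status.SECURE if answer_validates else Status.BOGUS
```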

> What did you change for the "removed" record? Did you remove only the
> A and RRSIG? Or also the corresponding NSEC3?

I removed A and RRSIG only.

Here's what I did, using the BIND 9.7 defaults and the smart-signing feature:

dnssec-keygen -r /dev/urandom -3 -f ksk $zone;
dnssec-keygen -r /dev/urandom -3 $zone;
dnssec-signzone -x -S -3 - -o $zone db.test

(/dev/urandom because it's faster and this was only a test zone)

Then I edited db.test.signed, changed the "changed" record, and removed
the "removed" A record and its RRSIG.

Why we see different kinds of failures, I don't know. It probably has
to do with some of the signey-wimey DNSSEC voodoo stuff I hope I never
have to understand in all its details.

Hauke.