DNSSEC troubleshooting on a recursive server.

Grant Keller gkeller at corp.sonic.net
Wed Aug 7 17:19:12 UTC 2013


On 08/07/2013 01:53 AM, Phil Mayers wrote:
> On 08/07/2013 12:09 AM, Grant Keller wrote:
>> Hello,
>>
>> We have 7 recursive DNS servers running Bind 9.9.2, and we are seeing
>> some strange behavior validating DNSSEC. We have seen this happen a few
>> times, and in the past the problem has gone away when the server is
>> rebooted, so my first guess is that some record is stuck in the cache.
>
> "Rebooted" is a bit extreme; did you actually reboot the OS, or do you
> mean "restart bind"? When the problem occurs, have you tried "rndc
> flush" to see if that corrects it?
>
> Are you using any forwarders, or might your upstream be doing
> transparent DNS caching? Unlikely, but not unheard of.
I should have been clearer: the server was rebooted for a kernel
update. Given that, I think restarting bind would fix the problem;
I just didn't want to do that unless I have to.

>> # dig a zygo.com @127.0.0.1 +nocomments
>
> +nocomments has hidden the rcode (NODATA, SERVFAIL, etc.). So, not
> entirely helpful here.
>
> http://dnsviz.net/d/zygo.com/dnssec/
>
> ...suggests there might be an oddity with the TTL on the TXT records
> at zone apex, but not the A record. Otherwise zone looks ok.
>
> You could try:
>
> rndc dumpdb -cache
>
I ran a cache dump on both a working server and a non-working one, but I
am not sure what to make of the results. On the server that is not
validating, the section of the cache looks like this:

ftp://ftp.sonic.net/pub/users/gkeller/cache_insecure.txt

The "pending answer" part is strange; I don't recall seeing that before.
The "good" server has these all marked secure.
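
To compare the two dumps without reading them by eye, the trust markers
for one name can be tallied with a short script. This is only a sketch
under assumptions: it assumes each rdataset in the "rndc dumpdb -cache"
output is preceded by a comment line such as "; secure" or
"; pending-answer", and the list of labels below is written from memory
rather than taken from the dump linked above. The function name
trust_summary is made up for this example.

```python
from collections import Counter

# Assumed set of trust labels found on comment lines in a BIND cache dump.
# This list is an assumption, not something confirmed in this thread.
TRUST_LABELS = {
    "pending-additional", "pending-answer", "additional", "glue",
    "answer", "authauthority", "authanswer", "secure",
}


def _norm(name):
    """Lower-case a domain name and drop any trailing dot."""
    return name.rstrip(".").lower()


def trust_summary(dump_text, name):
    """Count trust labels attached to rdatasets owned by `name`."""
    wanted = _norm(name)
    counts = Counter()
    owner = None      # most recent owner name seen in the dump
    pending = None    # trust label waiting for its rdataset line
    for line in dump_text.splitlines():
        if not line.strip():
            continue
        if line.startswith(";"):
            # Accept "pending answer" as well as "pending-answer".
            label = line[1:].strip().replace(" ", "-")
            if label in TRUST_LABELS:
                pending = label
            continue
        if not line[0].isspace():
            # A line starting in column 0 names a new owner.
            owner = _norm(line.split()[0])
        if pending is not None:
            if owner == wanted:
                counts[pending] += 1
            pending = None
    return counts
```

Calling trust_summary(open(path).read(), "zygo.com") on the dump from
each server, where path is wherever named wrote its dump file, gives two
tallies to compare. A name that shows "secure" on the good server and
only "pending-answer" on the other would match what is described above.
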
>> ; <<>> DiG 9.7.0-P2-RedHat-9.7.0-17.P2.el5_9.2 <<>> a zygo.com
>> @127.0.0.1 +nocomments
>> ;; global options: +cmd
>> ;zygo.com.            IN    A
>> ;; Query time: 162 msec
>> ;; SERVER: 127.0.0.1#53(127.0.0.1)
>> ;; WHEN: Tue Aug  6 16:06:10 2013
>> ;; MSG SIZE  rcvd: 26
>>
>> # dig rrsig zygo.com @127.0.0.1 +nocomments
>>
>
> Hmm. This *is* odd. We're on bind 9.9.3 and it seems "dig domain.com
> rrsig" always returns TTL=0.
>
> I wonder if this is new? I don't recall seeing it before.
>
> In any event, as Mark has suggested, you don't want to dig the RRSIG
> yourself. Rather, use:
>
> dig +dnssec zygo.com a
>
> ...and if you get a SERVFAIL:
>
> dig +dnssec +cd zygo.com a
Running "dig +dnssec +cd zygo.com a" resolved the domain.
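
The two-step check quoted above (query with +dnssec, then repeat with
+cd on SERVFAIL) can be scripted for the other domains that get
reported. This is a minimal sketch, not a tested tool: the function
names and the three result labels are made up for illustration, and it
assumes dig is on the PATH and prints its usual ";; ->>HEADER<<-" line
(so +nocomments must not be used, since that hides the status).

```python
import re
import subprocess


def parse_status(dig_output):
    """Return the rcode from dig's header line, or None if it is absent."""
    match = re.search(r"status:\s*([A-Z0-9]+)", dig_output)
    return match.group(1) if match else None


def run_dig(name, rtype="a", server="127.0.0.1", cd=False):
    """Run dig with +dnssec (and optionally +cd) and return its stdout."""
    cmd = ["dig", "+dnssec"]
    if cd:
        cmd.append("+cd")  # checking disabled: ask the resolver not to validate
    cmd += ["@" + server, name, rtype]
    done = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    return done.stdout


def classify(status_validating, status_cd):
    """Interpret the rcodes of the normal query and the +cd query."""
    if status_validating != "SERVFAIL":
        return "no-servfail"
    if status_cd == "NOERROR":
        # Fails with validation, works without: validation is the suspect.
        return "validation-suspected"
    return "not-dnssec-specific"


def check(name, server="127.0.0.1"):
    """Query once normally and, only on SERVFAIL, once more with +cd."""
    first = parse_status(run_dig(name, server=server))
    second = None
    if first == "SERVFAIL":
        second = parse_status(run_dig(name, server=server, cd=True))
    return classify(first, second)
```

For example, check("zygo.com") against an affected server would be
expected to return "validation-suspected", given that the +cd query
resolved the domain here.
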

I have started to get other reports of domains with the same problem.
The same nameservers are having validation issues with these, and all
the domains use pdns01.domaincontrol.com and pdns02.domaincontrol.com
as auth name servers. I guess this points to a problem somewhere in the
trust chain, but I can't figure out where.

# dig a zygo.com  +sigchase +trusted-key=root.keys +multiline +qr

; <<>> DiG 9.7.0-P2-RedHat-9.7.0-17.P2.el5_9.2 <<>> a zygo.com +sigchase
+trusted-key=root.keys +multiline +qr
;; global options: +cmd
;; Sending:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21316
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;zygo.com.        IN A

;; NO ANSWERS: no more
We want to prove the non-existence of a type of rdata 1 or of the zone:
;; nothing in authority section : impossible to validate the
non-existence : FAILED

;; Impossible to verify the Non-existence, the NSEC RRset can't be
validated: FAILED


If I add +topdown then it succeeds.

-- 
Grant Keller
Sonic.net System Operations


