"*.dlv.isc.org DS: must be secure" warnings [was: Re: 9.6.1-P1 log message]

Mon Sep 28 00:16:51 UTC 2009

In message <Prayer.1.3.2.0909262248400.24454 at hermes-1.csi.cam.ac.uk>,
Chris Thompson writes:
> Back in August there was a thread on bind-users about messages
> of the shape
> 
>   validating @[hex]: [name].dlv.isc.org DS: must be secure failure
> 
> (these are category "dnssec" severity "warning") and on 31 August I wrote:
> 
> >We have been running two production recursive nameservers validating against
> >dlv.isc.org since 9 June, and first saw a batch of messages (for both servers)
> >like this on 20 July. We reported them to ISC and got suggestions along the
> >lines of Mark's above, along with an admission that current versions of BIND
> >give up on EDNS too easily in situations they maybe shouldn't, which may be
> >fixed in future releases.
> >
> >Since then we have had a trickle of such warning messages in the logs. We
> >assume that they are the result of temporary network glitches somewhere,
> >but their frequency appears to be increasing, which is somewhat worrying.
> >It's also not clear whether any client queries are actually failing as a
> >result, or whether BIND is simply trying another dlv.isc.org nameserver
> >with better luck.
> 
> I have been looking at this again, and in fact there was a step function
> on 21 August when the messages rose from almost nil to 15-20 per day, and
> then fell back to almost nil after 15 September (we've seen just one since
> then). We have been running BIND 9.6.1-P1 throughout.
> 
> I would be very interested to know whether other recursive nameserver
> operators validating via dlv.isc.org have seen a similar pattern. I am
> prepared to believe that the frequency is related to transient network
> errors or delays, but I have no idea whether they are likely to be local
> or at the dlv.isc.org server end.

One gets these or similar messages when named falls back to plain
DNS as a result of multiple timeouts.  Named tries EDNS advertising
a 4096 byte UDP buffer, then after multiple timeouts it tries EDNS
advertising a 512 byte UDP buffer, then after multiple timeouts it
tries plain DNS.
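
The fallback ladder described above can be sketched roughly as follows.
This is only an illustration, not named's actual code: the function
name, the send_query callback and the number of timeouts per step are
all hypothetical (the text above says only "multiple timeouts").  Note
that the DO bit lives in the EDNS OPT record, so once the last step is
reached the query can no longer ask for DNSSEC records.

```python
from typing import Callable, Optional, Tuple

# Fallback ladder from the text: (use_edns, advertised UDP buffer size).
# Plain DNS has no EDNS buffer size; 512 is the classic UDP message limit.
FALLBACK_STEPS = [(True, 4096), (True, 512), (False, 512)]

# Hypothetical value: the real retry count per step is not given above.
TIMEOUTS_PER_STEP = 2


def resolve_with_fallback(
    send_query: Callable[[bool, int], Optional[str]],
    timeouts_per_step: int = TIMEOUTS_PER_STEP,
) -> Optional[Tuple[str, bool, int]]:
    """Walk the ladder, moving down one step only after repeated timeouts.

    send_query(use_edns, bufsize) returns a response, or None on timeout.
    Returns (response, use_edns, bufsize) for the step that succeeded,
    or None if every step timed out.
    """
    for use_edns, bufsize in FALLBACK_STEPS:
        for _ in range(timeouts_per_step):
            response = send_query(use_edns, bufsize)
            if response is not None:
                return response, use_edns, bufsize
    return None


# Simulated path on which something drops every EDNS query.
def edns_blocked(use_edns: bool, bufsize: int) -> Optional[str]:
    return None if use_edns else "answer"


print(resolve_with_fallback(edns_blocked))  # ('answer', False, 512)
```

In this simulation the answer arrives only at the plain DNS step, which
is the situation in which a validator has no signatures to work with.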

Named also had a bug where it would fall back an EDNS step when it
didn't need to (such as when retrying with TCP).  This made DNSSEC
difficult behind middleware that was dropping fragments.

2564.   [bug]           Only take EDNS fallback steps when processing timeouts.
                        [RT #19405]

Some (perhaps not all) of the timeout causes are below.  This list is
not specific to DLV.

(Apparent) non-responses to UDP queries can have many causes:
*+ Firewalls/middleware that blocks DNS responses > 512
*+ Firewalls/middleware that blocks fragments
*+ Lack of support for out of order responses in NAT
*+ Responses that require fragmentation but DF set.  Most of these will
  be 1481-1500 bytes in size (IP in IP tunnels).  Larger responses
  are usually fragmented by the sending OS and don't have DF set.  Smaller
  responses make it through a single layer of encapsulation.
*+# Bad nameserver software that fails to respond to EDNS requests
*+# Firewalls/proxies that block EDNS queries or queries/responses with
  one or more of DO, CD or AD set.
* Congestion
* Packet corruption
* Responses that appear lost due to long round trip times
  - load balancing probes taking too long
  - multiple satellite links
  - significant congestion causing long delays

+ indicates broken software
# indicates fallback to plain DNS will be required
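
The 1481-1500 byte range in the DF item above follows from simple
arithmetic, sketched below.  The figures are assumptions rather than
anything stated in this message: a 1500 byte link MTU and a 20 byte
outer IPv4 header added by each layer of IP in IP encapsulation.

```python
LINK_MTU = 1500      # assumed Ethernet-style MTU, in bytes
IPIP_OVERHEAD = 20   # assumed outer IPv4 header (no options) per tunnel layer


def needs_fragmentation(packet_size: int, tunnel_layers: int = 1) -> bool:
    """True if the packet no longer fits in the MTU once encapsulated."""
    return packet_size + tunnel_layers * IPIP_OVERHEAD > LINK_MTU


def lost_in_tunnel(packet_size: int, df_set: bool, tunnel_layers: int = 1) -> bool:
    """A packet that needs fragmenting but has DF set cannot be forwarded."""
    return df_set and needs_fragmentation(packet_size, tunnel_layers)


# Largest packet that survives one layer of encapsulation: 1500 - 20 = 1480.
print(lost_in_tunnel(1480, df_set=True))   # False
print(lost_in_tunnel(1481, df_set=True))   # True
print(lost_in_tunnel(1500, df_set=True))   # True
print(lost_in_tunnel(1500, df_set=False))  # False
```

Under those assumptions, packets of up to 1480 bytes pass through one
tunnel layer untouched, while 1481-1500 byte packets fit the sender's
own MTU (so the sender does not fragment them) yet exceed it once
encapsulated.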

A handful a day would suggest packet corruption/congestion as the likely
cause.

Mark
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: marka at isc.org