dig ds c10r.facebook.com returns SERVFAIL

Tony Finch dot at dotat.at
Tue Sep 4 09:06:32 UTC 2018


Laurent Bigonville <bigon+bind at bigon.be> wrote:
>
> Don't take what I said about the internal working of systemd-resolved for
> granted :)
>
> Looking at the log that I initially provided
> (https://github.com/systemd/systemd/issues/8897), it seems to revalidate the
> complete chain.

Yes, you are right, I shouldn't have immediately gone for the full blast
of sarcasm without verifying that systemd-resolved deserves it. So I
looked at the log - details below. (Spoiler: my prejudices have been
confirmed.)

> An idea what should be done to fix this then?

Well, the good options are to fix Facebook (as Mark rightly said) and to
fix systemd-resolved. Alternatively you can add negative trust anchors for
broken domains like Facebook.
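For what it's worth, a negative trust anchor tells the validator to stop validating a subtree and treat it as insecure. In BIND that's `rndc nta`; systemd-resolved reads them from `*.negative` files in `/etc/dnssec-trust-anchors.d/`. Below is a minimal Python sketch of the check. It assumes the anchors are held as a plain set of domain names. The names are illustrative and are not either implementation's internals:

```python
# Hypothetical set of negative trust anchors: names at or below these
# are treated as insecure instead of being validated.
NEGATIVE_TRUST_ANCHORS = {"facebook.com"}

def is_at_or_below(name: str, zone: str) -> bool:
    """True if name equals zone or is a subdomain of it (case-insensitive)."""
    name = name.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    return name == zone or name.endswith("." + zone)

def covered_by_nta(name: str, ntas: set = NEGATIVE_TRUST_ANCHORS) -> bool:
    """Skip validation (treat as insecure) for names under any negative trust anchor."""
    return any(is_at_or_below(name, zone) for zone in ntas)
```

The label-boundary check (`"." + zone`) matters. It stops an anchor for facebook.com from accidentally covering something like notfacebook.com.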


OK, logs. After a lot of setup faff we have:

16:24:21  Switching to system DNS server 10.200.0.200.

16:24:23  Cache miss for www.facebook.com IN A
16:24:23  Transaction 41850 for <www.facebook.com IN A> scope dns on */*.
16:24:23  Using DNS server 10.200.0.200 for transaction 41850.
16:24:23  Timeout reached on transaction 41850.

That's a remarkably hair-trigger timeout.

16:24:23  Switching to system DNS server 10.122.17.186.
16:24:23  Transaction 41850 for <www.facebook.com IN A> scope dns on */*.
16:24:23  Processing incoming packet on transaction 41850. (rcode=SUCCESS)
16:24:23  Verified we get a response at feature level UDP+EDNS0+DO from DNS server 10.122.17.186.

OK, so we know at this point that systemd-resolved is not designed for fast
validation: it hasn't yet sent the queries for the validation chain, which
it could have issued alongside the original query. A big shame for new code.
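Here is a rough idea of what "designed for fast validation" could look like. The DS and DNSKEY queries for each ancestor don't depend on the main answer, so they can all go out together. This is an asyncio sketch. `send_query` is a hypothetical stand-in for the network round trip, and a real validator would be smarter about which names it asks about:

```python
import asyncio

async def send_query(name, rtype):
    """Hypothetical stand-in for one query/response round trip to the recursive server."""
    await asyncio.sleep(0.01)  # pretend network latency
    return f"{name} {rtype} reply"

def ancestors(name):
    """The name itself plus each enclosing domain, stopping before the root."""
    labels = name.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]

async def lookup_with_prefetch(name, rtype):
    # Queue the main query, the root DNSKEY, and DS + DNSKEY for every ancestor.
    queries = [(name, rtype), (".", "DNSKEY")]
    for zone in ancestors(name):
        queries += [(zone, "DS"), (zone, "DNSKEY")]
    # Send them all concurrently instead of waiting for the main answer first.
    replies = await asyncio.gather(*(send_query(n, t) for n, t in queries))
    return dict(zip(queries, replies))
```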

16:24:23  Requesting parent SOA to validate transaction 41850 (www.facebook.com, unsigned CNAME/DNAME/DS RRset).
16:24:23  Transaction 60936 for <facebook.com IN SOA> scope dns on */*.

Wat? How does a SOA query help anything? There's no point wasting time
looking for zone cuts before you request DNSKEY and DS records, because
the DNSKEY and DS responses tell you where the zone cuts are as a side
effect. This is just a waste of time.
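The DNSKEY half of that can be sketched with a deliberately simplified response model; the `Response` class here is hypothetical. A DNSKEY answer means the name is a signed zone apex. A NODATA answer carries the SOA of the enclosing zone, and that SOA's owner name is the zone cut, whether or not the zone is signed. The sketch ignores CNAMEs and other corner cases:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Response:
    """Hypothetical, heavily simplified view of a DNSKEY query's reply."""
    answer_types: set = field(default_factory=set)  # RR types in the answer section
    soa_owner: Optional[str] = None                # owner of the authority-section SOA, if any

def zone_apex(qname: str, resp: Response) -> Optional[str]:
    """Name of the zone apex at or above qname, read straight off a DNSKEY reply."""
    if "DNSKEY" in resp.answer_types:
        return qname          # qname is the apex of a signed zone
    return resp.soa_owner     # NODATA: the SOA owner is the enclosing apex
```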

16:24:23  Requesting DS to validate transaction 41850 (c10r.facebook.com, unsigned SOA/NS RRset).
16:24:23  Transaction 36881 for <c10r.facebook.com IN DS> scope dns on */*.
16:24:23  Requesting DS to validate transaction 41850 (c10r.facebook.com, unsigned SOA/NS RRset).

Twice??
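Duplicate queries like this are what an in-flight table is for. A second request for the same (name, type) should join the outstanding transaction rather than start another one. This is a sketch with a hypothetical `QueryDeduplicator` wrapper around some async `send` function:

```python
import asyncio

class QueryDeduplicator:
    """Hypothetical helper: at most one in-flight transaction per (name, type)."""

    def __init__(self, send):
        self._send = send      # coroutine function taking (name, rtype)
        self._inflight = {}    # (name, rtype) -> asyncio.Task
        self.sent = 0          # number of queries that actually went out

    async def query(self, name, rtype):
        key = (name.rstrip(".").lower(), rtype)
        task = self._inflight.get(key)
        if task is None:
            # First request for this key: start a real transaction.
            self.sent += 1
            task = asyncio.create_task(self._send(*key))
            self._inflight[key] = task
            # Drop the entry once it completes; later lookups should hit the cache.
            task.add_done_callback(lambda _t, k=key: self._inflight.pop(k, None))
        # Every caller, first or duplicate, waits on the same task.
        return await task
```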

16:24:23  Processing incoming packet on transaction 60936. (rcode=SUCCESS)
16:24:23  Requesting DS to validate transaction 60936 (facebook.com, unsigned SOA/NS RRset).
16:24:23  Transaction 35625 for <facebook.com IN DS> scope dns on */*.
16:24:23  Processing incoming packet on transaction 35625. (rcode=SUCCESS)
16:24:23  Requesting DNSKEY to validate transaction 35625 (com, RRSIG with key tag: 36707).

Then there's a lot of upwards validation faff for the com and root zones.

16:24:23  Found verdict for lookup facebook.com IN DS: insecure
16:24:23  Added NODATA cache entry for facebook.com IN DS 105s
16:24:23  Transaction 35625 for <facebook.com IN DS> on scope dns on */* now complete with <success> from network (unsigned).
16:24:23  Transaction 60936 for <facebook.com IN SOA> on scope dns on */* now complete with <success> from network (unsigned).

OK so far.

16:24:24  Timeout reached on transaction 36881.
16:24:24  Retrying transaction 36881.

At this point systemd-resolved should have abandoned transaction 36881:
facebook.com is proven insecure, so the c10r.facebook.com DS is immaterial.
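The rule is simple enough to sketch. Once a zone is proven insecure, every pending DS or DNSKEY transaction at or below it can be cancelled. Here pending transactions are assumed to be a dict mapping id to (name, type); that layout is hypothetical:

```python
def is_at_or_below(name, zone):
    """True if name equals zone or is a subdomain of it (case-insensitive)."""
    name, zone = name.rstrip(".").lower(), zone.rstrip(".").lower()
    return name == zone or name.endswith("." + zone)

def moot_transactions(pending, insecure_zones):
    """Ids of pending DS/DNSKEY transactions at or below a zone already proven insecure."""
    return sorted(
        tid for tid, (name, rtype) in pending.items()
        if rtype in ("DS", "DNSKEY")
        and any(is_at_or_below(name, zone) for zone in insecure_zones)
    )
```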

It then spends another 1.5 minutes (!!!) retrying 36881. If you get a
SERVFAIL from one recursive server, it's reasonable to retry on
alternative recursive servers if you have them, but it's almost always
futile to retry against the same server. systemd-resolved needs to give up
way faster.
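Here is one way to express "try the alternatives, then give up". It assumes a hypothetical `send(server)` that returns an rcode string or raises `TimeoutError`. A SERVFAIL is treated as final for that server, while a timeout may be retried up to `attempts_per_server` times:

```python
def resolve_with_failover(servers, send, attempts_per_server=1):
    """Try each recursive server in turn; return (server, rcode) or (None, last_failure)."""
    last = None
    for server in servers:
        for _ in range(attempts_per_server):
            try:
                rcode = send(server)
            except TimeoutError:
                last = "TIMEOUT"
                continue            # a lost packet may be worth one more try
            if rcode == "SERVFAIL":
                last = rcode
                break               # same server will almost certainly SERVFAIL again
            return server, rcode
    return None, last
```

With `attempts_per_server=1` this bounds the total work at one query per configured server. That is a long way from a minute and a half of retries against the server that already said SERVFAIL.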

It seems to be using SERVFAIL as a feature negotiation signal. Weirdly,
it doesn't reduce the LARGE buffer size feature on timeout (which would
make sense) but only after it gets the first SERVFAIL response (which
doesn't make sense). It also tries to make a DS query with DO=0, which is
nonsense: without DO the server won't return the RRSIGs needed to
validate the DS RRset.
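Here is a sketch of a saner downgrade policy. It uses a hypothetical ladder of levels loosely named after the ones in the log; this is not systemd-resolved's actual enum. It shrinks the buffer on timeout, treats SERVFAIL as an answer rather than a capability signal, and never drops DO for a DNSSEC query:

```python
from enum import IntEnum

class Level(IntEnum):
    """Hypothetical feature ladder, loosely modelled on the levels named in the log."""
    UDP = 1
    EDNS0 = 2
    DO = 3       # EDNS0 with the DNSSEC OK bit
    LARGE = 4    # DO plus a large advertised UDP buffer

def next_level(current: Level, event: str, dnssec_query: bool) -> Level:
    if event == "timeout" and current == Level.LARGE:
        # Lost large responses are the classic symptom of dropped fragments,
        # so shrink the buffer first.
        return Level.DO
    if event == "timeout" and current > Level.UDP and not dnssec_query:
        return Level(current - 1)
    # SERVFAIL (or anything else) leaves the level alone; and DS/DNSKEY
    # queries are useless without DO, so they never go below it.
    return current
```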

16:25:52  Transaction 36881 for <c10r.facebook.com IN DS> on scope dns on */* now complete with <attempts-max-reached> from network (unsigned).
16:25:52  Auxiliary DNSSEC RR query failed with attempts-max-reached

Sheesh. At long last!

16:25:52  DNSSEC validation failed for question www.facebook.com IN A: failed-auxiliary
16:25:52  Transaction 41850 for <www.facebook.com IN A> on scope dns on */* now complete with <dnssec-failed> from network (unsigned).

WRONG. You already validated it insecure! Good grief.
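The verdict logic it should have applied is roughly the following. The helper is hypothetical and assumes it already knows which zones were proven insecure and whether an auxiliary query failed. "secure" is only shorthand here for "nothing else went wrong":

```python
def is_at_or_below(name, zone):
    """True if name equals zone or is a subdomain of it (case-insensitive)."""
    name, zone = name.rstrip(".").lower(), zone.rstrip(".").lower()
    return name == zone or name.endswith("." + zone)

def final_verdict(qname, insecure_zones, aux_failed):
    """Combine validation results for one answer (hypothetical helper)."""
    if any(is_at_or_below(qname, zone) for zone in insecure_zones):
        # Proven unsigned above this name: failures further down are irrelevant.
        return "insecure"
    if aux_failed:
        return "dnssec-failed"
    return "secure"
```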


Tony.
-- 
f.anthony.n.finch  <dot at dotat.at>  http://dotat.at/
Shannon: Northerly or northwesterly 3 or 4, backing westerly or southwesterly
4 or 5 in northwest. Moderate. Rain later in northwest. Good, occasionally
moderate later in northwest.

