DNS security, amplification attacks and recursion

Tue Jul 7 20:31:04 UTC 2020

Brett Delmage <Brett at BrettDelmage.ca> wrote:
> On Tue, 7 Jul 2020, Tony Finch wrote:
> >
> > 	minimal-any yes;
>
> Why only reduce and not eliminate?

The reason is a bit subtle. If an ANY query comes via a recursive
resolver, it is much better to give the resolver an answer so that it will
put an entry in its cache. The cache entry will stop further ANY queries
from being sent from the resolver to the upstream auth server until its
TTL expires.

If the auth server does not answer, or sends a REFUSED error, the resolver
is likely to retry, which increases worthless traffic rather than
suppressing it, and the resolver may decide the auth server is lame, which
will cause knock-on problems for legitimate queries.
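
To make the difference concrete, here is a toy model of a single recursive
resolver in front of one auth server. It is only a sketch: the class name,
the 300 second TTL, the retry count of 2 and the query rate are all made-up
assumptions, and real resolvers are more subtle (some briefly cache
failures, for instance).

```python
# Toy model of one recursive resolver in front of one auth server.
# Everything here is illustrative; it is not how any real resolver works.

class ToyResolver:
    def __init__(self, auth_answers, ttl=300, retries=2):
        self.auth_answers = auth_answers  # True: auth replies; False: REFUSED or silence
        self.ttl = ttl                    # seconds a cached answer stays valid (assumed)
        self.retries = retries            # extra attempts after a failed query (assumed)
        self.cache_expiry = None          # time at which the cached ANY answer expires
        self.upstream_queries = 0         # queries that reached the auth server

    def handle_any_query(self, now):
        # Serve from cache while the entry is still fresh.
        if self.cache_expiry is not None and now < self.cache_expiry:
            return "cached"
        if self.auth_answers:
            # One upstream query fills the cache for a whole TTL.
            self.upstream_queries += 1
            self.cache_expiry = now + self.ttl
            return "answered"
        # No usable answer: the resolver retries and has nothing to cache.
        self.upstream_queries += 1 + self.retries
        return "servfail"


def upstream_load(auth_answers, seconds=600, qps=10):
    """Count queries reaching the auth server over a simulated attack."""
    resolver = ToyResolver(auth_answers)
    for second in range(seconds):
        for _ in range(qps):
            resolver.handle_any_query(second)
    return resolver.upstream_queries


if __name__ == "__main__":
    print("auth answers:", upstream_load(True))
    print("auth refuses:", upstream_load(False))
```

With these assumed numbers the answering auth server sees one query per
TTL, while the refusing one sees every client query multiplied by the
retry count.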

There are some scenarios where reflection attacks go through multiple
servers. If you can get cache entries into those servers then the
attack traffic gets suppressed closer to its source. There have been quite
a lot of attacks that work like this:

  * an ISP has a huge number of customers with crappy home routers that
    can act as open recursive resolvers

  * an arsehole decides to use these crappy home routers in a reflection /
    amplification DDoS attack

  * the crappy home routers forward the attack queries to their ISP's
    recursive servers; these recursive servers are legitimate and well
    configured but suffer from bad client devices

  * the recursive servers resolve the queries against some third party
    authoritative servers

If the recursive servers cache the responses, then the auth servers should
not be much affected by the attack: most of the traffic is answered from the
ISP caches, and maybe the home router caches if they have them.

But if the auth servers don't answer, or send REFUSED errors, then the
recursive servers are going to keep retrying queries, and thereby relay a
very large proportion of the attack traffic to the auth servers. Sadness
will follow.
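
A back-of-envelope version of the same argument across many ISPs, as a
rough sketch. The function and all of its parameters are hypothetical, the
numbers in the usage lines are invented for illustration, and it assumes
the attack is busy enough that every resolver sees at least one query per
TTL.

```python
def auth_queries(isps, resolvers_per_isp, attack_qps_per_isp,
                 duration, ttl, auth_answers, retries=2):
    """Estimate ANY queries reaching the auth servers during an attack.

    duration and ttl are in seconds; retries is an assumed number of
    extra attempts a resolver makes when it gets no usable answer.
    """
    if auth_answers:
        # Each resolver refetches once per TTL, whatever the attack volume.
        refreshes = -(-duration // ttl)  # ceiling division
        return isps * resolvers_per_isp * refreshes
    # Nothing cacheable: every attack query is relayed, plus retries.
    return isps * attack_qps_per_isp * duration * (1 + retries)


if __name__ == "__main__":
    # Invented scenario: 1000 ISPs, 4 resolvers each, 500 attack
    # queries per second per ISP, a 10 minute attack, a 5 minute TTL.
    print("answering:", auth_queries(1000, 4, 500, 600, 300, True))
    print("refusing: ", auth_queries(1000, 4, 500, 600, 300, False))
```

In the answering case the load on the auth servers depends on the number
of resolvers and the TTL, not on how hard the attacker is pushing.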

Note that RRL does not help in this scenario, because from the auth
server's point of view the ISP resolvers are legitimate clients, which RRL
can observe from their retry behaviour. RRL is designed for attacks where
the spoofed queries go direct to the auth server, which is not happening
in this case.

When this happened to us (when my servers were the third party auth
servers) the DDoS attack was hitting a very large number of ISPs, so our
auth servers were getting ANY queries via huge numbers of recursive
servers. Extra unfortunately, the ANY response was too big to fit in UDP,
so all the resolvers were trying to query over TCP. And our auth servers
did not have enough TCP capacity to handle the load. Much sadness. (It
didn't take us offline because our off-site auth servers were differently
configured and able to keep answering.)

So I implemented minimal-any to stop it from happening again.
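
The idea can be sketched roughly as below. This is not BIND's code: the
function, its arguments and the example records are hypothetical, and the
UDP-only behaviour is an assumption taken from how the BIND documentation
describes the option.

```python
def answer_any(rrsets, transport="udp", minimal_any=True):
    """Pick the RRsets to return for an ANY query at one name.

    rrsets maps a record type (such as "A") to its list of records.
    """
    if not rrsets:
        return {}
    if minimal_any and transport == "udp":
        # Return a single RRset: still a positive answer that a resolver
        # can cache, but small enough to be useless for amplification.
        first_type = next(iter(rrsets))
        return {first_type: rrsets[first_type]}
    # Otherwise behave like a traditional ANY and return everything.
    return dict(rrsets)


if __name__ == "__main__":
    # Made-up records using documentation names and addresses.
    zone_apex = {
        "A": ["192.0.2.1"],
        "AAAA": ["2001:db8::1"],
        "MX": ["10 mail.example.com."],
        "TXT": ["v=spf1 -all"],
    }
    print(answer_any(zone_apex))
    print(answer_any(zone_apex, transport="tcp"))
```

The point is that the resolver still gets something to cache, so it stops
asking, while the response stays small and does not push clients to TCP.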

Tony.
-- 
f.anthony.n.finch  <dot at dotat.at>  http://dotat.at/
Fisher, German Bight: Westerly veering northwesterly 4 to 6, decreasing 3
later in south German Bight. Moderate, occasionally rough at first. Mainly
fair. Mainly good.