Trying again on SERVFAIL

Tue Feb 9 17:15:43 UTC 2021

> is there a way to know that a query has already been tried a few
> minutes ago, and failed?

From whose perspective?

A well-behaved application could remember it asked the same query
a short while ago, of course, but that's up to the application.
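
For the application case, a minimal sketch of such a memory might
look like the following.  The class name, the 5-minute hold time
and the injectable clock are all assumptions made for
illustration; they are not taken from any particular mail filter.

```python
import time


class RecentFailureCache:
    """Remember which (name, rrtype) lookups failed a short while ago."""

    def __init__(self, hold_seconds=300.0, clock=time.monotonic):
        # hold_seconds is an assumed value; pick one to suit the application.
        self.hold_seconds = hold_seconds
        self._clock = clock
        self._failed_at = {}

    @staticmethod
    def _key(name, rrtype):
        # DNS names compare case-insensitively; ignore a trailing dot too.
        return (name.rstrip(".").lower(), rrtype.upper())

    def record_failure(self, name, rrtype):
        """Note that this lookup just failed (e.g. with SERVFAIL)."""
        self._failed_at[self._key(name, rrtype)] = self._clock()

    def failed_recently(self, name, rrtype):
        """True if the same lookup failed within the last hold_seconds."""
        key = self._key(name, rrtype)
        stamp = self._failed_at.get(key)
        if stamp is None:
            return False
        if self._clock() - stamp >= self.hold_seconds:
            # The entry has expired, so forget it and allow a fresh query.
            del self._failed_at[key]
            return False
        return True
```

The entry expires on its own, so the application never treats an
old failure as permanent.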

Or is the perspective that of a recursive resolver?  As far as I
remember, BIND used as a recursive resolver will "cache" this
knowledge, but I'm not entirely certain for how long.  It can't
use the method that applies to an NXDOMAIN reply, which includes
the SOA record whose re-purposed "minimum" field supplies the TTL
for the negative cache entry.

> It happens seldomly, but sometimes the DKIM mail filter gets a
> SERVFAIL when it tries to authenticate an incoming message.
> SERVFAIL occurs when DNSSEC check fails.

...or when none of the name servers for the containing zone
responds with an answer.  I.e. it's not *just* DNSSEC failure
which can trigger SERVFAIL.

> Trying again is useless, it has to be treated as a permanent
> error.

Well, now...  Basically nothing in the DNS is permanent, because
it is not completely static; hence most information in the DNS
has a TTL attached to it.  So the question then becomes how an
application, say a mail server, should treat SERVFAIL.  It may
very well be that the "maximum retry time" of the mail server is
far longer than any of the TTLs for the pieces of DNS data that
you could not look up.  It may therefore be appropriate to treat
SERVFAIL as a signal to "re-queue the message and try again in
30 minutes", in essence converting SERVFAIL into a "temporary
failure" in the context of the mail server.
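
As a rough sketch of that idea, the function below maps a DNS
response code onto a queueing decision.  The 30-minute retry
interval comes from the paragraph above; the 5-day queue
lifetime, the names and the plain-string rcodes are assumptions
for illustration, not the behaviour of any specific mail server.

```python
from dataclasses import dataclass
from typing import Optional

RETRY_INTERVAL = 30 * 60        # seconds; "try again in 30 minutes"
MAX_QUEUE_TIME = 5 * 24 * 3600  # seconds; assumed maximum retry time


@dataclass
class Decision:
    action: str                       # "answered", "retry" or "give_up"
    retry_at: Optional[float] = None  # only set when action == "retry"


def decide(rcode: str, queued_at: float, now: float) -> Decision:
    """Turn a lookup result into a queueing decision for one message."""
    if rcode in ("NOERROR", "NXDOMAIN"):
        # A definite answer, positive or negative: act on it.
        return Decision("answered")
    # SERVFAIL (and anything else) tells us nothing either way, so
    # treat it as a temporary failure while the message may still wait.
    next_try = now + RETRY_INTERVAL
    if next_try - queued_at > MAX_QUEUE_TIME:
        return Decision("give_up")
    return Decision("retry", retry_at=next_try)
```

For example, decide("SERVFAIL", queued_at=0, now=60) asks for a
retry at 1860 seconds, while the same call made once the message
has sat in the queue for the full MAX_QUEUE_TIME gives up.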

SERVFAIL doesn't mean that the domain name you tried to look up
currently doesn't exist in the DNS; you just can't know one way
or the other.

> Any idea about how to tell a really temporary error?

You again have to specify the context.

Regards,

- Håvard