Timeout and SERVFAIL

Wed May 30 01:55:52 UTC 2018

On Tuesday, May 29, 2018 16:53:02 Alex wrote:
> ...
> 
> Last week the network with the master and one of the slaves went down
> for an extended period. Requests appeared to still be served by the
> second slave on the totally different network.
> 
> At least for a while. It appeared once the negative cache expired
> after 24h, requests to the domain just resulted in SERVFAIL.
> 
> @  IN    SOA   ns.example.com. admin.ns.example.com. (
>                 2018041703      ;serial (yyyymmddxx)
>                 3h              ;refresh every 3 hr
>                 1h              ;retry every 1 hr
>                 7d              ;expire in 7 days
>                 1d )            ;negative cache minimum ttl 1 day
> 
> How can I configure the name servers so failure of one or two doesn't
> impact the third?
> 
Unless it is also serving recursive queries, caching is not a factor on an authoritative server. What expired was not the negative cache interval; it was the zone expiration interval. To avoid the possibility of returning incorrect information, a secondary server stops serving a zone when the zone expiration period passes without contact with its master(s). This is by design.
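
As a rough illustration of that rule, here is a minimal Python sketch of the expire check. It is not BIND's actual code. The function name and the dates are made up for the example; only the 7-day expire value comes from the SOA quoted above.

```python
from datetime import datetime, timedelta


def zone_is_expired(last_contact: datetime, now: datetime, expire: timedelta) -> bool:
    """Return True once the expire interval has elapsed since the
    secondary last refreshed the zone successfully from a master."""
    return now - last_contact >= expire


# Expire value from the quoted SOA (7d).
expire = timedelta(days=7)

# Hypothetical time of the last successful refresh before the outage.
last_contact = datetime(2018, 5, 20, 0, 0, 0)

# Two days without contact: the secondary still serves the zone.
print(zone_is_expired(last_contact, datetime(2018, 5, 22), expire))

# Seven days without contact: the secondary stops serving the zone.
print(zone_is_expired(last_contact, datetime(2018, 5, 27), expire))
```

The timer restarts each time the secondary reaches a master, so only an uninterrupted outage longer than the expire value triggers it.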

To remedy this, you must ensure that the above condition does not occur. Either bring your master(s) back online before the expire interval elapses, or increase the expire value in your SOA records (currently 7d), or both.
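
To compare candidate expire values, a small helper can convert zone-file style durations to seconds. This is a sketch in Python, assuming the usual s/m/h/d/w unit suffixes; the helper name and the 4w figure are only illustrative (RFC 1912 suggests 2 to 4 weeks for expire).

```python
import re

# Seconds per unit suffix used in zone-file durations.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def ttl_to_seconds(text: str) -> int:
    """Convert a duration such as '7d', '1h30m', or '3600' to seconds."""
    text = text.strip().lower()
    if text.isdigit():
        return int(text)
    parts = re.findall(r"(\d+)([smhdw])", text)
    # Reject input that contains anything other than number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)


current = ttl_to_seconds("7d")   # expire from the quoted SOA
proposed = ttl_to_seconds("4w")  # hypothetical longer expire

print(current, proposed)
```

Whatever value you choose should comfortably exceed the longest outage you expect your master(s) to suffer.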

> In the time leading up to the cache expiring, were other requests
> being rejected due to the two nameservers for that zone being
> unreachable?
>
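No. Until the zone expires, a secondary answers authoritatively from its last transferred copy of the zone. You should find the zone expiration event in your logs.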

-- 
Greg Rivers