intermittent SERVFAIL for highly visible domains such as *.google.com

Brian J. Murrell brian at interlinx.bc.ca
Thu Jan 18 15:59:58 UTC 2018


On Thu, 2018-01-18 at 15:41 +0000, Tony Finch wrote:
> 
> Does the time to recovery correspond to the lame-ttl setting?

I am not sure.  I'm not always aware of when it starts.  I guess if I
kept tracing enabled permanently, the log would tell me, though.

> The default
> is 10 minutes - try reducing it and see if the outage becomes
> shorter.

If it does, what is that telling me?  That the problem domains are
listing NSes that don't actually host the zone?  I thought named
normally logged lame delegations, but I don't see a single one in the
last few days.

That said, if such a high-visibility domain as Google's were
misconfigured, it would be wreaking havoc all over the Internet and
drawing lots of attention, wouldn't it?

> When you have a failure, try `rndc flushtree` to more selectively drop
> problematic state - you might have to find out the nameservers of the
> broken domain and flush them. (The google.com nameservers are under
> google.com; GitHub's are under dynect.net and a bunch of awsdns
> domains.)

rndc flushtree takes a domain name, though, doesn't it?  In what case
would I need to find the nameservers?
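A rough sketch of when the nameservers matter, assuming `rndc flushtree <name>` drops cached state for a name and everything below it: flushing google.com also reaches ns1.google.com and friends, but flushing github.com would not touch anything cached under dynect.net, so the nameservers' parent domains need their own flush. The helpers `parent_domain` and `flushtree_commands` are hypothetical; they only build argument lists (nothing is run), and the parent-domain guess (last two labels) is an assumption that is wrong for names under suffixes such as co.uk. The nameserver hostnames in the example are illustrative, not looked up.

```python
def parent_domain(hostname):
    """Crude guess at a nameserver's parent domain: its last two labels.

    ASSUMPTION: this heuristic breaks for names under multi-label public
    suffixes (e.g. ns1.example.co.uk -> co.uk); check the real delegation.
    """
    labels = hostname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def flushtree_commands(domain, nameservers):
    """Build (but do not run) `rndc flushtree` argument lists.

    The broken domain comes first, followed by each distinct parent domain
    of its nameservers that is not already in the list.
    """
    targets = [domain.rstrip(".").lower()]
    for ns in nameservers:
        parent = parent_domain(ns)
        if parent not in targets:
            targets.append(parent)
    return [["rndc", "flushtree", name] for name in targets]


if __name__ == "__main__":
    # Hypothetical nameserver names, for illustration only.
    nameservers = ["ns1.p16.dynect.net.", "ns2.p16.dynect.net."]
    for cmd in flushtree_commands("github.com", nameservers):
        print(" ".join(cmd))
        # To actually flush, pass cmd to subprocess.run(cmd, check=True).
    # prints:
    # rndc flushtree github.com
    # rndc flushtree dynect.net
```

For google.com the list collapses to a single `rndc flushtree google.com`, since its nameservers sit under the same name.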

So, when I do rndc reload, am I flushing the cache?  :-(

> Look at the end of the dump - the address database,

; Address database dump
...
; ns3.google.com [v4 TTL 7] [v6 TTL 7] [v4 failure] [v6 failure]
; ns2.google.com [v4 TTL 7] [v6 TTL 7] [v4 failure] [v6 failure]
; ns1.google.com [v4 TTL 7] [v6 TTL 7] [v4 failure] [v6 failure]
; ns4.google.com [v4 TTL 7] [v6 TTL 7] [v4 failure] [v6 failure]

> bad cache,

Empty.

> and
> servfail cache.

Non-existent section in my database dump.
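To spot entries like the address-database lines above in a full `rndc dumpdb` file, a small parser can list the nameservers whose address families are flagged as failed. This is a sketch that assumes the one-entry-per-line `; name [tag] [tag] ...` layout seen in this excerpt; the layout may differ between BIND versions, and the function name `failed_families` is hypothetical.

```python
import re

# One address-database entry, e.g.
# "; ns1.google.com [v4 TTL 7] [v6 TTL 7] [v4 failure] [v6 failure]".
# ASSUMPTION: layout taken from the dump excerpt; other versions may vary.
ENTRY_RE = re.compile(r"^;\s+(\S+)((?:\s+\[[^\]]*\])+)\s*$")
TAG_RE = re.compile(r"\[([^\]]*)\]")


def failed_families(dump_text):
    """Return {nameserver: set of address families tagged 'failure'}."""
    result = {}
    for line in dump_text.splitlines():
        match = ENTRY_RE.match(line)
        if not match:
            continue  # section headers, "...", and other lines are skipped
        name = match.group(1)
        tags = TAG_RE.findall(match.group(2))
        # "[v4 failure]" -> "v4"; TTL tags such as "[v4 TTL 7]" are ignored.
        failed = {tag.split()[0] for tag in tags if tag.endswith("failure")}
        if failed:
            result[name] = failed
    return result


if __name__ == "__main__":
    sample = (
        "; Address database dump\n"
        "; ns3.google.com [v4 TTL 7] [v6 TTL 7] [v4 failure] [v6 failure]\n"
        "; ns1.google.com [v4 TTL 7] [v6 TTL 7] [v4 failure] [v6 failure]\n"
    )
    for name, families in sorted(failed_families(sample).items()):
        print(name, sorted(families))
    # prints:
    # ns1.google.com ['v4', 'v6']
    # ns3.google.com ['v4', 'v6']
```

Entries with both families failed, as for all four google.com nameservers here, are the ones that would leave the resolver with no usable server for the zone.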

> > Do I need tracing enabled before the situation happens?
> 
> That will make it a lot easier, yes :-)
> 
> > What level (how many "rndc trace"s should I run)?
> 
> You can specify a number directly, like `rndc trace 11` - level 11 is
> handy because it includes query and response packet dumps (er, but that
> is a 9.11 feature - in 9.9 you'll only get the response packets).

I'll set that trace now and hope to hit the problem again soon --
before I fill up my filesystem.  :-)

Cheers,
b.


More information about the bind-users mailing list