timeouts and negative caching
Gerd v. Egidy
lists at egidy.de
Thu Jun 11 13:27:09 UTC 2015
I'm running BIND as a recursive resolver behind a slow internet line.
When the line is congested, requests sometimes time out. When the DNS client
retries the query, BIND usually retries the request upstream and eventually
succeeds. So far so good.
But now I sometimes see that BIND does not retry immediately, but somehow
caches the error for up to 5 minutes (300 seconds). The negative answer is then
returned right away, without checking again whether the remote server can be
reached.
Here is an example:
> time dig @localhost www.strato.de
; <<>> DiG 9.9.3-P2-RedHat-9.9.3-2.P2.i2n <<>> @localhost www.strato.de
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 43535
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.strato.de. IN A
;; Query time: 4397 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Jun 11 14:14:17 CEST 2015
;; MSG SIZE rcvd: 42
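To measure how long the SERVFAIL persists, the same query can be repeated at a fixed interval while logging the status and the query time. This is only a minimal sketch: the function names `parse_dig` and `watch_query` and the 10-second interval are illustrative assumptions, not part of my actual setup.

```shell
#!/bin/sh
# parse_dig: read dig output on stdin, print "<status> <query time in msec>".
parse_dig() {
    awk '
        /->>HEADER<<-/ {
            # Find the "status:" field and take the value after it.
            for (i = 1; i < NF; i++)
                if ($i == "status:") { status = $(i + 1); sub(/,$/, "", status) }
        }
        /^;; Query time:/ { msec = $4 }
        END { print status, msec }
    '
}

# watch_query: hypothetical helper that repeats the query every 10 seconds
# and prefixes each result with the current time. Stop it with Ctrl-C.
watch_query() {
    while :; do
        printf '%s ' "$(date +%T)"
        dig @localhost "$1" | parse_dig
        sleep 10
    done
}
```

Called as `watch_query www.strato.de`, a SERVFAIL that comes back within a few milliseconds would suggest the failure was answered from a cache rather than retried upstream.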
When I look into the bind cache I see this:
> rndc dumpdb -all
> cat cache_dump.db
strato.de.      85530   NS      ns3.strato.de.
                85530   NS      ns4.strato.de.
                85530   NS      ns1.strato.de.
                85530   NS      ns2.strato.de.
ns1.strato.de.  85530   A       184.108.40.206
ns2.strato.de.  85530   A       220.127.116.11
ns3.strato.de.  85530   A       18.104.22.168
                85530   AAAA    2a00:e10:2004::2
ns4.strato.de.  85530   A       22.214.171.124
                85530   AAAA    2a01:238:e100:192::4
; Address database dump
; ns2.strato.de [v4 TTL 59] [v4 failure] [v6 unexpected]
; ns3.strato.de [v4 TTL 59] [v4 failure] [v6 unexpected]
; ns4.strato.de [v4 TTL 59] [v4 failure] [v6 unexpected]
; ns1.strato.de [v4 TTL 59] [v4 failure] [v6 unexpected]
I've seen this "[v4 TTL 59]" go up to 300.
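To track that value over time, the failure entries can be pulled out of the dump. This is a sketch that assumes the address database lines have exactly the form shown above; `adb_failures` is a hypothetical helper name.

```shell
# adb_failures: print "<name> <v4 TTL>" for each address database entry
# marked "[v4 failure]" in the given dump file.
adb_failures() {
    sed -n 's/^; \([^ ]*\) \[v4 TTL \([0-9]*\)\] \[v4 failure\].*/\1 \2/p' "$1"
}
```

Running `rndc dumpdb -all` followed by `adb_failures cache_dump.db` at intervals would show how the TTL of each failed entry changes.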
So there must be some kind of "negative caching" that also caches timeouts,
unlike real negative caching, which only caches actual negative answers.
Where do these 300 seconds come from, and how can I configure them? I'd like to
reduce them drastically, to something like 10 seconds, so that BIND tries to
resolve a query again shortly after a timeout.