Seemingly random ServFail issues on a caching server
gentoo at floriancrouzat.net
Wed Aug 31 13:40:50 UTC 2011
Florian CROUZAT wrote on 2011-08-25:
> Hi list,
> For a few domains (we'll consider only one domain for this example), I
> sometimes encounter (seemingly random) ServFails while resolving domain
> names. A client (192.168.147.2) asks my caching server (192.168.151.100)
> to resolve a target (www.leclercdrive.fr).
> Here are the relevant logs:
> Aug 24 17:14:19 ns named: 24-Aug-2011 17:14:19.377 queries: info: client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
> Aug 24 17:14:19 ns named: 24-Aug-2011 17:14:19.380 queries: info: client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
> Aug 24 17:14:19 ns named: 24-Aug-2011 17:14:19.382 queries: info: client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
> A tcpdump on the local side of the NS server shows the A query and an
> immediate ServFail. A tcpdump on the external side of the NS server shows
> no traffic at all in this case, meaning the server fails internally and
> doesn't even try to send the A query out to the Internet.
> 17:14:19.377608 IP 192.168.147.2.34502 > 192.168.151.100.53: 26340+ A? www.leclercdrive.fr. (37)
> 17:14:19.378845 IP 192.168.151.100.53 > 192.168.147.2.34502: 26340 ServFail 0/0/0 (37)
> 17:14:19.380607 IP 192.168.147.2.34502 > 192.168.151.100.53: 52628+ A? www.leclercdrive.fr. (37)
> 17:14:19.381383 IP 192.168.151.100.53 > 192.168.147.2.34502: 52628 ServFail 0/0/0 (37)
> 17:14:19.382605 IP 192.168.147.2.34502 > 192.168.151.100.53: 58933+ A? www.leclercdrive.fr. (37)
> 17:14:19.383406 IP 192.168.151.100.53 > 192.168.147.2.34502: 58933 ServFail 0/0/0 (37)
> A few minutes earlier or later, it worked just fine:
> 17:15:58.736177 IP 192.168.147.2.34502 > 192.168.151.100.53: 49610+ A? www.leclercdrive.fr. (37)
> 17:15:58.784470 IP 192.168.151.100.53 > 192.168.147.2.34502: 49610 3/3/6 CNAME[|domain]
> The TTL of the www.leclercdrive.fr entry is 300 seconds, which seems
> short to me. Maybe the ServFail happens when a request is handled at the
> exact moment the TTL reaches zero and the cache entry is being flushed?
> I tried flushing the cache with rndc, but the first request after that
> worked just fine (of course...).
> Any ideas/hints are welcome.
> The DNS server runs 1:9.5.1.dfsg.P3-1+lenny1
> cat /etc/debian_version => 5.0.4
> (I have no control over the version of the tools.)
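The timestamps in the quoted captures already carry a clue: each ServFail left the server roughly 1 ms after the question arrived (0.8 to 1.2 ms), while the successful lookup at 17:15:58 took about 48 ms. That fits the external capture showing no upstream traffic. The Python sketch below pairs each question with its answer by client endpoint and DNS transaction ID and reports the rcode and latency, so longer captures can be scanned for the same signature. It assumes the one-line-per-packet `tcpdump -n` output shown above (no `-v`), timestamps within a single day, and the rcode names tcpdump prints (a NoError answer carries no rcode word); `pair_exchanges` and `SAMPLE` are hypothetical names.

```python
import re
from datetime import datetime

# One packet per line, as in the captures above (assumed `tcpdump -n` format):
#   HH:MM:SS.micro IP src.port > dst.port: TXID<rest of DNS summary>
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"(?P<id>\d+)(?P<rest>.*)$"
)
# Error rcodes as tcpdump spells them; NoError is printed as nothing.
RCODES = {"FormErr", "ServFail", "NXDomain", "NotImp", "Refused"}


def seconds_of_day(ts):
    """Convert 'HH:MM:SS.micro' to seconds since midnight (no date wrap)."""
    t = datetime.strptime(ts, "%H:%M:%S.%f")
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6


def pair_exchanges(lines):
    """Return (txid, rcode, latency_ms) for every question that got an answer."""
    pending = {}  # (client endpoint, txid) -> time the question was seen
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a DNS-over-UDP summary line
        ts = seconds_of_day(m["ts"])
        rest = m["rest"]
        if "?" in rest:
            # Question line ("A? name."): the client is the source.
            pending[(m["src"], m["id"])] = ts
            continue
        # Answer line: the client is the destination.
        key = (m["dst"], m["id"])
        if key not in pending:
            continue  # answer whose question was not captured
        sent = pending.pop(key)
        rcode = next((tok for tok in rest.split() if tok in RCODES), "NoError")
        results.append((int(m["id"]), rcode, (ts - sent) * 1000.0))
    return results


SAMPLE = [
    "17:14:19.377608 IP 192.168.147.2.34502 > 192.168.151.100.53: 26340+ A? www.leclercdrive.fr. (37)",
    "17:14:19.378845 IP 192.168.151.100.53 > 192.168.147.2.34502: 26340 ServFail 0/0/0 (37)",
    "17:14:19.380607 IP 192.168.147.2.34502 > 192.168.151.100.53: 52628+ A? www.leclercdrive.fr. (37)",
    "17:14:19.381383 IP 192.168.151.100.53 > 192.168.147.2.34502: 52628 ServFail 0/0/0 (37)",
    "17:14:19.382605 IP 192.168.147.2.34502 > 192.168.151.100.53: 58933+ A? www.leclercdrive.fr. (37)",
    "17:14:19.383406 IP 192.168.151.100.53 > 192.168.147.2.34502: 58933 ServFail 0/0/0 (37)",
    "17:15:58.736177 IP 192.168.147.2.34502 > 192.168.151.100.53: 49610+ A? www.leclercdrive.fr. (37)",
    "17:15:58.784470 IP 192.168.151.100.53 > 192.168.147.2.34502: 49610 3/3/6 CNAME[|domain]",
]

if __name__ == "__main__":
    for txid, rcode, ms in pair_exchanges(SAMPLE):
        print(f"{txid:>5} {rcode:<8} {ms:7.3f} ms")
```

To use it on a real capture, save output from something like `tcpdump -n -i <iface> udp port 53` to a file and pass its lines instead of `SAMPLE`; a run of ServFails answered in about a millisecond would mark the same failure mode.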
I found in my log files a few other domains where ServFails occur; their
TTLs all differ, ranging from 300 to 86400 seconds.
I still have no idea how to resolve this issue, and as far as I have
investigated, I haven't been able to identify any pattern in those ServFails.
I'm not even sure the TTL is involved, since I saw two ServFails separated
in time by less than the entry's TTL...
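To test the TTL hypothesis directly rather than by searching logs, one option is to poll the caching server and record the rcode next to the TTL it hands back. A caching server returns the remaining TTL for cached data, so a ServFail that lands just as the counter runs out (and then restarts at 300 for www.leclercdrive.fr) would support the expiry idea, while one in mid-countdown would argue against it. Below is a minimal sketch using only the Python standard library, assuming plain UDP on port 53, no EDNS, and that the first answer record (here the CNAME) carries the TTL of interest. `build_query`, `parse_response` and `probe` are hypothetical helpers; the message layout follows RFC 1035.

```python
import random
import socket
import struct
import time

RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, qtype=1, txid=None):
    """Build a recursive query (RD set) for `name`; qtype 1 = A, class IN."""
    if txid is None:
        txid = random.randint(0, 0xFFFF)
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)  # one question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return txid, header + qname + struct.pack("!HH", qtype, 1)


def skip_name(data, offset):
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = data[offset]
        if length & 0xC0 == 0xC0:  # compression pointer ends the name
            return offset + 2
        offset += 1
        if length == 0:  # root label ends the name
            return offset
        offset += length


def parse_response(data, txid):
    """Return (rcode, ttl of first answer or None)."""
    rid, flags, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", data[:12])
    if rid != txid:
        raise ValueError("transaction ID mismatch")
    rcode = flags & 0x000F
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(data, offset) + 4  # skip QTYPE and QCLASS
    if ancount == 0:
        return rcode, None
    offset = skip_name(data, offset)
    _, _, ttl, _ = struct.unpack("!HHIH", data[offset:offset + 10])
    return rcode, ttl


def probe(server, name, count=10, interval=1.0, timeout=3.0):
    """Query `server` repeatedly and print time, rcode and remaining TTL."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(count):
            txid, query = build_query(name)
            stamp = time.strftime("%H:%M:%S")
            sock.sendto(query, (server, 53))
            try:
                while True:
                    data, _ = sock.recvfrom(4096)
                    if data[:2] == struct.pack("!H", txid):
                        break  # ignore late answers to earlier queries
                rcode, ttl = parse_response(data, txid)
                print(stamp, RCODE_NAMES.get(rcode, rcode), f"ttl={ttl}")
            except socket.timeout:
                print(stamp, "timeout")
            time.sleep(interval)
```

For example, `probe("192.168.151.100", "www.leclercdrive.fr", count=600)` polls about once a second for roughly ten minutes. The query `build_query` produces for that name is 37 bytes, the same length tcpdump reports for the client's queries in the original capture.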