No such Name, and 5 second dns delay.
Barry Margolin
barmar at alum.mit.edu
Sun Feb 28 16:36:29 UTC 2010
In article <mailman.666.1267335206.21153.bind-users at lists.isc.org>,
Tory M Blue <tmblue at gmail.com> wrote:
> I'm running into some issues and trying to diagnose, so maybe folks
> on here can help me with steps to troubleshoot.
>
> Bind 9.6.1-P1
> Fedora Core
>
> What I am experiencing, and what led to my investigation, is a random 5
> second delay in name resolution. Now I know that the nslookup/dig resolver
> has a default 5 second retry: if it doesn't get an answer it will try
> the second server listed in resolv.conf. So I could sort of
> explain the 5 second delay. I didn't understand why it was happening,
> but felt I was getting closer.
>
> So then I started running some network traces (which takes some time,
> as the 5 second delay is very random). However, being patient and
> running enough "time dig host +trace" revealed a few 5 second delays.
> For the most part they are all low ms (as I expect), but a couple were
> 5 seconds.
>
> The delay occurs in the upper part of the dig output (although,
> interestingly enough, no section ever shows more than about 175 ms).
>
> [tblue at w05 ~]$ time dig apps.domain.com +trace +stats
>
> ; <<>> DiG 9.3.2 <<>> apps.domain.com +trace +stats
> ;; global options: printcmd
> . 317993 IN NS C.ROOT-SERVERS.NET.
> . 317993 IN NS J.ROOT-SERVERS.NET.
> . 317993 IN NS B.ROOT-SERVERS.NET.
> . 317993 IN NS L.ROOT-SERVERS.NET.
> . 317993 IN NS D.ROOT-SERVERS.NET.
> . 317993 IN NS I.ROOT-SERVERS.NET.
> . 317993 IN NS F.ROOT-SERVERS.NET.
> . 317993 IN NS G.ROOT-SERVERS.NET.
> . 317993 IN NS M.ROOT-SERVERS.NET.
> . 317993 IN NS K.ROOT-SERVERS.NET.
> . 317993 IN NS A.ROOT-SERVERS.NET.
> . 317993 IN NS H.ROOT-SERVERS.NET.
> . 317993 IN NS E.ROOT-SERVERS.NET.
>
> <<<<PAUSES HERE>>>>>
I think it's trying to do a reverse lookup of 216.249.24.15 to display
the server name in the message below. This isn't part of the actual
resolution of apps.domain.com, just part of +stats. So it may not be
related to your original problem.
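
One way to check this guess is to time the reverse lookup on its own. The following is a minimal Python sketch, assuming the system resolver used by Python's standard library behaves roughly like the one dig consults (it may not). The names `reverse_name`, `time_reverse_lookup`, and the `lookup` parameter are made up for this illustration.

```python
import ipaddress
import socket
import time


def reverse_name(ip):
    """Return the in-addr.arpa (or ip6.arpa) name that a PTR lookup asks for."""
    return ipaddress.ip_address(ip).reverse_pointer


def time_reverse_lookup(ip, lookup=socket.gethostbyaddr):
    """Run one reverse lookup and return (hostname or None, elapsed seconds).

    `lookup` is a hypothetical hook so the timing logic can be exercised
    without touching the network; by default the system resolver is used.
    """
    start = time.monotonic()
    try:
        hostname = lookup(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses
        hostname = None
    elapsed = time.monotonic() - start
    return hostname, elapsed
```

Calling `time_reverse_lookup("216.249.24.15")` repeatedly and noting any run that takes close to 5 seconds would show whether the PTR lookup, rather than the forward resolution, is what stalls.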
> ;; Query time: 1 msec
> ;; SERVER: 0.0.0.15#53(216.249.24.15)
> ;; WHEN: Sat Feb 27 21:25:21 2010
> ;; MSG SIZE rcvd: 500
>
> net. 172800 IN NS H.GTLD-SERVERS.net.
> net. 172800 IN NS M.GTLD-SERVERS.net.
> net. 172800 IN NS I.GTLD-SERVERS.net.
> net. 172800 IN NS F.GTLD-SERVERS.net.
> net. 172800 IN NS K.GTLD-SERVERS.net.
> net. 172800 IN NS L.GTLD-SERVERS.net.
> net. 172800 IN NS E.GTLD-SERVERS.net.
> net. 172800 IN NS J.GTLD-SERVERS.net.
> net. 172800 IN NS D.GTLD-SERVERS.net.
> net. 172800 IN NS G.GTLD-SERVERS.net.
> net. 172800 IN NS B.GTLD-SERVERS.net.
> net. 172800 IN NS A.GTLD-SERVERS.net.
> net. 172800 IN NS C.GTLD-SERVERS.net.
> ;; Query time: 14 msec
> ;; SERVER: 192.33.4.12#53(C.ROOT-SERVERS.NET)
> ;; WHEN: Sat Feb 27 21:25:21 2010
> ;; MSG SIZE rcvd: 505
>
> domain.com. 172800 IN NS ns1.domain.com.
> domain.com. 172800 IN NS ns2.domain.com.
> ;; Query time: 54 msec
> ;; SERVER: 192.55.83.30#53(M.GTLD-SERVERS.net)
> ;; WHEN: Sat Feb 27 21:25:26 2010
> ;; MSG SIZE rcvd: 104
>
> apps.domain.com. 300 IN A 216.249.24.50
> domain.com. 86400 IN NS ns2.domain.com.
> domain.com. 86400 IN NS ns1.domain.com.
> ;; Query time: 0 msec
> ;; SERVER: 0.0.0.15#53(ns1.domain.com)
> ;; WHEN: Sat Feb 27 21:25:26 2010
> ;; MSG SIZE rcvd: 120
>
>
> real 0m5.090s
> user 0m0.004s
> sys 0m0.004s
>
> So since I finally caught one of these in the wild, I could look at
> the network trace. I was caught off guard when I saw "No such Name"
> "Flags: 0x8483 (Standard query response, No such name)"
It would help if you told us WHICH query elicited this response.
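
The flags word itself can at least be decoded. Below is a small Python sketch that splits the 16-bit value into the header fields defined in RFC 1035; the function name `decode_dns_flags` and the `RCODES` table are just for illustration and cover only the basic response codes.

```python
# Basic response codes from RFC 1035; later extensions are not listed here.
RCODES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def decode_dns_flags(flags):
    """Split the 16-bit DNS header flags word into its individual fields."""
    rcode = flags & 0xF
    return {
        "qr": (flags >> 15) & 1,        # 1 = response, 0 = query
        "opcode": (flags >> 11) & 0xF,  # 0 = standard query
        "aa": (flags >> 10) & 1,        # authoritative answer
        "tc": (flags >> 9) & 1,         # truncated
        "rd": (flags >> 8) & 1,         # recursion desired
        "ra": (flags >> 7) & 1,         # recursion available
        "rcode": RCODES.get(rcode, str(rcode)),
    }


print(decode_dns_flags(0x8483))
```

For 0x8483 this gives a response with AA and RA set, RD clear, and rcode NXDOMAIN, so the server answered authoritatively that the name does not exist. The flags say nothing about which name was asked for, which is why the question section of that packet matters.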
>
> What? I can query my 4 servers (behind a Load balancer or through the
> LB) and they resolve fine, all are running, all have current zone files
> (they are slaves), so I don't understand the "no such name", I have
> no idea why this server is giving this response. And since it's so
> infrequent it makes no sense at all. Servers are not busy, very low
> load, gig network, no saturation, no retransmissions, everything seems
> healthy.
>
> So now I'm wondering if that's the 5 second delay, it sends out a
> request, one server sends back, no such name, so it queries my other
> set of dns servers and gets an immediate response. However all 4
> servers seem fine.
>
> So I've sniffed the traffic and looked at what logs I have. Are there
> other logs I can enable to catch or watch for this? Is there possibly a
> configuration setting that is outdated or wrong?
>
> New to the bind list so not clear what information will allow me to
> help you, help me.
>
> I even thought that maybe I had a bad hints file (named.cache), but
> it appears to be current (well, the last major update seems to have been
> Dec 08).
Even if you did, one of the first things BIND does when it starts up is
query a root server to get the current root server list, and this is
used instead of the hints.
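
For illustration, here is roughly what such a root NS query looks like on the wire. This is a minimal Python sketch that only builds the packet and does not send it; it assumes a plain RFC 1035 query with no EDNS, so the query BIND actually sends at startup may differ. The helper name `build_query` and its parameters are hypothetical.

```python
import struct

QTYPE_NS = 2   # NS record type
QCLASS_IN = 1  # Internet class


def build_query(qname, qtype, query_id=0, recursion_desired=False):
    """Build a single-question DNS query packet (header plus question)."""
    flags = 0x0100 if recursion_desired else 0
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # Each label is length-prefixed; the root name is just the zero byte.
    labels = [label for label in qname.rstrip(".").split(".") if label]
    encoded = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in labels
    ) + b"\x00"
    return header + encoded + struct.pack("!HH", qtype, QCLASS_IN)


# Query for the NS records of the root zone, as a priming query would ask.
priming = build_query(".", QTYPE_NS)
print(len(priming), priming.hex())
```

The answer to a query like this carries the current root server list, which is why a slightly stale hints file rarely matters once the server is up.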
--
Barry Margolin, barmar at alum.mit.edu
Arlington, MA
*** PLEASE don't copy me on replies, I'll read them in the group ***