No such Name, and 5 second DNS delay.

Barry Margolin barmar at alum.mit.edu
Sun Feb 28 16:36:29 UTC 2010


In article <mailman.666.1267335206.21153.bind-users at lists.isc.org>,
 Tory M Blue <tmblue at gmail.com> wrote:

> I'm running into some issues and trying to diagnose them, so maybe folks
> on here can help me with steps to troubleshoot.
> 
> Bind 9.6.1-P1
> Fedora Core
> 
> What I am experiencing, and what led to my investigation, is a random 5
> second delay in name resolution. Now I know that the nslookup/dig resolver
> has a default 5 second retry; if it doesn't get an answer it will try
> the second server listed in resolv.conf. So I could sort of
> explain the 5 second delay; I didn't understand why it was happening,
> but felt I was getting closer.
> 
> So then I started running some network traces (which takes some time,
> as the 5 second delay is very random); however, being patient and
> running enough "time dig host +trace" revealed a few 5 second delays,
> for the most part they are all low ms (as I expect), but a couple were
> 5 second.
> 
> The delay occurs in the upper part of the dig output (although, interestingly
> enough, no section ever shows more than about 175 ms).
> 
> [tblue at w05 ~]$ time dig apps.domain.com +trace +stats
> 
> ; <<>> DiG 9.3.2 <<>> apps.domain.com +trace +stats
> ;; global options:  printcmd
> .                       317993  IN      NS      C.ROOT-SERVERS.NET.
> .                       317993  IN      NS      J.ROOT-SERVERS.NET.
> .                       317993  IN      NS      B.ROOT-SERVERS.NET.
> .                       317993  IN      NS      L.ROOT-SERVERS.NET.
> .                       317993  IN      NS      D.ROOT-SERVERS.NET.
> .                       317993  IN      NS      I.ROOT-SERVERS.NET.
> .                       317993  IN      NS      F.ROOT-SERVERS.NET.
> .                       317993  IN      NS      G.ROOT-SERVERS.NET.
> .                       317993  IN      NS      M.ROOT-SERVERS.NET.
> .                       317993  IN      NS      K.ROOT-SERVERS.NET.
> .                       317993  IN      NS      A.ROOT-SERVERS.NET.
> .                       317993  IN      NS      H.ROOT-SERVERS.NET.
> .                       317993  IN      NS      E.ROOT-SERVERS.NET.
> 
> <<<<PAUSES HERE>>>>>

I think it's trying to do a reverse lookup of 216.249.24.15 to display 
the server name in the message below.  This isn't part of the actual 
resolution of apps.domain.com, just part of +stats.  So it may not be 
related to your original problem.
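
One way to test that guess separately from the trace is to time the reverse
lookup on its own. The following is only a minimal sketch in Python, assuming
the system resolver (the servers in resolv.conf) is the one dig would also use
for the PTR query; the function names are made up for illustration.

```python
import ipaddress
import socket
import time
from typing import Optional, Tuple


def ptr_name(ip: str) -> str:
    """Return the name that a reverse (PTR) lookup for this address queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def time_reverse_lookup(ip: str) -> Tuple[Optional[str], float]:
    """Do one reverse lookup via the system resolver.

    Returns (hostname or None if the lookup failed, elapsed seconds).
    """
    start = time.monotonic()
    try:
        host = socket.gethostbyaddr(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        host = None
    return host, time.monotonic() - start
```

Calling time_reverse_lookup("216.249.24.15") in a loop and printing the
elapsed time would show whether the PTR query for
15.24.249.216.in-addr.arpa is what occasionally takes 5 seconds.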

> ;; Query time: 1 msec
> ;; SERVER: 0.0.0.15#53(216.249.24.15)
> ;; WHEN: Sat Feb 27 21:25:21 2010
> ;; MSG SIZE  rcvd: 500
> 
> net.                    172800  IN      NS      H.GTLD-SERVERS.net.
> net.                    172800  IN      NS      M.GTLD-SERVERS.net.
> net.                    172800  IN      NS      I.GTLD-SERVERS.net.
> net.                    172800  IN      NS      F.GTLD-SERVERS.net.
> net.                    172800  IN      NS      K.GTLD-SERVERS.net.
> net.                    172800  IN      NS      L.GTLD-SERVERS.net.
> net.                    172800  IN      NS      E.GTLD-SERVERS.net.
> net.                    172800  IN      NS      J.GTLD-SERVERS.net.
> net.                    172800  IN      NS      D.GTLD-SERVERS.net.
> net.                    172800  IN      NS      G.GTLD-SERVERS.net.
> net.                    172800  IN      NS      B.GTLD-SERVERS.net.
> net.                    172800  IN      NS      A.GTLD-SERVERS.net.
> net.                    172800  IN      NS      C.GTLD-SERVERS.net.
> ;; Query time: 14 msec
> ;; SERVER: 192.33.4.12#53(C.ROOT-SERVERS.NET)
> ;; WHEN: Sat Feb 27 21:25:21 2010
> ;; MSG SIZE  rcvd: 505
> 
> domain.com.          172800  IN      NS      ns1.domain.com.
> domain.com.          172800  IN      NS      ns2.domain.com.
> ;; Query time: 54 msec
> ;; SERVER: 192.55.83.30#53(M.GTLD-SERVERS.net)
> ;; WHEN: Sat Feb 27 21:25:26 2010
> ;; MSG SIZE  rcvd: 104
> 
> apps.domain.com.     300     IN      A       216.249.24.50
> domain.com.          86400   IN      NS      ns2.domain.com.
> domain.com.          86400   IN      NS      ns1.domain.com.
> ;; Query time: 0 msec
> ;; SERVER: 0.0.0.15#53(ns1.domain.com)
> ;; WHEN: Sat Feb 27 21:25:26 2010
> ;; MSG SIZE  rcvd: 120
> 
> 
> real    0m5.090s
> user    0m0.004s
> sys     0m0.004s
> 
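
The ";; WHEN:" lines in a trace like the one above give a coarse way to see
between which sections the wall-clock time went. A minimal sketch, assuming
the WHEN format printed by this dig version (no time zone field; newer
versions print one) and an English locale; when_gaps is a hypothetical helper,
not part of BIND:

```python
from datetime import datetime
from typing import List, Optional, Tuple

WHEN_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Sat Feb 27 21:25:21 2010"


def when_gaps(dig_output: str) -> List[Tuple[Optional[str], float]]:
    """Return (SERVER text, seconds since the previous WHEN) per section."""
    results: List[Tuple[Optional[str], float]] = []
    server: Optional[str] = None
    previous: Optional[datetime] = None
    for raw in dig_output.splitlines():
        # Tolerate mail-style "> " quoting in front of each line.
        line = raw.lstrip("> ").strip()
        if line.startswith(";; SERVER:"):
            server = line[len(";; SERVER:"):].strip()
        elif line.startswith(";; WHEN:"):
            stamp = datetime.strptime(line[len(";; WHEN:"):].strip(), WHEN_FORMAT)
            gap = (stamp - previous).total_seconds() if previous is not None else 0.0
            results.append((server, gap))
            previous = stamp
    return results
```

Fed the trace quoted above, the only nonzero gap this reports is 5 seconds,
on the section answered by M.GTLD-SERVERS.net. WHEN has one-second
resolution, so this only narrows down where the stall happened.
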
> So since I finally caught one of these in the wild, I could look at
> the network trace. I was caught off guard when I saw "No such Name"
> "Flags: 0x8483 (Standard query response, No such name)"

It would help if you told us WHICH query elicited this response.
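
For reference, the flags word itself can be unpacked using the header layout
from RFC 1035. This is just an illustrative sketch in Python; decode_flags is
a made-up helper and the RCODE table lists only the common values.

```python
from typing import Dict, Union

# Common response codes from the low 4 bits of the flags word.
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def decode_flags(flags: int) -> Dict[str, Union[bool, int, str]]:
    """Split the 16-bit DNS header flags word into its main fields."""
    rcode = flags & 0x000F
    return {
        "qr": bool(flags & 0x8000),      # 1 = response
        "opcode": (flags >> 11) & 0xF,   # 0 = standard query
        "aa": bool(flags & 0x0400),      # authoritative answer
        "tc": bool(flags & 0x0200),      # truncated
        "rd": bool(flags & 0x0100),      # recursion desired
        "ra": bool(flags & 0x0080),      # recursion available
        "rcode": RCODES.get(rcode, str(rcode)),
    }
```

decode_flags(0x8483) gives a response with AA and RA set, RD clear, and
rcode NXDOMAIN. So whichever server sent it claimed to be authoritative for
the name that was asked, which is why the query name matters.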

> 
> What? I can query my 4 servers (behind a load balancer or through the
> LB) and they resolve fine. All are running and all have current zone files
> (they are slaves), so I don't understand the "no such name". I have
> no idea why this server is giving this response, and since it's so
> infrequent it makes no sense at all. The servers are not busy: very low
> load, gig network, no saturation, no retransmissions. Everything seems
> healthy.
> 
> So now I'm wondering if that's the 5 second delay: it sends out a
> request, one server sends back "no such name", so it queries my other
> set of DNS servers and gets an immediate response. However, all 4
> servers seem fine.
> 
> So I've sniffed the traffic and looked at what logs I have. Are there
> other logs I can enable to catch or watch for this? Is there possibly a
> configuration setting that is outdated or wrong?
> 
> New to the bind list so not clear what information will allow me to
> help you, help me.
> 
> I even thought that maybe I had a bad hints file (named.cache), but
> it appears to be current (well, the last major update seems to have been
> Dec 08).

Even if you did, one of the first things BIND does when it starts up is 
query a root server to get the current root server list, and this is 
used instead of the hints.

-- 
Barry Margolin, barmar at alum.mit.edu
Arlington, MA
*** PLEASE don't copy me on replies, I'll read them in the group ***


