bind 8 slow when resolving new domains!

Ronan Flood ronan at noc.ulcc.ac.uk
Tue May 11 13:03:56 UTC 2004


[also emailed to poster, who might have unsubbed by now -- apologies
if you see it twice]

I've looked into this a bit further out of interest as we also run
bind 8.3.7-REL on some systems.

On Thu, 06 May 2004 11:59:19 -0500, dap99 at i-55.com wrote:

> I am having a big problem with slow internal DNS (named 8.3.7-REL on
> FreeBSD 4.9). If we do a query against a local domain (our DNS server
> is authoritative) then the response is fast. If we do a query against
> anything in bind's cache the resp. is fast. If we do a query for a new
> non-local domain then the resp is SLOW or times-out. FYI, we are
> behind a NetScreen firewall at a colo. The colo promises it is not
> them. Also, we are using their two DNS servers as forwarders.

Are their DNS servers behind the NetScreen with you, or on the outside?
I think the NetScreen is at least partly to blame here.

> Okay, so what happens if I try to disable my forwarders?
> [...]
> So let's try a random domain name:
> 
> ns2# nslookup www.looser.com
> Server:  ns2
> Address:  192.168.42.78
> 
> *** ns2 can't find www.looser.com: Non-existent host/domain
> ns2# nslookup www.looser.com
> Server:  ns2
> Address:  192.168.42.78
> 
> Name:    www.looser.com
> Address:  217.8.158.117

I've tried a similar query on a freshly started 8.3.7-REL server under
Solaris 8, with debugging turned on and packet snooping, which leads me
to the following comments on your trace.

> # tcpdump -n host ns2 and \( icmp or udp \)
> tcpdump: listening on rl0
> 10:13:50.515557 192.168.42.78.53 > 192.33.4.12.53:  21568 [1au] A?
> www.looser.com. (43)
> 10:13:50.562594 192.33.4.12.53 > 192.168.42.78.53:  21568- 0/13/14
> (475)

That's a query to c.root-servers.net and a response back with the
NS and A records of the 13 gTLD servers for .com, and an EDNS record
(the 0/13/14 is tcpdump's answer/authority/additional record count).
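
For reading the counts in these tcpdump summaries: the three numbers
after the query ID are the answer, authority and additional record
counts.  A minimal Python sketch, assuming the wrapped trace line has
been rejoined into one string; the name section_counts is made up for
illustration:

```python
import re

# ":  <id><flags> <answers>/<authority>/<additional>" in a tcpdump DNS reply line.
SUMMARY = re.compile(r":\s+(\d+)(\S*)\s+(\d+)/(\d+)/(\d+)")

def section_counts(line):
    """Return (query_id, answers, authority, additional), or None if no match."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    qid, _flags, an, ns, ar = m.groups()
    return int(qid), int(an), int(ns), int(ar)

# The root server's reply from the trace, with the email line wrap undone.
line = "10:13:50.562594 192.33.4.12.53 > 192.168.42.78.53:  21568- 0/13/14 (475)"
print(section_counts(line))
```

Fourteen additional records fits 13 A records plus the one EDNS
record.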

> 10:13:50.563816 192.168.42.78.53 > 192.33.14.30.53:  39445 [1au] A?
> www.looser.com. (43)
> 10:13:50.619570 192.33.14.30.53 > 192.168.42.78.53:  39445 FormErr-
> [0q]
> 0/0/0 (12) (DF)

A query to b.gtld-servers.net, and an error back objecting to the
EDNS record in the query.
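
To make the FormErr reply concrete: a 12-byte DNS message is just the
fixed header, with no question section (tcpdump's "[0q]").  A minimal
Python sketch decoding such a header.  The bytes are reconstructed from
the tcpdump summary (ID 39445, response bit set, RCODE 1) and are an
assumption, not the actual captured packet:

```python
import struct

RCODES = {0: "NoError", 1: "FormErr", 2: "ServFail",
          3: "NXDomain", 4: "NotImp", 5: "Refused"}

def decode_header(msg):
    """Decode the fixed 12-byte DNS header into a dict."""
    qid, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", msg[:12])
    return {
        "id": qid,
        "is_response": bool(flags & 0x8000),   # QR bit
        "rcode": RCODES.get(flags & 0x000F, "other"),
        "counts": (qd, an, ns, ar),            # question/answer/authority/additional
    }

# Assumed reconstruction: flags 0x8001 = QR set, RCODE 1, all counts zero.
reply = struct.pack("!HHHHHH", 39445, 0x8001, 0, 0, 0, 0)
print(len(reply), decode_header(reply))
```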

> 10:13:50.619641 192.168.42.78.53 > 192.33.14.30.53:  39445 A?
> www.looser.com. (32)

A fast-retry of the same query to the same server without the
EDNS record: no "[1au]" and the size is 32 instead of 43 bytes.
This query, or the response to it, appears to be blocked.
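
The 32 and 43 byte sizes can be checked by hand: 12 bytes of header,
16 for the encoded name, 4 for type and class, plus 11 for the OPT
record when EDNS is used.  A Python sketch that builds both forms of
the query; the advertised UDP size of 4096 is an assumed value and
does not affect the length:

```python
import struct

def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"

def build_query(qid, name, edns=False, udp_size=4096):
    """Build an A/IN query, optionally with an EDNS0 OPT record appended."""
    arcount = 1 if edns else 0
    header = struct.pack("!HHHHHH", qid, 0x0000, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", 1, 1)  # type A, class IN
    msg = header + question
    if edns:
        # OPT pseudo-RR: root name, type 41, class = UDP size, TTL 0, RDLEN 0.
        msg += b"\x00" + struct.pack("!HHIH", 41, udp_size, 0, 0)
    return msg

print(len(build_query(39445, "www.looser.com")))             # plain query
print(len(build_query(39445, "www.looser.com", edns=True)))  # with OPT record
```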

> 10:13:58.018699 192.168.42.78.53 > 192.55.83.30.53:  39445 [1au] A?
> www.looser.com. (43)
> 10:13:58.249039 192.55.83.30.53 > 192.168.42.78.53:  39445 FormErr-
> [0q]
> 0/0/0 (12) (DF)
> 10:13:58.249153 192.168.42.78.53 > 192.55.83.30.53:  39445 A?
> www.looser.com. (32)

After the 8-second timeout, bind tries again, this time with
m.gtld-servers.net.  Same scenario with the EDNS record,
and again the second query or its reply is blocked.

> 10:14:06.018825 192.168.42.78.53 > 192.41.162.30.53:  39445 [1au] A?
> www.looser.com. (43)
> 10:14:06.051960 192.41.162.30.53 > 192.168.42.78.53:  39445 FormErr-
> [0q]
> 0/0/0 (12) (DF)
> 10:14:06.052112 192.168.42.78.53 > 192.41.162.30.53:  39445 A?
> www.looser.com. (32)

Again 8 seconds later, with l.gtld-servers.net.


> 10:14:09.431353 192.168.42.78.53 > 192.33.14.30.53:  7462 A?
> www.looser.com. (32)
> 10:14:09.489141 192.33.14.30.53 > 192.168.42.78.53:  7462- 0/2/2 (109)
> (DF)

Now, reacting to your second nslookup query, bind goes back to
b.gtld-servers.net with a new query ID (7462) and without EDNS:
it remembers that b.gtld-servers.net doesn't support it.  This time
a response comes back, with the NS records for looser.com.
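
The behaviour described here, dropping EDNS for a server once it has
answered FormErr, can be pictured with a toy model.  This is only an
illustration in Python with made-up names (EdnsMemory, use_edns,
record_reply), not how bind 8 implements it:

```python
class EdnsMemory:
    """Toy per-server record of which servers rejected an EDNS query."""

    def __init__(self):
        self.no_edns = set()

    def use_edns(self, server):
        # Try EDNS unless this server has already objected to it.
        return server not in self.no_edns

    def record_reply(self, server, rcode, sent_edns):
        # A FormErr in reply to an EDNS query marks the server as non-EDNS.
        if sent_edns and rcode == "FormErr":
            self.no_edns.add(server)

mem = EdnsMemory()
mem.record_reply("192.33.14.30", "FormErr", sent_edns=True)  # b.gtld-servers.net
print(mem.use_edns("192.33.14.30"))  # the server that sent FormErr
print(mem.use_edns("64.247.9.98"))   # a server not yet seen
```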

> 10:14:09.489528 192.168.42.78.53 > 64.247.9.98.53:  56483 [1au] A?
> www.looser.com. (43)
> 10:14:09.544852 64.247.9.98.53 > 192.168.42.78.53:  56483*- 1/2/1 A
> 217.8.158.117 (104) (DF)

Query to ns2.zoneedit.com, and an authoritative A record back.


Meanwhile, 8 seconds after the last failure, query 39445 is tried
again, this time with i.gtld-servers.net, usual scenario:

> 10:14:14.018941 192.168.42.78.53 > 192.43.172.30.53:  39445 [1au] A?
> www.looser.com. (43)
> 10:14:14.160251 192.43.172.30.53 > 192.168.42.78.53:  39445 FormErr-
> [0q]
> 0/0/0 (12) (DF)
> 10:14:14.160333 192.168.42.78.53 > 192.43.172.30.53:  39445 A?
> www.looser.com. (32)

[... trace snipped ...]

And this continues at 8-second intervals with h.gtld-servers.net,
g.gtld-servers.net, d.gtld-servers.net, k.gtld-servers.net,
e.gtld-servers.net.
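
The 8-second spacing can be checked from the timestamps of the EDNS
queries for ID 39445 in the trace above.  A small Python sketch using
three of them (the helper name gaps is made up):

```python
from datetime import datetime

# Timestamps of the EDNS queries to m, l and i.gtld-servers.net in the trace.
stamps = ["10:13:58.018699", "10:14:06.018825", "10:14:14.018941"]

def gaps(stamps):
    """Return the seconds elapsed between consecutive timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

print(gaps(stamps))
```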

> Any ideas? Also, why so many FormErr (am I sending out bunk DNS
> queries?). This is a stock DNS install. I have the same problem on
> another identical FreeBSD/DNS server.

Bind 8.3.7-REL uses EDNS, which the gTLD servers at least don't
support, hence the FormErr replies.  The real problem, though, looks
to be that the NetScreen is in some way preventing the retries from
working normally.  A search on this turns up an earlier comment in
comp.protocols.dns.bind

  http://groups.google.com/groups?&selm=b3g4ac%243v1e%241%40isrv4.isc.org

which references this discussion

  http://lists.insecure.org/lists/firewall-wizards/2003/Feb/0025.html

If your ISP's nameservers are also behind the NetScreen it could
be affecting their traffic in a similar way: from your earlier
trace it looks like their servers support EDNS.  Or the NetScreen
might be interfering with traffic between your nameserver and theirs.

I'm not sure what you can do if they can't sort the NetScreen out.
You can't tell bind not to use EDNS globally, but it will learn
which servers don't support it after the first attempt.  However,
the NetScreen might also be interfering with normal retries in
cases of packet loss.

-- 
                      Ronan Flood <R.Flood at noc.ulcc.ac.uk>
                        working for but not speaking for
             Network Services, University of London Computer Centre
     (which means: don't bother ULCC if I've said something you don't like)

