2 problems: "temporary name lookup failures" & updating TLD servers

Linda W. bind at tlinx.org
Mon Jul 5 01:39:41 UTC 2004


Jim Reid wrote:

>> <>"Linda" == Linda W <bind at tlinx.org> writes:
>> Linda> Problem #1)
>> Linda> Sometimes the TLD servers change. The old addr's may work
>> Linda> for a while, but eventually, the old TLD IP's are
>> Linda> decommissioned and I stop being able to resolve (like last
>> Linda> one to fail was .org which moved to some new DNS servers).
>
>This is why you shouldn't configure your name server to slave another
>zone without the knowledge of the zone owner.
>
Sorry, didn't mean to confuse you with practices of a decade ago.  I
haven't slaved any TLDs for some number of years.  They all closed up
zone transfers as the internet grew and became commercialized.  Now I
try to configure the server to cache the 'glue'.

> It's also hard to find
>any convincing reason why you should slave .edu (or any other TLD for
>that matter). It doesn't really help anything. And as you've seen it
>creates unnecessary administrative problems that can easily be avoided
>if your name server had been configured properly.
>
My inquiry was to find out how others administer their servers now, which
is different from how they might have been administered before the internet
went commercial a bit over a decade ago.

> The overwhelming
>majority of the world's name servers don't slave any TLD or even care
>about doing that and they work just fine. Think about that.
>
>    Linda> Should I have a separate file for each TLD such that it
>    Linda> would be automatically updated when new TLD servers came out?
>
>No, you should leave this well alone. Unless you're involved in the
>administration of a TLD you have no reason or justification for
>interfering in that.
>
Interfering?  Exactly what am I interfering with other than my own setup?
I'm not a publicly available server.  I'm reading your tone as a bit harsh
when I'm just asking about current practices; your advice is appreciated,
but it is tainted a bit by _my perception_ of it as "you \lecturing\ me".
Please forgive my misinterpretation if this was not intended.

> Or creating a special private little world for
>your name server. 
>
But my experience of reality is my private little world...:-)  Sorry, don't
mean to get existential on a bind list.

>Keep things simple. If it takes more than a page or
>two to document a name server's setup, operation & administration, the
>configuration is too complex.
>
>    Linda> Is it common/acceptable practice to make everything but the
>    Linda> root servers bind-writeable.  What about the list of
>    Linda> root-server IP's.
>
>It is always good sense to apply the basic security principle of least
>privilege. So if a name server only needs to read some zone file, the
>ownerships and access permissions of that file should be set accordingly.
>  
>
---
    Yeah -- at one point in time the servers for the TLDs 'com, org,
net, edu and gov' were fairly constant, and it was as safe to cache them
as it was the root servers.  This has been changing, so methods that
used to work are not as workable now.  Thus my inquiry about others'
practices.  I used to have (still do) the servers for com, org, net,
edu and gov kept in non-writeable root-owned config files.  But I think
you are indicating that this should no longer be practiced.  Since I
occasionally go through periods where I'm experimenting with new kernel
configs, I need to restart bind and would like it to keep its cached
data, when possible, over reboots.  It falls into the category of "it
would be nice" (i.e. not necessary, but might save a few extra queries
after a reboot (?)).  I don't always do things the most straightforward
way because taking the long route can _sometimes_ enable me to learn more
(though it may take longer to get something working right).  It really
depends on how much spare/free time I have at any given time.
 

>    Linda> Problem # 2)
>
>    Linda> I've been noticing that I am getting the error "Temporary
>    Linda> failure in name resolution".  These "temporary failures can
>    Linda> exist for hours at a time which is why they are annoying.
>
>The DNS is not perfect. 
>
I'm aware of this, but that doesn't seem to be the case with this
transient error.  I noticed it once before, in the previous week, but
never before that had I noticed this type of problem except when I had
an external firewall box configured with too short a window for DNS
response packets sent via UDP.

The previous transient error disappeared a few hours later, though
now, oddly enough, my dig +trace nepp.nasa.gov still fails.  I know it
works if I query one of my ISP's DNS servers, so I assume the lookup
worked at some point in time or my ISP would not have been able to cache it.

>The problem you showed trying to resolve nepp.nasa.gov seems to be a
>transient or local problem. The name can be resolved just fine here
>right now. 
>
    Can you do a "dig +trace nepp.nasa.gov"?  Oddly enough, I cannot
do a "dig +trace" on nasa's name servers either:
# dig +trace nasans1.nasa.gov
; <<>> DiG 9.2.2 <<>> +trace nasans1.nasa.gov
;; global options:  printcmd
.                       440069  IN      NS      C.ROOT-SERVERS.NET.
.                       440069  IN      NS      D.ROOT-SERVERS.NET.
.                       440069  IN      NS      E.ROOT-SERVERS.NET.
.                       440069  IN      NS      F.ROOT-SERVERS.NET.
.                       440069  IN      NS      G.ROOT-SERVERS.NET.
.                       440069  IN      NS      H.ROOT-SERVERS.NET.
.                       440069  IN      NS      I.ROOT-SERVERS.NET.
.                       440069  IN      NS      J.ROOT-SERVERS.NET.
.                       440069  IN      NS      K.ROOT-SERVERS.NET.
.                       440069  IN      NS      L.ROOT-SERVERS.NET.
.                       440069  IN      NS      M.ROOT-SERVERS.NET.
.                       440069  IN      NS      A.ROOT-SERVERS.NET.
.                       440069  IN      NS      B.ROOT-SERVERS.NET.
;; Received 372 bytes from 127.0.0.1#53(127.0.0.1) in 2 ms

gov.                    172800  IN      NS      A.GOV.ZONEEDIT.COM.
gov.                    172800  IN      NS      B.GOV.ZONEEDIT.COM.
gov.                    172800  IN      NS      C.GOV.ZONEEDIT.COM.
gov.                    172800  IN      NS      D.GOV.ZONEEDIT.COM.
gov.                    172800  IN      NS      E.GOV.ZONEEDIT.COM.
gov.                    172800  IN      NS      F.GOV.ZONEEDIT.COM.
gov.                    172800  IN      NS      G.GOV.ZONEEDIT.COM.
;; Received 274 bytes from 192.33.4.12#53(C.ROOT-SERVERS.NET) in 44 ms

nasa.gov.               259200  IN      NS      nasans1.nasa.gov.
nasa.gov.               259200  IN      NS      NASANS3.nasa.gov.
nasa.gov.               259200  IN      NS      NASANS4.nasa.gov.
;; Received 140 bytes from 216.55.155.29#53(A.GOV.ZONEEDIT.COM) in 38 ms

dig: Couldn't find server 'nasans1.nasa.gov': Temporary failure in name resolution

>It's not possible to diagnose what went wrong for you from
>the info you provided.
>
Here is some more.  I can look up nasa's NS's from one of the
.gov NS's:
# dig @a.gov.zoneedit.com nasans1.nasa.gov

; <<>> DiG 9.2.2 <<>> @a.gov.zoneedit.com nasans1.nasa.gov
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60362
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 3, ADDITIONAL: 3

;; QUESTION SECTION:
;nasans1.nasa.gov.              IN      A

;; AUTHORITY SECTION:
nasa.gov.               259200  IN      NS      nasans1.nasa.gov.
nasa.gov.               259200  IN      NS      NASANS3.nasa.gov.
nasa.gov.               259200  IN      NS      NASANS4.nasa.gov.

;; ADDITIONAL SECTION:
nasans1.nasa.gov.       86400   IN      A       192.77.84.32
NASANS3.nasa.gov.       86400   IN      A       198.116.144.49
NASANS4.nasa.gov.       86400   IN      A       198.116.144.33

;; Query time: 39 msec
;; SERVER: 216.55.155.29#53(a.gov.zoneedit.com)
;; WHEN: Sun Jul  4 17:39:20 2004
;; MSG SIZE  rcvd: 140
================
    But I cannot query one of Nasa's name servers which (?)should(?) be
authoritative for the nasa.gov domain and therefore should be query-able for
resolution of the name "napp.nasa.gov" (originally off a link from "/."):

# dig @nasans1.nasa.gov nasans1.nasa.gov 
dig: Couldn't find server 'nasans1.nasa.gov': Temporary failure in name resolution

    From my system, I don't seem to be able to query any of the
authoritative domain servers for the Nasa.gov domain.

    Oddly, if I use the IP for the server name (192.77.84.32), I am able to
get results for "nasans1.nasa.gov" (the name server itself):
# dig @192.77.84.32 nasans1.nasa.gov   

; <<>> DiG 9.2.2 <<>> @192.77.84.32 nasans1.nasa.gov
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33710
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 3

;; QUESTION SECTION:
;nasans1.nasa.gov.              IN      A

;; ANSWER SECTION:
nasans1.nasa.gov.       600     IN      A       192.77.84.32

;; AUTHORITY SECTION:
nasa.gov.               600     IN      NS      nasans1.nasa.gov.
nasa.gov.               600     IN      NS      NASANS4.nasa.gov.
nasa.gov.               600     IN      NS      NASANS3.nasa.gov.

;; ADDITIONAL SECTION:
nasans1.nasa.gov.       600     IN      A       192.77.84.32
NASANS4.nasa.gov.       600     IN      A       198.116.144.33
NASANS3.nasa.gov.       600     IN      A       198.116.144.49

;; Query time: 88 msec
;; SERVER: 192.77.84.32#53(192.77.84.32)
;; WHEN: Sun Jul  4 17:48:54 2004
;; MSG SIZE  rcvd: 156
=================
but querying for napp.nasa.gov gets me zero answers:
# dig @192.77.84.32 napp.nasa.gov  

; <<>> DiG 9.2.2 <<>> @192.77.84.32 napp.nasa.gov
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 49455
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;napp.nasa.gov.                 IN      A

;; AUTHORITY SECTION:
nasa.gov.               14400   IN      SOA     nasans1.nasa.gov. dnssupport.nasa.gov. 2004063000 10800 1200 3600000 14400

;; Query time: 86 msec
;; SERVER: 192.77.84.32#53(192.77.84.32)
;; WHEN: Sun Jul  4 17:49:13 2004
;; MSG SIZE  rcvd: 86

> All we know from what you said was that your
>local copy of dig was unable to resolve nasans1.nasa.gov. But that
>name resolves OK when I tried it a moment ago.
>  
>
Were you using the trace option?   I may not understand the use of dig,
but I thought that using the "+trace" option to dig caused it to
circumvent the local bind cache and send out the DNS probes directly
to the network.  In the case of using the "@dns-server" notation,
I thought it also circumvented the local copy of bind and sent the
request directly to the named server.  No?  Yes?
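
For what it's worth, the text dig printed ("Temporary failure in name
resolution") is the message the system resolver library gives for
EAI_AGAIN.  That suggests a server given by *name* after "@" is first
turned into an address through the ordinary resolver (resolv.conf, and
so my local bind) before dig sends anything itself.  A minimal sketch
for checking that step on its own, assuming Python is available on the
box; classify_lookup is a made-up helper name, not part of dig or bind:

```python
import socket

def classify_lookup(host, resolver=socket.getaddrinfo):
    """Resolve `host` the way an ordinary application would and
    report whether the system resolver succeeded or how it failed."""
    try:
        resolver(host, 53)
    except socket.gaierror as exc:
        # EAI_AGAIN is the code behind "Temporary failure in name resolution"
        if exc.errno == socket.EAI_AGAIN:
            return "temporary failure"
        return "permanent failure"
    return "ok"
```

Calling classify_lookup("nasans1.nasa.gov") exercises the local
resolver path, while classify_lookup("192.77.84.32") needs no DNS
lookup at all, which would fit with "dig @192.77.84.32" working while
"dig @nasans1.nasa.gov" does not.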

>    Linda> This "temporary failure" seems to be a fairly new/recent
>    Linda> phenomena as far as I can tell and has also happened with
>    Linda> some addresses in other TLD's, where it will trickle down
>    Linda> to the end point servers and just continue to come back
>    Linda> empty.
>
>These problems could be caused by your policy of trying to make your
>name server slave some TLDs. 
>
---
    As mentioned before, I discontinued that practice a few years ago -- 
but more importantly, my use of 'dig' with the "@server" or +trace 
option, I thought, should circumvent my local name server, shouldn't it?

>    Linda> Is there some way to configure bind to fail-over to doing a
>    Linda> lookup from my ISP rather than returning a failure with my
>    Linda> bind-server caching the answer from my ISP as a non-authoritative 
>    Linda> answer for whatever the expiration period is?
>
>No. When a lookup gets answered, that's game-over. 
>
----
    But this isn't a case of an authoritative lookup giving an answer. 
It's a _temporary_ lookup failure.  It's not the same as answering that 
the name does not exist.

Applications using resolver libraries are able to fail-over and try
looking up the name from the next name server in a "resolv.conf" (or
equivalent) file. 
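
The distinction I'm drawing can be sketched like this.  It is only my
rough model of what a stub resolver does with several "nameserver"
lines, not the actual library code; the two exception classes and the
`query` callable are placeholders I made up for illustration:

```python
class TemporaryFailure(Exception):
    """Timeout or server failure: the answer is simply unknown."""

class NameNotFound(Exception):
    """Authoritative answer that the name does not exist."""

def resolve_with_failover(name, servers, query):
    """Try each server in order, moving on only after a temporary failure."""
    last_error = None
    for server in servers:
        try:
            return query(server, name)
        except NameNotFound:
            # A definite "no such name" is an answer; do not ask anyone else.
            raise
        except TemporaryFailure as exc:
            # No answer from this server; remember why and try the next one.
            last_error = exc
    raise TemporaryFailure("all servers failed for " + name) from last_error
```

The point is that only the "don't know" case moves on to the next
server; a real negative answer still ends the lookup.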

It seems unfortunate that bind, itself, can't be configured to
transparently handle transient failures by using an alternate lookup
method.

In this situation, application lookup failures on the gateway/proxy
machine could work by putting one of my ISP's name servers
in the resolv.conf file, but for machines inside the network that don't
have direct external access, this wouldn't work.

    One kludge would be to "ifconfig" a second internal address on the
inside ethernet interface and run a separate copy of bind that does
forwarding/caching only to my ISP's servers.  But that seems a bit
kludgey.  Is that standard practice for providing redundancy in the
lookup process for internal LANs?

    It could be real neat if the queries could be done in
parallel and the first to answer would be returned to the client.  That
would be worth running a second server for, as the internal machine
could always get the fastest responding name server and have some built-in
failover support.  The primary NS would still fetch the address and glue,
so it would be as fast or faster on future lookups, and with the glue it
might be able to look up related/"near" external names faster than a
heavily loaded ISP NS where the name wasn't in the ISP's NS cache.
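
To make the "ask in parallel, first answer wins" idea concrete, here
is a small sketch of the behaviour I have in mind.  It is not something
bind offers as far as I know; the resolver functions are stand-ins
(one simulating my local server failing, one simulating the ISP's
cache) and the address is a documentation-range placeholder:

```python
import concurrent.futures
import time

def first_answer(name, resolvers, timeout=5.0):
    """Ask every resolver at once; return (label, answer) from the
    first one that succeeds, skipping resolvers that fail."""
    if not resolvers:
        raise ValueError("need at least one resolver")
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(resolvers))
    futures = {pool.submit(fn, name): label for label, fn in resolvers.items()}
    try:
        for fut in concurrent.futures.as_completed(futures, timeout=timeout):
            try:
                return futures[fut], fut.result()
            except Exception:
                continue  # this resolver failed; wait for the next to finish
    except concurrent.futures.TimeoutError:
        pass  # nobody answered within the time limit
    finally:
        pool.shutdown(wait=False)  # do not block on the slower resolvers
    raise LookupError("no resolver answered for " + name)

def local_bind_down(name):
    """Stand-in for a local server that hits a transient failure."""
    time.sleep(0.05)
    raise OSError("Temporary failure in name resolution")

def isp_cache(name):
    """Stand-in for an ISP server that already has the name cached."""
    time.sleep(0.1)
    return "192.0.2.10"
```

With these stand-ins, first_answer("host.example", {"local":
local_bind_down, "isp": isp_cache}) falls through to the ISP's answer,
and whichever resolver is quicker wins when both succeed.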
 

>Resolution stops at
>that point. There's no way to tell a name server "if you get an answer
>I don't like, go and ask the query somewhere else".
>
---
    If it is an authoritative and non-temporary failure, I agree.
However, if it's a transient failure and the correct answer is cached at
another "nearby" NS, querying that alternate would be preferable to
returning a failure.  It wouldn't always be best to query the ISP's
nameserver, though, as it is likely to be slower due to latency and load
if the name isn't already in its cache.

>    Linda> Ideas?  Suggestions?  
>
>Re-think your name server configuration. Your problems will probably
>go away if your server isn't getting forced into doing needlessly
>silly and pointless things. ie Scrap attempts to slave any TLD.
>
Done.

> And
>don't forward queries to anyone else's name servers. Make your name
>server resolve everything for itself, just like it's supposed to.
>  
>
Also done, but still not working for napp.nasa.gov.  :-(

Somehow, I don't think that redundancy is a bad thing, especially
when my queries aren't working and another NS (like yours, apparently)
does work.  I still don't understand why my "dig" isn't able to query
nasa's NS's. :-(

;-?

L.



