Bind 8.2.3, query-restart on expired NS and A record

Wed Oct 10 23:17:56 UTC 2001

	The problem is that you have a very small TTL on the NS record and,
	as BIND 4 and BIND 8 don't have query restart, the record
	is timing out before the second query comes in.

	A TTL of 5 minutes would be stable.

	Mark
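
	A rough way to check records against the 5 minute figure above
	is sketched below. It is only an illustration: the 300 second
	threshold is taken from the advice above, the helper names
	(ttl_seconds, short_ttls) are made up for this example, and the
	two records are the ones shown in the quoted trace.

```python
import re

MIN_STABLE_TTL = 300  # seconds; the "5 minutes" suggested above (assumed threshold)

# Unit suffixes as used in TTLs such as "1m41s" or "4m1s".
_UNITS = {"w": 604800, "d": 86400, "h": 3600, "m": 60, "s": 1}


def ttl_seconds(text):
    """Convert a TTL such as '1m41s', '4m1s' or '100' to seconds."""
    text = text.strip().lower()
    if text.isdigit():
        return int(text)
    parts = re.findall(r"(\d+)([wdhms])", text)
    # Reject anything that is not made up entirely of number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unrecognised TTL: {text!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)


def short_ttls(records, minimum=MIN_STABLE_TTL):
    """Return (name, type, seconds) for NS and A records below the minimum."""
    flagged = []
    for name, ttl, rtype in records:
        secs = ttl_seconds(ttl)
        if rtype in ("NS", "A") and secs < minimum:
            flagged.append((name, rtype, secs))
    return flagged


# The delegation and glue records as printed in the quoted response.
records = [
    ("gluetest.limey.net.", "1m41s", "NS"),
    ("px.limey.net.", "4m1s", "A"),
]

for name, rtype, secs in short_ttls(records):
    print(f"{name} {rtype}: {secs}s is below {MIN_STABLE_TTL}s")
```

	Both records from the trace fall below the threshold, so both
	are reported.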

> 
> [Mark.Andrews at isc.org: Wed, Oct 10, 2001 at 10:19:13AM +1000]
> > 
> > > does a bind 8.2.3 stub resolver ever overwrite existing cache entries
> > > with records received from an additional section for the same record?
> > 
> > 	Well, given that the stub resolver doesn't have a cache, this
> > 	does not make sense the way it was written.
> 
> you and kevin are right here: I meant a recursive server. sorry for
> the confusion.
> 
> > 	The nameserver does not refresh TTL based on answers it
> > 	receives (though earlier versions did, creating server lock).
> 
> I think this is the source of my problem.. Bind 8 does not refresh
> TTLs even for expired records - expired records must first be
> explicitly deleted before a record with a fresh TTL will take their
> place.. I think this can be problematic with respect to glue
> records. I'll back it up with a bind trace in a second.
> 
> But the setting is this: the cache has a stale A record for
> gluetest.limey.net, it also has a stale NS record for
> gluetest.limey.net (that delegates to px.limey.net), and it also has a
> stale A record for px.limey.net. It has 2 fresh NS records for
> limey.net (delegating to sidehack.gweep.net and ayup.limey.net) as
> well as fresh A records for sidehack and ayup.
> 
> [** First we find the A for gluetest.limey.net, figure out that it's
>     stale and delete it. **]
> req: found 'gluetest.limey.net' as 'gluetest.limey.net' (cname=0)
> stale: ttl 1002571133 -7 (x2)
> delete_all(0x80ef0e0:"gluetest" IN A)
> 
> [** The best we can do with what is fresh is to contact the limey.net
>      nameservers. They reply with an NS record and a glue record **]
> nslookup(nsp=0xbfbfeaf8, qp=0x810f000, "gluetest.limey.net")
> nslookup: NS "AYUP.limey.net" c=1 t=2 (flags 0x2)
> nslookup: NS "SIDEHACK.GWEEP.net" c=1 t=2 (flags 0x2)
> nslookup: 2 ns addrs total
> forw: forw -> [65.105.101.18].53 ds=4 nsid=1531 id=37409 18ms retry 4sec
> Response (USER NORMAL -) nsid=1531 id=37409
> gluetest.limey.net.     1m41s IN NS     px.limey.net.
> px.limey.net.           4m1s IN A       204.168.16.17
> rrextract: dname gluetest.limey.net type 2 class 1 ttl 100
> rrextract: dname px.limey.net type 1 class 1 ttl 100
> 
> [** Now, as I understand it, we check the extracted records against
>   the existing cache.. The NS record doesn't match anything (we just
>   deleted it), but px.limey.net matches an existing record. That record
>   is stale, but we pay no heed **]
> rrsetupdate: gluetest.limey.net
> rrsetcmp: no records in database
> rrsetupdate: gluetest.limey.net 0
> rrsetupdate: px.limey.net
> rrsetcmp: rrsets matched
> 
> [** we now write the NS to the cache.. but not the A. When I run this
>     same trace on a clean cache it writes both the NS and the A here. **]
> db_update(gluetest.limey.net, 0x810c1f8, 0x810c1f8, 0, 031, 0x80feca0)
> db_update: adding 0x810c1f8
> 
> [** we now return to the business of following that delegation - but 
>     the server can't find the px A record "wanted!" **]
> resp: nlookup(gluetest.limey.net) qtype=1
> resp: found 'gluetest.limey.net' as 'gluetest.limey.net' (cname=0)
> wanted(0x810c1f8, IN A) [IN NS]
> 
> we now need to start a query for px.limey.net - that triggers the
> query restart behavior and timeouts which started all of this in the
> first place.. The problem seems to be that in order to get its TTL
> updated a record has to be explicitly deleted - and deletions only happen
> when the cache has already looked up stale data.. because glue records
> are really hints about future queries, they are not pre-emptively
> deleted.
> 
> -P
> 
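
The sequence in the quoted trace can be modelled with a toy cache. This is a
sketch of the behaviour as described in the thread, not BIND's actual code:
ToyCache and its methods are invented for illustration, the address used for
gluetest.limey.net is a placeholder (the trace does not show it), and the
model assumes the stale NS is deleted when it is looked up, as the poster
describes. The two rules it encodes are that a stale entry is only deleted
when a lookup finds it, and that an incoming rrset which matches a cached one
is not written, so its expiry time is left alone.

```python
class ToyCache:
    """Toy model of the caching behaviour described above; not BIND's code."""

    def __init__(self):
        self.rrsets = {}  # (name, type) -> (rdata, absolute expiry time)

    def lookup(self, name, rtype, now):
        """Return rdata if fresh. A stale entry is deleted as a side effect."""
        key = (name, rtype)
        entry = self.rrsets.get(key)
        if entry is None:
            return None
        rdata, expires = entry
        if expires <= now:
            del self.rrsets[key]  # corresponds to delete_all() in the trace
            return None
        return rdata

    def update(self, name, rtype, rdata, ttl, now):
        """Store an rrset from a response. Return True if it was written."""
        key = (name, rtype)
        entry = self.rrsets.get(key)
        if entry is not None and entry[0] == rdata:
            # "rrsets matched": nothing is written, even if the entry is stale.
            return False
        self.rrsets[key] = (rdata, now + ttl)
        return True


now = 1000
cache = ToyCache()
# Starting state: all three records expired 7 seconds ago.
cache.rrsets[("gluetest.limey.net", "A")] = ("192.0.2.1", now - 7)  # placeholder address
cache.rrsets[("gluetest.limey.net", "NS")] = ("px.limey.net", now - 7)
cache.rrsets[("px.limey.net", "A")] = ("204.168.16.17", now - 7)

# The query for gluetest finds the stale A and NS and deletes them.
cache.lookup("gluetest.limey.net", "A", now)
cache.lookup("gluetest.limey.net", "NS", now)

# The limey.net servers answer with the delegation and its glue (ttl 100).
wrote_ns = cache.update("gluetest.limey.net", "NS", "px.limey.net", 100, now)
wrote_a = cache.update("px.limey.net", "A", "204.168.16.17", 100, now)

# Following the delegation now needs the address of px.limey.net.
glue = cache.lookup("px.limey.net", "A", now)

print("NS written:", wrote_ns)
print("glue A written:", wrote_a)
print("px.limey.net address found:", glue)
```

In this model the NS is written, the glue A is skipped because it matches the
stale entry, and the follow-up lookup for px.limey.net then finds only the
stale entry, deletes it and returns nothing. Only after that deletion would a
later response be able to store the A record with a fresh TTL.
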
--
Mark Andrews, Internet Software Consortium
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: Mark.Andrews at isc.org