[bind10-dev] Nameserver Address Store and RFC 2308 - Comments Requested

Tue Nov 23 16:36:16 UTC 2010

I'm afraid this is a bit of a long email, but I'd like some input on the following thoughts about the Nameserver Address Store (NSAS):

I've just been re-reading RFC 2308 and came across the following in section 7.2:

   A resolver MAY cache a dead server indication.  If it does so it MUST
   NOT be deemed dead for longer than five (5) minutes.  The indication
   MUST be stored against query tuple <query name, type, class, server
   IP address> unless there was a transport layer indication that the
   server does not exist, in which case it applies to all queries to
   that specific IP address.

What implications does that have for the NSAS?  Although the NSAS is not the cache itself, it is related to it.

At present, the idea is that when the resolver receives a referral to another zone, it asks the NSAS for the address of a nameserver in that zone.  Based on round-trip times, the NSAS chooses one of the addresses associated with the nameservers for the zone and returns it to the resolver.  The resolver makes a query to that address and reports back to the NSAS the round-trip time (RTT).  The NSAS incorporates this information into its internal data store and uses it in the selection of an address the next time one is requested.  If the address does not respond, that fact is also recorded.  Periodically queries are directed to addresses marked unreachable or with a long RTT to see if their status has changed.
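
A minimal Python sketch of that request/report cycle for one zone might look like the following. The class and method names are hypothetical, and both "lowest smoothed RTT wins" and the smoothing weight are assumptions made for illustration; the periodic re-probing of unreachable or slow addresses is not modelled.

```python
class ZoneAddresses:
    """Addresses of the nameservers for one zone (hypothetical structure)."""

    ALPHA = 0.3  # weight given to a new RTT sample; illustrative value only

    def __init__(self, addresses):
        self.rtt = {a: None for a in addresses}  # None until first measured
        self.unreachable = set()

    def select(self):
        """Return the address the resolver should query, or None."""
        candidates = [a for a in self.rtt if a not in self.unreachable]
        if not candidates:
            return None
        # Unmeasured addresses sort first so that each one gets tried once;
        # after that the lowest smoothed RTT wins.
        return min(candidates,
                   key=lambda a: (self.rtt[a] is not None, self.rtt[a] or 0.0))

    def report_rtt(self, address, rtt):
        """Resolver feedback: the query to 'address' took 'rtt' seconds."""
        old = self.rtt[address]
        if old is None:
            self.rtt[address] = rtt
        else:
            self.rtt[address] = (1 - self.ALPHA) * old + self.ALPHA * rtt
        self.unreachable.discard(address)

    def report_unreachable(self, address):
        """Resolver feedback: the address did not respond."""
        self.unreachable.add(address)
```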

There seem to be a few implications if the resolver is going to cache dead server indications according to the RFC:

1) Five-Minute Caching
As envisaged, the resolver asks for the address of a nameserver for a zone.  It receives an address but not the name of the nameserver.  So the main cache cannot record the fact that a server is unavailable; it has to be the NSAS that does so.

If a server has one address associated with it, this is easy to implement - when the NSAS receives an indication that the address is unreachable, it marks it as such and notes the time.  All requests for an address for that server return "unreachable" for the next five minutes.  But what about the case when a nameserver has more than one address associated with it?  In this case, although one address is unreachable, others may be reachable.  This means that the server as a whole is reachable, and so the five-minute restriction (which applies to a dead server) does not apply.  Although we could still apply the five-minute rule to each address (and it might be easiest to do so), this reading does allow an unreachable address to be considered unreachable for longer than five minutes, which would reduce the query load.
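
To make the single-address and multi-address cases concrete, here is a rough Python sketch that applies the five-minute rule to each address and treats a server as dead only when all of its addresses are dead. The names DeadAddressCache, mark_dead and server_is_dead are invented for this example, and the injectable clock is there only so that the expiry can be exercised without waiting.

```python
import time

DEAD_TTL = 300.0  # seconds; RFC 2308 section 7.2 allows at most five minutes


class DeadAddressCache:
    """Per-address dead indications with a five-minute limit (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._dead_since = {}  # address -> time the indication was recorded

    def mark_dead(self, address):
        self._dead_since[address] = self._clock()

    def is_dead(self, address):
        since = self._dead_since.get(address)
        if since is None:
            return False
        if self._clock() - since >= DEAD_TTL:
            # The indication has expired: forget it so the address is retried.
            del self._dead_since[address]
            return False
        return True

    def server_is_dead(self, addresses):
        """A server is dead only if it has addresses and all of them are dead."""
        return bool(addresses) and all(self.is_dead(a) for a in addresses)
```

With this arrangement a server with one dead address and one untested address is still considered reachable, which is the case where the five-minute restriction arguably need not apply.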

2) Query Tuple
The RFC text suggests that a nameserver could respond for one particular <qname, qtype, qclass, IP address> tuple but not for another.

At present the NSAS distinguishes between different classes.  The resolver will request the address of a nameserver for a particular zone and class and the NSAS will look it up in its data store.  Any data for the same zone but different class will be ignored.  (In the rest of this discussion, assume that the class has already been taken into account when querying the NSAS.)

At present, it is not intended that the resolver pass information about the query name or type to the NSAS.  However, the text in RFC 2308 seems to suggest that if we are going to cache dead server indications, we need to do so.  It also suggests that, associated with a server address, the NSAS needs to store both an RTT and a list of <name, type> pairs for which the server is unreachable.  So when considering what address to return, the NSAS should check whether an address is unreachable for the particular <name, type> of the intended query; if it is, the NSAS ignores it; if not, it needs to consider that address in its calculations.  A further complication appears to be that, since the queries for different <name, type> pairs will occur at different times, the five-minute window for each tuple will expire at a different time.  This will add a bit to the complexity of the code.
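
As a sketch of what the per-tuple bookkeeping could look like (in Python, with hypothetical names, and assuming the class has already been taken into account as described above), each <address, name, type> entry carries its own expiry time, so the windows lapse independently:

```python
import time

DEAD_TTL = 300.0  # seconds; the five-minute limit from RFC 2308 section 7.2


class TupleDeadStore:
    """Dead indications keyed by (address, qname, qtype); hypothetical."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expiry = {}  # (address, qname, qtype) -> time the entry expires

    def mark_dead(self, address, qname, qtype):
        key = (address, qname.lower(), qtype)
        self._expiry[key] = self._clock() + DEAD_TTL

    def is_dead(self, address, qname, qtype):
        key = (address, qname.lower(), qtype)
        expiry = self._expiry.get(key)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expiry[key]  # this tuple's window has closed
            return False
        return True

    def candidates(self, addresses, qname, qtype):
        """Addresses still worth considering for this <name, type>."""
        return [a for a in addresses if not self.is_dead(a, qname, qtype)]
```

The list returned by candidates() would then be fed into whatever RTT-based selection the NSAS uses. An address that is dead for one <name, type> remains eligible for every other query.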

Of course, the process is simplified if the transport layer determines that the address is unreachable, as we can then mark the address unreachable for all queries.  However, I am assuming that in most cases it is the application layer that will detect the failure, by means of a query timeout.
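
One way to picture the distinction is a small Python function that maps a failure to the scope of the dead indication. The function name dead_scope is made up for this example, and the choice of which errors count as a "transport layer indication that the server does not exist" is an assumption (here: connection refused, host unreachable and network unreachable), not something the RFC spells out.

```python
import errno
import socket

# Assumed set of errors treated as a transport-layer indication.
TRANSPORT_ERRNOS = {errno.ECONNREFUSED, errno.EHOSTUNREACH, errno.ENETUNREACH}


def dead_scope(exc):
    """Return 'tuple', 'address', or None for a failed query.

    'tuple'   - record against <name, type, class, address> (query timeout)
    'address' - record against the address for all queries
    None      - not a dead-server indication
    """
    # Timeouts are checked first because they are also OSError subclasses.
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "tuple"
    if isinstance(exc, OSError) and exc.errno in TRANSPORT_ERRNOS:
        return "address"
    return None
```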

The question I have is whether this interpretation of the implications of RFC 2308 for the NSAS is correct.  And if so, what is the best way to approach the issues raised here?

Stephen