[bind10-dev] Nameserver Address Store and RFC 2308 - Comments Requested

Tue Nov 23 17:43:22 UTC 2010

Stephen,

I get a slightly different meaning when I read 7.2.

When there is a host-unreachable or no-listener-on-the-port error, that
fact is stored in the NSAS against the address and the 5-minute timer is
set. When there is simply no response, the NSAS is not updated; instead
there is a negative cache entry for the <qname, qclass, qtype> tuple.
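
A minimal sketch of that split, to make the two paths concrete. The
names here (Outcome, Stores, record_failure) are hypothetical and not
taken from the BIND 10 code, and applying the five-minute limit to the
negative cache entry as well is an assumption:

```python
from enum import Enum

DEAD_SECONDS = 5 * 60  # upper bound from RFC 2308 section 7.2


class Outcome(Enum):
    HOST_UNREACHABLE = "host unreachable"
    PORT_UNREACHABLE = "no listener on port"
    NO_RESPONSE = "query timed out"


class Stores:
    """Hypothetical holder for the two places a failure can be recorded."""

    def __init__(self):
        self.nsas_dead = {}       # ip -> expiry time in seconds
        self.negative_cache = {}  # (qname, qclass, qtype) -> expiry time

    def record_failure(self, outcome, qname, qclass, qtype, ip, now):
        if outcome is Outcome.NO_RESPONSE:
            # Silence may be policy for this one question: leave the NSAS alone.
            key = (qname.lower(), qclass, qtype)
            self.negative_cache[key] = now + DEAD_SECONDS
        else:
            # Transport-layer evidence applies to every query to this address.
            self.nsas_dead[ip] = now + DEAD_SECONDS

    def address_is_dead(self, ip, now):
        return self.nsas_dead.get(ip, 0) > now


stores = Stores()
stores.record_failure(Outcome.NO_RESPONSE,
                      "www.example.com.", "IN", "A", "192.0.2.1", now=0)
stores.record_failure(Outcome.PORT_UNREACHABLE,
                      "www.example.com.", "IN", "A", "192.0.2.2", now=0)
```

Only 192.0.2.2 ends up marked dead in the address store; the timeout
against 192.0.2.1 leaves that address selectable.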

This is to catch the case where a name server implements policy on
certain queries by not responding, without having that take the server
out of the active list. This is a bad way to do things, but people
decided that it had to be supported. The challenge is what happens when
a firewall is dropping all your queries: in that case you are filling up
your cache with lots of negative cache entries.

One question is: what is the initial state of an NSAS entry before the
first query is made? Since the entry will not be updated in the case of
a dropped query, this state will remain, possibly forever. While you
can't remove the server from the active list, you can lower the
likelihood that it is selected and still be within the spec. So adding
10ms to the RTT for each drop would be acceptable. If all the name
servers implement the same policy, all the times will be increased
equally and the selection will not change.
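
A small sketch of the penalty idea. The function names are
hypothetical, and picking the lowest effective RTT outright is a
simplifying assumption (a real selector might weight addresses rather
than always taking the minimum):

```python
DROP_PENALTY_MS = 10.0  # the 10ms per drop suggested above


def effective_rtt(rtt_ms, drops):
    """Measured RTT plus a fixed penalty for each dropped query."""
    return rtt_ms + DROP_PENALTY_MS * drops


def pick(servers):
    """servers maps address -> (rtt_ms, drops); return the best address."""
    return min(servers, key=lambda addr: effective_rtt(*servers[addr]))


servers = {"192.0.2.1": (20.0, 0), "192.0.2.2": (25.0, 0)}

# One server drops a query: it is now less attractive but still listed.
one_drops = {"192.0.2.1": (20.0, 1), "192.0.2.2": (25.0, 0)}

# Every server drops the same number of queries: the ordering is unchanged.
all_drop = {addr: (rtt, 3) for addr, (rtt, _) in servers.items()}
```

With these numbers pick(servers) and pick(all_drop) both choose
192.0.2.1, while pick(one_drops) moves to 192.0.2.2.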

I am also not sure about the timer extension issue. The dead server
indication can be seen as applying either to name server names or to IP
addresses. The spec is silent on which of these an implementer chooses.
Given that the events that can trigger it are at the transport layer, a
case can be made that IP-level timers make more sense.
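
To illustrate the IP-level reading, a minimal sketch with made-up data:
the timer is keyed by address, so a name server with a second address
stays usable while the first is marked dead. The names and the
addresses table are hypothetical:

```python
DEAD_SECONDS = 5 * 60

# Hypothetical data: one name server name with two addresses.
addresses = {"ns1.example.net.": ["192.0.2.1", "2001:db8::1"]}

dead_by_ip = {}  # ip -> expiry time in seconds


def mark_dead(ip, now):
    """Start (or restart) the dead timer for a single address."""
    dead_by_ip[ip] = now + DEAD_SECONDS


def usable(name, now):
    """Addresses of this name server whose dead timer is not running."""
    return [ip for ip in addresses[name] if dead_by_ip.get(ip, 0) <= now]


mark_dead("192.0.2.1", now=0)
```

A timer keyed by name would instead have removed both addresses for
the five minutes.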

jerry

On 11/23/2010 8:36 AM, Stephen Morris wrote:
> I'm afraid this is a bit of a long email, but I'd like some input on the following thoughts about the Nameserver Address Store (NSAS):
>
> I've just been re-reading RFC 2308 and came across the following in section 7.2:
>
>     A resolver MAY cache a dead server indication.  If it does so it MUST
>     NOT be deemed dead for longer than five (5) minutes.  The indication
>     MUST be stored against query tuple <query name, type, class, server
>     IP address> unless there was a transport layer indication that the
>     server does not exist, in which case it applies to all queries to
>     that specific IP address.
>
> What implications does that have for the NSAS?  Although the NSAS is not the cache itself, it is related to it.
>
> At present, the idea is that when the resolver receives a referral to another zone, it asks the NSAS for the address of a nameserver in that zone.  Based on round-trip times, the NSAS chooses one of the addresses associated with the nameservers for the zone and returns it to the resolver.  The resolver makes a query to that address and reports back to the NSAS the round-trip time (RTT).  The NSAS incorporates this information into its internal data store and uses it in the selection of an address the next time one is requested.  If the address does not respond, that fact is also recorded.  Periodically queries are directed to addresses marked unreachable or with a long RTT to see if their status has changed.
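
The request/report cycle described in the paragraph above might be
sketched as follows. This is not the NSAS implementation: the class
and method names are hypothetical, the exponentially weighted average
and the choice of the lowest RTT are assumptions, and the periodic
re-probing of unreachable addresses is left out.

```python
class AddressStore:
    """Toy address store: pick by smoothed RTT, learn from reports."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha        # weight given to the newest RTT sample
        self.rtt = {}             # zone -> {address: smoothed RTT in ms}
        self.unreachable = set()  # addresses that did not respond

    def add(self, zone, address):
        # An untried address starts at 0 ms so that it is tried early.
        self.rtt.setdefault(zone, {})[address] = 0.0

    def select(self, zone):
        live = {a: r for a, r in self.rtt[zone].items()
                if a not in self.unreachable}
        return min(live, key=live.get) if live else None

    def report_rtt(self, zone, address, rtt_ms):
        old = self.rtt[zone][address]
        self.rtt[zone][address] = (1 - self.alpha) * old + self.alpha * rtt_ms
        self.unreachable.discard(address)

    def report_unreachable(self, address):
        self.unreachable.add(address)


store = AddressStore()
store.add("example.com.", "192.0.2.1")
store.add("example.com.", "192.0.2.2")
first = store.select("example.com.")
store.report_rtt("example.com.", first, 40.0)
second = store.select("example.com.")
store.report_unreachable(second)
third = store.select("example.com.")
```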
>
> There seem to be a few implications if the resolver is going to cache dead server indications according to the RFC:
>
> 1) Five-Minute Caching
> As envisaged, the resolver asks for the address of a nameserver for a zone.  It receives an address but not the name of the nameserver.  So the main cache can't record the fact that a server is unavailable, it has to be the NSAS.
>
> If a server has one address associated with it, this is easy to implement - when the NSAS receives an indication that the address is unreachable, it marks it as such and notes the time.  All queries for an address for that server return "unreachable" for the next five minutes.  But what about the case when a nameserver has more than one address associated with it?  In this case although one address is unreachable, others may be reachable.  This means that the server as a whole is reachable and so the five-minute restriction does not apply.  Although we could still apply the five-minute rule (and it might be easiest to do so), it does allow the possibility that an unreachable address can be considered unreachable for longer than five minutes, so reducing query load.
>
> 2) Query Tuple
> This suggests that a nameserver could respond for one particular <qname, qtype, qclass, IP Address> but not for another.
>
> At present the NSAS distinguishes between different classes.  The resolver will request the address of a nameserver for a particular zone and class and the NSAS will look it up in its data store.  Any data for the same zone but different class will be ignored.  (In the rest of this discussion, assume that the class has already been taken into account when querying the NSAS.)
>
> At present, it is not intended that the resolver pass information about the query name or type to the NSAS.  However, the text in RFC 2308 seems to suggest that if we are going to cache dead server indications we need to do so.  It also suggests that associated with a server address, the NSAS needs to store both an RTT and a list of <name, type> for which the server is unreachable.  So when considering what address to return, the NSAS should check if an address is unreachable for the particular <name, type> for the intended query; if it is, the NSAS ignores it; if not, it needs to consider that address in its calculations.  A further complication appears to be that since the queries for different <name, type> will occur at different times, the five-minute window for each tuple will expire at different times.  This will add a bit to the complexity of the code.
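
A rough sketch of the per-address record this implies: an RTT plus a
table of <name, type> pairs, each with its own expiry. The names
AddressEntry, mark_dead, usable_for and select are hypothetical, and
choosing the lowest RTT among usable addresses is an assumption:

```python
DEAD_SECONDS = 5 * 60


class AddressEntry:
    """One server address: its RTT and the tuples it is dead for."""

    def __init__(self, address, rtt_ms):
        self.address = address
        self.rtt_ms = rtt_ms
        self.dead_for = {}  # (qname, qtype) -> expiry time in seconds

    def mark_dead(self, qname, qtype, now):
        self.dead_for[(qname.lower(), qtype)] = now + DEAD_SECONDS

    def usable_for(self, qname, qtype, now):
        key = (qname.lower(), qtype)
        expiry = self.dead_for.get(key)
        if expiry is not None and expiry <= now:
            del self.dead_for[key]  # this tuple's own window has passed
            expiry = None
        return expiry is None


def select(entries, qname, qtype, now):
    """Lowest-RTT address that is not dead for this <name, type>."""
    usable = [e for e in entries if e.usable_for(qname, qtype, now)]
    return min(usable, key=lambda e: e.rtt_ms).address if usable else None


fast = AddressEntry("192.0.2.1", 10.0)
slow = AddressEntry("192.0.2.2", 50.0)
fast.mark_dead("www.example.com.", "A", now=0)     # expires at 300
fast.mark_dead("www.example.com.", "MX", now=200)  # expires at 500
```

The two marks on the fast address expire 200 seconds apart, which is
the bookkeeping cost referred to above.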
>
> Of course, the process is simplified if the transport layer determines that the address is unreachable as we can mark the address unreachable for all queries.  However I am assuming that in most cases it is the application layer that will do this by means of a query timeout.
>
> The question I have is whether this interpretation of the implication of RFC 2308 on the NSAS is correct.  And if so, what is the best way to approach the issues raised here?
>
> Stephen