[bind10-dev] Nameserver Address Store and RFC 2308 - Comments Requested

Wed Nov 24 10:56:11 UTC 2010

On 23 Nov 2010, at 17:43, Jerry Scharf wrote:

> Stephen,
> 
> I get a slightly different meaning when I read 7.2.
> 
> In the case where there is a "host not reachable" or "no listener on the port" error, the fact is stored in the NSAS with the address and the 5 minute timer is set. In the case that there is no response, the NSAS is not updated and instead there is a negative cache entry for the <qname, qclass, qtype> tuple.
> 
> This is to catch the case where a name server implements policy on certain queries by not responding, without having that take the server out of the active list. This is a bad way to do things, but people decided that it had to be supported. The challenge with this is the case where a firewall is dropping all your queries: you are then filling up your cache with lots of negative cache entries.

That does seem a better interpretation than mine - it also makes things a lot easier.

> One question is what is the initial state of an NSAS entry before the first query is made? Since this will not be updated in the case of a dropped query, this state will remain, possibly forever. While you can't remove it from the active list, you can lower the likelihood that it is selected and still be within the spec. So adding 10ms to the rtt for each drop would be acceptable. If all the name servers implement the same policy, all the times will be increased and the selection will not be changed.

That's a good idea. In summary then:

* In the case of a transport error, "address unreachable" is reported to the NSAS and the address is dropped from the pool of available addresses for five minutes.
* In the case of an application timeout, "query timeout" is reported to the NSAS and the RTT for that address is raised by some fixed amount.

(It does occur to me that another benefit of this approach is that a single lost query or response does not immediately make the address unreachable, making the system more robust to transient interruptions.)
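
As an illustration only, the two reports summarised above could be sketched in Python as follows. The class name, method names and the 10 ms penalty (the figure suggested in the quoted text) are assumptions made for the example, not an existing NSAS interface.

```python
import time

UNREACHABLE_SECS = 300     # five-minute limit from RFC 2308, section 7.2
TIMEOUT_PENALTY_MS = 10    # assumed fixed penalty per timed-out query


class AddressEntry:
    """State the NSAS might hold for one nameserver address (hypothetical)."""

    def __init__(self, address, rtt_ms=0):
        self.address = address
        self.rtt_ms = rtt_ms
        self.unreachable_until = 0.0   # monotonic time; 0.0 means "reachable"

    def report_unreachable(self, now=None):
        """Transport error: drop the address from the pool for five minutes."""
        now = time.monotonic() if now is None else now
        self.unreachable_until = now + UNREACHABLE_SECS

    def report_timeout(self):
        """Application timeout: keep the address but raise its RTT."""
        self.rtt_ms += TIMEOUT_PENALTY_MS

    def is_available(self, now=None):
        """True once any five-minute unreachable period has expired."""
        now = time.monotonic() if now is None else now
        return now >= self.unreachable_until
```

Passing `now` explicitly is only there to make the expiry easy to exercise in a test; a caller would normally omit it.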

Thinking out loud...

To get round the problem of periodically retrying addresses with a high RTT the thought was that queries would be sent to all addresses, the proportion sent to a given address being weighted according to some function of the RTT (with lower RTTs receiving a higher proportion of the queries).
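
A minimal sketch of such a weighted choice, assuming (purely for illustration) that the weighting function is 1/RTT. The function name and the dictionary of RTTs are hypothetical; the real function of the RTT is still to be decided.

```python
import random


def select_address(rtts_ms, rng=random):
    """Pick one address with probability proportional to 1/RTT.

    rtts_ms maps address -> RTT in milliseconds.  The 1/RTT weighting is
    an assumption for this sketch; RTTs below 1 ms are treated as 1 ms to
    avoid division by zero.
    """
    addresses = list(rtts_ms)
    weights = [1.0 / max(rtts_ms[a], 1.0) for a in addresses]
    return rng.choices(addresses, weights=weights, k=1)[0]
```

With RTTs of 10 ms and 100 ms, roughly ten in every eleven queries would go to the faster address, while the slower one is still tried often enough for a change in its status to be noticed.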

If there is a timeout, the retry logic has the option of retrying a query to the same address or requesting another nameserver address for the same zone.  If it does the latter, the above address selection logic means that there is a chance of it getting the same address.  Although the outage might be transient, given that we know that a query to that address failed and given (assuming) there are other available addresses, it seems to make sense to prefer one of these over the one we have just used.  So the API for getting an address needs to allow the caller to provide a "don't give me this address" argument.

Such an argument also allows the tuple information held in the cache to be communicated to the NSAS.  When making a query for a particular <qname, qtype, qclass>, the list of associated IP addresses held in the negative cache could be passed in order to affect the address selection.  (So the argument should really be named "don't give me these addresses".)

However, there is probably a difference of interpretation here.  In the case of requesting an address for a retry, the argument is a hint; if there is only one available address - albeit one which has had its RTT raised because a query just failed - the NSAS should return it.  In the case of negative cache information, the argument is an instruction: if there are no other addresses available, the NSAS should return that fact to the caller.
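
The difference between the hint and the instruction could be sketched like this. The function name, the `strict` flag and the use of `None` for "no address available" are all assumptions made for the example; to keep it short it simply picks the lowest RTT rather than making a weighted choice.

```python
def get_address(rtts_ms, exclude=(), strict=False):
    """Return an address for a zone, honouring a list of addresses to avoid.

    rtts_ms maps address -> RTT in milliseconds.
    strict=False: 'exclude' is a hint (retry case); if nothing else is
                  available an excluded address may still be returned.
    strict=True:  'exclude' is an instruction (negative-cache case); if
                  nothing else is available, None is returned.
    """
    excluded = set(exclude)
    candidates = {a: r for a, r in rtts_ms.items() if a not in excluded}
    if not candidates:
        if strict:
            return None
        candidates = rtts_ms          # fall back to the full pool
    if not candidates:
        return None                   # the zone has no addresses at all
    return min(candidates, key=candidates.get)
```

For a zone with a single address, `get_address(pool, exclude=[addr])` returns that address, whereas `get_address(pool, exclude=[addr], strict=True)` returns `None`.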

Finally, raising the RTT in response to a timeout raises the question of whether there should be a maximum RTT, and whether an address that reaches this maximum should be declared unreachable.  Doing so would mean that an error response can be returned more quickly.
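
One way the cap might work, as a sketch only. The function name and both constants are assumed values chosen for the example; nothing here has been decided.

```python
TIMEOUT_PENALTY_MS = 10    # assumed fixed penalty per timed-out query
MAX_RTT_MS = 2000          # assumed ceiling; the real value is undecided


def apply_timeout(rtt_ms):
    """Return (new_rtt_ms, unreachable) after one query timeout.

    The RTT is raised by the fixed penalty but never beyond MAX_RTT_MS;
    once it reaches the ceiling the address is reported as unreachable.
    """
    new_rtt = min(rtt_ms + TIMEOUT_PENALTY_MS, MAX_RTT_MS)
    return new_rtt, new_rtt >= MAX_RTT_MS
```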

> I am also not sure about the timer extension issue. The dead server can be seen either as applying to name server names or IP addresses. The spec is silent on which of these an implementer chooses. Given that the items that can trigger this are at the transport layer, a case can be made that IP level timers make more sense.

I think you are right.  The spec does say "dead server" but then goes on to state that the caching is per IP address.  There seems to be an implication that one server name = one IP address.  In practice, although nameservers are an element of the NSAS, as far as the user of the NSAS is concerned they are invisible; instead, for each zone there are multiple server addresses.  The easiest implementation will be for an address marked as unreachable to be regarded as unreachable for five minutes.

Thanks for your insights.

Stephen

> 
> On 11/23/2010 8:36 AM, Stephen Morris wrote:
>> I'm afraid this is a bit of a long email, but I'd like some input on the following thoughts about the Nameserver Address Store (NSAS):
>> 
>> I've just been re-reading RFC 2308 and came across the following in section 7.2:
>> 
>>    A resolver MAY cache a dead server indication.  If it does so it MUST
>>    NOT be deemed dead for longer than five (5) minutes.  The indication
>>    MUST be stored against query tuple <query name, type, class, server
>>    IP address> unless there was a transport layer indication that the
>>    server does not exist, in which case it applies to all queries to
>>    that specific IP address.
>> 
>> What implications does that have for the NSAS?  Although the NSAS is not the cache itself, it is related to it.
>> 
>> At present, the idea is that when the resolver receives a referral to another zone, it asks the NSAS for the address of a nameserver in that zone.  Based on round-trip times, the NSAS chooses one of the addresses associated with the nameservers for the zone and returns it to the resolver.  The resolver makes a query to that address and reports back to the NSAS the round-trip time (RTT).  The NSAS incorporates this information into its internal data store and uses it in the selection of an address the next time one is requested.  If the address does not respond, that fact is also recorded.  Periodically queries are directed to addresses marked unreachable or with a long RTT to see if their status has changed.
>> 
>> There seem to be a few implications if the resolver is going to cache dead server indications according to the RFC:
>> 
>> 1) Five-Minute Caching
>> As envisaged, the resolver asks for the address of a nameserver for a zone.  It receives an address but not the name of the nameserver.  So the main cache can't record the fact that a server is unavailable, it has to be the NSAS.
>> 
>> If a server has one address associated with it, this is easy to implement - when the NSAS receives an indication that the address is unreachable, it marks it as such and notes the time.  All queries for an address for that server return "unreachable" for the next five minutes.  But what about the case when a nameserver has more than one address associated with it?  In this case although one address is unreachable, others may be reachable.  This means that the server as a whole is reachable and so the five-minute restriction does not apply.  Although we could still apply the five-minute rule (and it might be easiest to do so), it does allow the possibility that an unreachable address can be considered unreachable for longer than five minutes, so reducing query load.
>> 
>> 2) Query Tuple
>> This suggests that a nameserver could respond for one particular <qname, qtype, qclass, IP Address> but not for another.
>> 
>> At present the NSAS distinguishes between different classes.  The resolver will request the address of a nameserver for a particular zone and class and the NSAS will look it up in its data store.  Any data for the same zone but different class will be ignored.  (In the rest of this discussion, assume that the class has already been taken into account when querying the NSAS.)
>> 
>> At present, it is not intended that the resolver pass information about the query name or type to the NSAS.  However, the text in RFC 2308 seems to suggest that if we are going to cache dead server indications we need to do so.  It also suggests that associated with a server address, the NSAS needs to store both an RTT and a list of <name, type> for which the server is unreachable.  So when considering what address to return, the NSAS should check if an address is unreachable for the particular <name, type> for the intended query; if it is, the NSAS ignores it; if not, it needs to consider that address in its calculations.  A further complication appears to be that since the queries for different <name, type> will occur at different times, the five-minute window for each tuple will expire at different times.  This will add a bit to the complexity of the code.
>> 
>> Of course, the process is simplified if the transport layer determines that the address is unreachable as we can mark the address unreachable for all queries.  However I am assuming that in most cases it is the application layer that will do this by means of a query timeout.
>> 
>> The question I have is whether this interpretation of the implication of RFC 2308 on the NSAS is correct.  And if so, what is the best way to approach the issues raised here?
>> 
>> Stephen
>> 
>> 
>> _______________________________________________
>> bind10-dev mailing list
>> bind10-dev at lists.isc.org
>> https://lists.isc.org/mailman/listinfo/bind10-dev
> 