[bind10-dev] NSAS Using Authority/Additional Information?

Mon Nov 29 17:27:02 UTC 2010

On 27 Nov 2010, at 20:17, Michal 'vorner' Vaner wrote:

> Hello
> 
> As the original question probably led to this, I wrote a proposal how to use
> resolver to get data for the NSAS. If you want to have a look at it and tell me
> if it makes sense (or, if my requirements for the rest of the system are sane),
> it is here (linked from the nameserver address store design page):
> 
> https://bind10.isc.org/wiki/NSASReqIdeas
> 
> If it makes sense, I'll incorporate it into the main NSAS page. And, more
> importantly, change the code (or, not really change, just continue writing with
> this in mind).

Michal

I've taken quotes from https://bind10.isc.org/wiki/NSASReqIdeas and commented on them:

> The recursive problem
> 
> The usual situation is that the one asking us to provide an IP address is the resolver. So it tells us to give it an IP address of some nameserver of example.net. Because we do not know example.net yet, we ask the resolver to give us NS records of example.net. And the resolver will ask us to provide the IP address of example.net, to ask what its nameservers are. This does not loop infinitely, as the example.net entry exists and is marked as IN_PROGRESS (waiting for data), so the callback is just stored. But we do not get any data and, worse, we do not timeout, because timeouts are on network operations, not on running code, and nothing here communicates by a network, so it does not create timeouts.

As I understand the asynchronous I/O we're using, there is no reason why you can't use separate timers.  So all requests could have a timeout associated with them.
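
To make the separate-timer idea concrete, here is a minimal sketch.  It uses Python's asyncio purely for illustration, not the asynchronous I/O library we are actually using, and the names never_answered and request_with_timeout are invented for the example.  The point is only that the timer belongs to the request itself, so it fires even when no network operation is involved.

```python
import asyncio

async def never_answered():
    """Stands in for a request that is queued internally and never completes."""
    await asyncio.Event().wait()        # nothing ever sets this event

async def request_with_timeout(request, seconds):
    """Run a request under its own timer, independent of any network I/O."""
    try:
        return await asyncio.wait_for(request, timeout=seconds)
    except asyncio.TimeoutError:
        return "TIMEOUT"

# The request never touches the network, yet it still times out.
result = asyncio.run(request_with_timeout(never_answered(), 0.05))
```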

> This might even lead to cyclic data structure with shared pointers, which will not get released.
> 
> The one saving the day is the cache. Assuming it stores anything that goes by, including data from additional and authoritative sections, the resolver does not ask us for IP address, but provides our answer directly from the cache. So this way it works in the usual situation.
> 
> Still, it is not bullet-proof. There might be a zone with single nameserver which does not have any IP address. In that case it is unreachable, but the cache can not assume anything from seeing empty additional section. So it does not know there are no IP addresses, so it will not provide them and the resolver will try to fetch them, asking us for IP address.

This should not be an issue - the system is designed to cope with it.  Let's assume this situation: a misconfigured parent zone with a single NS record:

   example.net NS ns.example.net

... and no glue record.

1) A lookup is made for www.example.net.
2) The resolver reads the NS record and caches it, then asks the NSAS for the address of a nameserver in example.net, passing with the request a "resolver callback" object.
3) "example.net" does not exist in the NSAS so a zone entry is created for it.
4) Within the NSAS, the "resolver callback" object is added to the zone entry's callback list.
5) A request is then made to the resolver for the NS records for example.net, a callback object being specified.
6) In this case the request completes immediately (with the single NS record, ns.example.net, which was cached in step 2) and the callback is executed.
7) A nameserver object is created for ns.example.net.  The zone asks the nameserver object for its address, passing to it a "zone callback" object.  A timer is also started and associated with the query.
8) The "zone callback" object is added to the internal nameserver object callback list and a request for the A record sent to the resolver.
9) The resolver receives the request for the A record of ns.example.net.  For this it requires the address of a nameserver in example.net, so it sends another call to the NSAS for this information, passing another "resolver callback" object.

[Note - this should be detectable.  A requirement on the resolver is that it does not issue multiple outstanding queries for the same information.  If the same logic is applied to NSAS callbacks, the duplicate call should be detected.  Such an optimisation affects the detail of what follows but not the outcome.]

10) When the NSAS receives this request, it can see that a query is in progress for the example.net zone (as there is already a "resolver callback" associated with it), so it just queues the "resolver callback" object to the zone entry's callback list and returns.

At this point we have two "resolver callbacks" queued to the zone entry and one "zone callback" queued to the nameserver entry but nothing happening.  However:

11) At some later time the request queued in (7) times out.  (We assume here that calls from the resolver to the NSAS do not have a timeout associated with them but calls from the NSAS to the resolver do.)
12) The "zone callback" associated with the nameserver object is removed from that object's callback queue and is executed.
13) The "zone callback" object executes all the "resolver callbacks" associated with the zone (in this case, those queued at steps 4 and 10), passing to them a "timeout" status.  As each callback is executed it is removed from the queue.
14) Request 4 will cause a SERVFAIL to be returned to the remote client.  Request 10 will cause a timeout code to be returned to request 8.  This will cause all the "zone callback" objects queued to the nameserver entry to be executed.  But there aren't any - the only one was removed in step 12 - so the response gracefully disappears.
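
The unwinding can be sketched as a toy model.  This is Python for brevity and is not the NSAS code: the Entry class, run_callbacks and the three callback functions are invented for the illustration.  The callbacks are queued as in steps 4, 8 and 10, and the expiry of the timer started in step 7 is simulated by a direct call.

```python
events = []                      # what each callback ended up doing

class Entry:
    """A zone or nameserver entry: a name plus a queue of waiting callbacks."""
    def __init__(self, name):
        self.name = name
        self.callbacks = []

def run_callbacks(entry, status):
    """Remove each queued callback from the entry, then execute it."""
    executed = 0
    while entry.callbacks:
        callback = entry.callbacks.pop(0)
        callback(status)
        executed += 1
    return executed

zone = Entry("example.net")
nameserver = Entry("ns.example.net")

def client_lookup(status):               # "resolver callback" from step 4
    events.append("step 4: SERVFAIL to client (" + status + ")")

def zone_callback(status):               # "zone callback" from steps 7 and 8
    run_callbacks(zone, status)

def recursive_lookup(status):            # "resolver callback" from step 10
    # The A record request from step 8 fails; it would run any zone
    # callbacks still queued on the nameserver entry, but none remain.
    executed = run_callbacks(nameserver, status)
    events.append("step 10: %s passed to request 8, %d callbacks run"
                  % (status, executed))

zone.callbacks.append(client_lookup)         # step 4
nameserver.callbacks.append(zone_callback)   # step 8
zone.callbacks.append(recursive_lookup)      # step 10

# Nothing moves until the timer started in step 7 expires.
run_callbacks(nameserver, "timeout")
```

Because each callback is taken off its queue before it runs, the second pass over the nameserver entry finds an empty queue and the chain stops there.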

> 
> This can be solved by providing a CACHE_ONLY flag to the resolver (assuming it will have one), forcing it not asking anything remote and provide fail right away if the cache does not have the data.
> 
> Such flag would allow us to do a first-round over the nameservers and fill the IP addresses we already have right in the initialization, then start fetching at most 2 IP NSs at once externally.

This won't work if the nameservers for a zone are in different zones.  Suppose the NS records are:

  example.net. NS ns1.example.com.
  example.net. NS ns1.example.org.

We may well not have the A records for either of the nameservers, but since the path does not involve example.net, we are able to get them without a deadlock. (Presumably: it is entirely possible that in getting the A records for these nameservers we end up with a path of referrals that passes through example.net.  In which case we have another deadlock.  As before, eventually something times out and causes the network of queries to unwind.)

> When we create the data structure, cache might know only unauthoritative data. But when the nameserver is queried, some authoritative data will arrive and the cache overwrites the unauthoritative by authoritative. But we don't, we still use the old one.
> 
> What is needed is that cache informs us about it. It is enough that we are informed when the data actually change (most of the time the unauthoritative and authoritative data will be the same and we do not need to be bothered). When we are informed, the simplest thing to do is pretend the entry expired and we will fetch new data from cache when they are needed.

This is, I think, the bit with the biggest uncertainty.  I presume that the cache will be able to detect when an NS RRset has changed, so it need only update the NSAS when this occurs.

The simplest thing to do would be to delete the zone entry and recreate it anew.  When not pointed to by any zone entries, nameserver entries in the NSAS will remain in existence until they fall off the end of the LRU table.  So if we delete a zone entry and recreate it, there is a good chance that we will reacquire the same nameserver entries and by implication the same address entries with the up to date RTT information.  (This chance is improved - but not certain - if we create the new zone entry first and replace it.  The reason it is not certain is because the nameserver entry could have been removed from both the hash table and LRU list and is only being kept in existence by the pointer from the zone entry object.)  If not, then the RTT information will have to be rebuilt from scratch.
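
A rough sketch of the "recreate and reacquire" idea follows.  It is Python rather than the NSAS code, and NameserverTable, get_or_create and make_zone are made-up names; the hash table and LRU list are folded into a single OrderedDict for brevity.  A recreated zone entry picks up an existing nameserver entry, and with it the RTT data, as long as that entry is still in the table.

```python
from collections import OrderedDict

class NameserverEntry:
    def __init__(self, name):
        self.name = name
        self.rtt_ms = None              # unknown until measured

class NameserverTable:
    """Hash table plus LRU ordering, folded into one OrderedDict."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get_or_create(self, name):
        entry = self.entries.get(name)
        if entry is None:
            entry = NameserverEntry(name)
            self.entries[name] = entry
        self.entries.move_to_end(name)           # mark as most recently used
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)     # drop least recently used
        return entry

def make_zone(table, ns_names):
    """A zone entry is reduced here to its list of nameserver entries."""
    return [table.get_or_create(name) for name in ns_names]

table = NameserverTable(capacity=4)
old_zone = make_zone(table, ["ns1.example.com", "ns1.example.org"])
old_zone[0].rtt_ms = 12                          # RTT learned over time

# The NS RRset changes: build the new zone entry, then drop the old one.
new_zone = make_zone(table, ["ns1.example.com", "ns2.example.org"])
old_zone = None
```

Here ns1.example.com keeps its RTT, while ns2.example.org starts from scratch.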

At this point though, it is perhaps worth considering one (related) change to the NSAS data store.  At present zones and nameservers are accessed via a hashtable, but addresses are pointed to solely by the nameserver entries.  So if an address is referenced by two or more nameservers there will be multiple (independent) entries for it in the NSAS.  (On reflection I see that there was an implicit assumption that multiple names pointing to an address are unlikely. This is because the most usual case with multiple names is that the names are CNAMEs for a single name that points to a single address.)  If we were to add addresses to their own hash table and LRU list (this should be a minimal change to the code - the LRU list and hash table classes are templates so should adapt easily to the Address Entry class) and check the hash table when adding an address, there will only ever be one entry in the NSAS for any given address.
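
As an illustration only (Python, with invented names AddressTable and get_or_create; the LRU list is left out), checking a table before adding an address might look like this:

```python
class AddressEntry:
    def __init__(self, address):
        self.address = address
        self.rtt_ms = None

class AddressTable:
    """One shared entry per address, looked up before anything is added."""
    def __init__(self):
        self._by_address = {}

    def get_or_create(self, address):
        entry = self._by_address.get(address)
        if entry is None:
            entry = AddressEntry(address)
            self._by_address[address] = entry
        return entry

    def __len__(self):
        return len(self._by_address)

addresses = AddressTable()

# Two differently named nameservers that resolve to the same address.
ns_a_address = addresses.get_or_create("192.0.2.1")
ns_b_address = addresses.get_or_create("192.0.2.1")

ns_a_address.rtt_ms = 20        # an RTT update made via one nameserver...
```

Both nameservers then share the one entry, so RTT information gathered through either name is visible through the other.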

> Another problem is, we assume that resolver is willing to provide data that it knows is unauthoritative. But this can be solved simply by adding UNAUTHORITATIVE_OK flag.

I think that most resolvers will do that anyway; if a resolver does not have authoritative data, it will return any data it has.

> TTL 0, cache eviction
> 
> We assume that we find the glue data in cache. But that might not happen, for example when some other thread needed to clean some space there (there isn't infinite amount of space in the cache usually) or the data has TTL 0 and it can't be kept in the cache. This would lead to not answering the query, but marking the entry as unreachable, which isn't correct.
> 
> This might be solved for example by providing some kind of cache cookies. When data are put into the cache, it would return a cookie and having such a cookie would guarantee that the cache is able to provide at least the data passed to it. (Technically, the easiest way to do this functionality is to put a shared pointer to the data into the cookie, and the cache would look first into itself, then into the cookie if not found.)

An interesting case.

Assuming glue was given but the cache time is set to 0, no information is cached.  Therefore when the NSAS queries for the NS records for the zone in question, the resolver makes an explicit query for the NS RRset.  But after that we would be in the same situation as described above and queries would ultimately time out.

> External assumption
> 
> This approach assumes few things about other components' behaviour. They are listed here, at a single place, for faster reference:
> 
> 	• Cache needs to intercept every packet that goes in and store all information, including additional and authority sections (might be out of the main store, using cookies).

It should do this anyway.  Bear in mind that the nameserver may have to answer explicit queries for NS information or for the A records of nameservers.

> 	• Cache needs to inform NSAS when unauthoritative or old information is replaced by different authoritative (eg. only when they differ).

Ack - discussed above.

> 	• Cache needs to be able to provide a way to store TTL and make it available to one exact NSAS query only. For example by cookies.

I don't think it needs to do this.  As you pointed out elsewhere, if we do store the TTL in the NSAS, the overhead is a single comparison for expiry time.  So the TTL can be passed with any RRset data.
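
A minimal sketch of what I mean, in Python and with invented names (StoredRRset, expired); the "now" arguments exist only to make the example deterministic:

```python
import time

class StoredRRset:
    """RRset data as held in the NSAS, with the TTL turned into an expiry time."""
    def __init__(self, rdata, ttl, now=None):
        now = time.monotonic() if now is None else now
        self.rdata = rdata
        self.expires_at = now + ttl

    def expired(self, now=None):
        # The whole cost of honouring the TTL is this one comparison.
        now = time.monotonic() if now is None else now
        return now >= self.expires_at

rrset = StoredRRset(["192.0.2.1"], ttl=300, now=1000.0)
zero_ttl = StoredRRset(["192.0.2.2"], ttl=0, now=1000.0)
```

With this, data with a TTL of 0 is simply expired as soon as it is stored and is refetched the next time it is needed.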

> 	• Resolver needs to be able to handle flags CACHE_ONLY (not doing any external queries, if there isn't the information, then fail) and UNAUTHORITATIVE_OK (it is acceptable to receive not authoritative data)
> 	• Resolver needs to be able to pass the cache cookie with the request back to cache (if the cookies are used).

See above.

> 	• Resolver interface should provide a way to ask for an RRset. It is not really required, but passing the RRset is probably better than constructing a response and then parsing it again.

Absolutely!  The resolver interface should also allow for asynchronous I/O.

Stephen