[bind10-dev] NSAS Using Authority/Additional Information?

Stephen Morris stephen at isc.org
Wed Dec 1 19:04:11 UTC 2010


On 30 Nov 2010, at 11:24, Michal 'vorner' Vaner wrote:

> It is possible. However, I do not think it makes much sense to put a timer on
> something that is expected to be resolved locally only. The network really needs
> a timeout, but having a timeout running on practically everything in the system
> seems both like overhead and an added complication (when the timer fires, I do
> not know what state the request is currently in, or whether it has already been
> answered (this can actually happen if the callback queue is long)).
> 
> So it seems simpler not to need them. Or is there any real reason to have
> timeouts on everything?

There has to be a timeout somewhere, but where it sits is implementation-dependent.  My example had the timeout in the NSAS->resolver calls, but the code should work equally well if it is in the resolver->NSAS call.  At some point something times out and unwinds the chain of asynchronous function calls.
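
To illustrate what I mean by "unwinding the chain", here is a minimal sketch
(the class and method names are made up for the example, not the actual NSAS
or resolver interfaces): each nested lookup holds the callback of the lookup
that started it, so the timeout, wherever it fires, only has to fail its own
callback and the failure propagates back up.

    #include <string>
    #include <boost/shared_ptr.hpp>

    // Illustrative callback interface only - not the real NSAS/resolver API.
    struct AddressCallback {
        virtual ~AddressCallback() {}
        virtual void success(const std::string& address) = 0;
        virtual void unreachable() = 0;     // failure, including a timeout
    };

    // A nested lookup remembers the callback of the lookup that created it.
    struct NestedLookup : public AddressCallback {
        boost::shared_ptr<AddressCallback> parent_;
        explicit NestedLookup(const boost::shared_ptr<AddressCallback>& parent) :
            parent_(parent) {}
        virtual void success(const std::string& address) {
            parent_->success(address);      // answer flows back up the chain
        }
        virtual void unreachable() {
            parent_->unreachable();         // so does a timeout/failure
        }
    };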



>> 9) The resolver receives the request for the A record of ns.example.net.  For this it requires the address of a nameserver for example.net., so it sends another call to the NSAS for this information, passing another "resolver callback" object.
>> 
>> [Note - this should be detectable.  A requirement on the resolver is that it does not issue multiple outstanding queries for the same information.  If the same logic is applied to NSAS callbacks, the duplicate call should be detected.  Such an optimisation affects the detail of what follows but not the outcome.]
> 
> That is not exactly a duplicate: one was for A, www.example.net, the other for A,
> ns.example.net.

I perhaps didn't explain it as well as I could.  In both step 2 and step 9 the resolver is asking for the address of the nameserver for example.net.  In step 2, in response to the query to the .net nameserver for www.example.net, it has received a referral to ns.example.net and so attempts to get the address from the NSAS.  In step 9 that request has led to another request for the address of the example.net. nameserver.



> And I think we do want to allow the resolver to put in as many
> callbacks as it wants (otherwise it would need to code its own multiplexing of
> callbacks). Real duplicates will need to be detected by the resolver, not us
> (I guess the NSAS should not inspect the callbacks; they might be different from
> resolver callbacks anyway, at least in the case of tests).

I think it is up to the caller (resolver) to ensure that duplicates do not appear or, at least, if they do appear then they have no harmful effects.  The NSAS should just work through the callbacks and call each one.



> Hmm, you are right. And it still wouldn't solve the problem with two
> cross-referencing zones (each with a nameserver in the other one and not
> providing glue).
> 
> Then we must think of something else. But it still seems like an unclean
> idea to put a callback there. Maybe we will be able to detect some of the
> problems described here sooner. Or maybe they do not happen in real life much, so
> we should care only not to leak memory in such cases.

I don't think it is unclean.  It simplifies each task - I call something and pass a callback object; at some time later the callback is called.  In the rare case that we get a misconfiguration, it doesn't really matter that the callback nesting is deep or takes a long time to execute.  We only pay the penalty on an error.



> But the CACHE_ONLY could be used to fetch all the glue information at
> creation time, failing for the ones not specified. Then the second round could
> get the rest remotely. When we do not have that one, we just pick 2 nameservers,
> ask for their IP addresses and wait for that. If we are unlucky, we might have
> chosen the two that are out of zone, so they trigger an external query.  And we wait
> for them to answer before we ask for the glue in the cache. But that's probably too
> soon to think about optimising this (anything real just has the glue anyway).

This could be a note for implementation.  The NSAS knows what zone is being asked for.  If the zone does not exist, a zone entry is created for it and the resolver is asked for the nameservers it knows about.  If there are more than two, the NSAS could check whether any are in-zone and, if so, ensure that they are the ones it selects.  The idea here is that if the zone is configured correctly, we know that the resolver will have their addresses in its cache (the glue records), eliminating the need for an external query.
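
As a rough sketch of the selection I mean (purely illustrative - a real
implementation would compare domain names label by label rather than with a
string suffix check):

    #include <string>
    #include <vector>

    // Crude "is name at or below zone?" test - illustration only.
    bool inZone(const std::string& name, const std::string& zone) {
        return name.size() >= zone.size() &&
               name.compare(name.size() - zone.size(), zone.size(), zone) == 0;
    }

    // Pick (say) two nameservers for the zone, preferring in-zone ones: if
    // the zone is configured correctly their glue addresses should already
    // be in the resolver's cache, so no external query is needed.
    std::vector<std::string> selectNameservers(const std::vector<std::string>& all,
                                               const std::string& zone) {
        std::vector<std::string> chosen;
        for (size_t i = 0; i < all.size() && chosen.size() < 2; ++i) {
            if (inZone(all[i], zone)) {
                chosen.push_back(all[i]);   // in-zone: glue likely in cache
            }
        }
        for (size_t i = 0; i < all.size() && chosen.size() < 2; ++i) {
            if (!inZone(all[i], zone)) {
                chosen.push_back(all[i]);   // fill up with out-of-zone ones
            }
        }
        return chosen;
    }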



>> This is I think the bit with the biggest uncertainty.  I presume that the cache will be able to detect when an NS RRset has changed so presumably it will only update the NSAS when this occurs.
> 
> The detection will be no problem, I think. I'm a little afraid about the
> notifications, since the nameserver might already be out of the LRU list and
> hash table and still used by a zone. So we either need to ignore this one, do
> something on TTL expiration, or have some linking there.

I don't think this is a problem.  The NS RRset just contains a list of nameservers; if the zone contains any that are not in the RRset, drop them.  And if there are nameservers in the RRset that are not in the zone, add them.  If any of the new nameservers correspond to an entry in the nameserver entry hash table, link to that entry; otherwise create a new one.
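
In outline, something like this (the types are stand-ins for the real NSAS
classes; the point is only the drop/keep/reuse-or-create logic):

    #include <map>
    #include <set>
    #include <string>
    #include <vector>
    #include <boost/shared_ptr.hpp>

    struct NameserverEntry { std::string name; /* addresses, RTTs, ... */ };
    typedef boost::shared_ptr<NameserverEntry> NameserverPtr;

    // Stand-in for the nameserver entry hash table.
    std::map<std::string, NameserverPtr> ns_hash;

    // Bring the zone's nameserver list into line with a new NS RRset: drop
    // entries no longer in the RRset, keep those still present, and for new
    // names reuse an existing hash table entry (preserving its RTT data) or
    // create a fresh one.
    void updateNameservers(std::vector<NameserverPtr>& zone_ns,
                           const std::set<std::string>& rrset_names) {
        std::vector<NameserverPtr> updated;
        std::set<std::string> kept;
        for (size_t i = 0; i < zone_ns.size(); ++i) {
            if (rrset_names.count(zone_ns[i]->name) != 0) {
                updated.push_back(zone_ns[i]);          // still a nameserver
                kept.insert(zone_ns[i]->name);
            }                                           // else: dropped
        }
        for (std::set<std::string>::const_iterator it = rrset_names.begin();
             it != rrset_names.end(); ++it) {
            if (kept.count(*it) == 0) {
                NameserverPtr& slot = ns_hash[*it];     // existing entry...
                if (!slot) {
                    slot.reset(new NameserverEntry());  // ...or a new one
                    slot->name = *it;
                }
                updated.push_back(slot);
            }
        }
        zone_ns.swap(updated);
    }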



>> The simplest thing to do would be to delete the zone entry and recreate it anew.  When not pointed to by any zone entries, nameserver entries in the NSAS will remain in existence until they fall off the end of the LRU table.  So if we delete a zone entry and recreate it, there is a good chance that we will reacquire the same nameserver entries and by implication the same address entries with the up to date RTT information.  (This chance is improved - but not certain - if we create the new zone entry first and replace it.  The reason it is not certain is because the nameserver entry could have been removed from both the hash table and LRU list and is only being kept in existence by the pointer from the zone entry object.)  If not, then the RTT information will have to be rebuilt from scratch.
> 
> I do not think we need to recreate it. Just renewing seems better, to preserve
> information. Sure, that's more code and we could just do the recreation for now.
> 
> Would it be a problem having the LRU only for zones, with nameservers
> removed from the hash when no zone points to them? That way we have fewer
> nameserver entries than now (there would be no duplicates), we could always
> locate everything we have, and we wouldn't have problems with losing information
> too soon.
> 
> The problem there would be to correctly remove them from the hash at the
> destructor, but some clever locking should solve it.

That is a thought, but you would need to detect when the reference count is 2 (i.e. the nameserver entry is pointed to only by the zone and the hash table).  The documentation for shared_ptr::use_count() says "use_count() is not necessarily efficient.  Use only for debugging and testing purposes, not for production code".

However, I've just noticed that shared_ptr::unique() doesn't have that restriction (although it does say "... may be faster than use_count()" - note the word "may").  So provided you lock the hash table slot for the nameserver entry during deletion of the zone entry, you could create a (raw) pointer to the nameserver entry's shared_ptr object, remove it from the zone entry, then use the raw pointer to access the nameserver entry and remove it from the hash table if unique() returns true.

If this does work, it would simplify the code as well as remove a bottleneck around the nameserver entry LRU list.
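
A sketch of the deletion sequence I am thinking of (the container, the
locking and the names are only illustrative - in particular the real code
would lock just the relevant hash table slot, not the whole table):

    #include <map>
    #include <string>
    #include <boost/shared_ptr.hpp>
    #include <boost/thread/mutex.hpp>

    struct NameserverEntry { std::string name; };   // illustrative only
    typedef boost::shared_ptr<NameserverEntry> NameserverPtr;

    std::map<std::string, NameserverPtr> ns_hash;   // nameserver hash table
    boost::mutex ns_hash_lock;

    // Called while a zone entry is being destroyed.  With the hash table
    // locked, drop the zone's reference to the nameserver entry; if the
    // hash table's shared_ptr is then the only owner (unique() is true),
    // remove the entry from the hash table as well.
    void releaseNameserver(NameserverPtr& zone_ref) {
        boost::mutex::scoped_lock lock(ns_hash_lock);
        std::map<std::string, NameserverPtr>::iterator it =
            ns_hash.find(zone_ref->name);
        zone_ref.reset();               // zone no longer points at the entry
        if (it != ns_hash.end() && it->second.unique()) {
            ns_hash.erase(it);          // hash table was the last owner
        }
    }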



>> At this point though, it is perhaps worth considering one (related) change to the NSAS data store.  At present zones and nameservers are accessed via a hashtable, but addresses are pointed to solely by the nameserver entries.  So if an address is referenced by two or more nameservers there will be multiple (independent) entries for it in the NSAS.  (On reflection I see that there was an implicit assumption that multiple names pointing to an address are unlikely. This is because the most usual case with multiple names is that the names are CNAMEs for a single name that points to a single address.)  If we were to add addresses to their own hash table and LRU list (this should be a minimal change to the code - the LRU list and hash table classes are templates so should adapt easily to the Address Entry class) and check the hash table when adding an address, there will only ever be one entry in the NSAS for any given address.
> 
> Does it bring anything else than just not having duplicate addresses? I do not
> see much benefit there.

It means that the RTT information is more accurate, being the result of queries to multiple names.


> Just a guess, but I think that the assumption is not far from the truth. Some
> statistics would be better. But if there are only a few that have multiple names,
> is it worth bringing extra overhead to the usual case? Currently, they are in one
> array, which is consecutive in memory (a vector is just an array inside). Even if
> we omit the extra lookup in the table for each address, we get at least 2 more
> memory fetches at some random address per address (one for the address object,
> one for the reference count; shared_ptr templates must store them somewhere
> else), each one possibly with a TLB miss, therefore each taking approximately 100
> processor ticks. That would be a speed penalty of at least 1+n times the current
> implementation, where n is the number of addresses.
> 
> Not to mention that it would be more complicated code (passing the hash tables
> and lists one level deeper through the structure, for one).
> 
> In the case where it would be common for a nameserver to have multiple names, I
> would see the benefit. But then, would it be possible to link multiple names to a
> single nameserver entry? I know, it is hard to guess that, and the current
> implementation of the hash table does not allow it.

I mentioned it because of the possibility; if it were common, then we would save time when the address object is created because we would access an existing structure instead of creating it from scratch.  But I take your point - let's leave it for now and mark it as something we could look at when investigating performance.



>>> Another problem is, we assume that the resolver is willing to provide data that it knows is unauthoritative. But this can be solved simply by adding an UNATHORITATIVE_OK flag.
>> 
>> I think that most resolvers will do that anyway; if a resolver does not have authoritative data, it will return any data it has.
> 
> Even without trying to reach authoritative source? I think they shouldn't.

The problem is exemplified by something like:

	example.net NS ns1.example.net
	ns1.example.net A 1.2.3.4

... in the parent zone.  That data is not authoritative, but without using it you can't access the authoritative data.


> 
>>> This might be solved, for example, by providing some kind of cache cookie. When data are put into the cache, it would return a cookie, and having such a cookie would guarantee that the cache is able to provide at least the data passed to it. (Technically, the easiest way to provide this functionality is to put a shared pointer to the data into the cookie, and the cache would look first into itself, then into the cookie if the data are not found.)
>> 
>> An interesting case.
>> 
>> Assuming glue were given but the cache time is set to 0, no information is cached.  Therefore when the NSAS queries for the NS records of the zone in question, the resolver makes an explicit query for the NS RRset.  But after that we would be in the same situation as described above and the queries would ultimately time out.
> 
> Yes, that's why I want to propose the cookie. It would hold a small piece of the
> cache and, when passed with the request, the cache would fall back to it. The
> cookie would not be subject to eviction, so 0-TTL data would survive; it would
> be used just for the request that got the cookie, and when the cookie is dropped
> the held data's reference count would go to 0.

That would work.  Provided the cache can be encapsulated as a single object, we could just create a (new, empty) instance of the cache to use as the cookie.
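
Something along these lines (assuming the cache offers a simple lookup/insert
interface; the class names are made up for the example):

    #include <map>
    #include <string>
    #include <boost/shared_ptr.hpp>

    // Minimal stand-in for the resolver cache: keyed lookup plus insert.
    class Cache {
    public:
        bool lookup(const std::string& key, std::string& data) const {
            std::map<std::string, std::string>::const_iterator it = data_.find(key);
            if (it == data_.end()) {
                return false;
            }
            data = it->second;
            return true;
        }
        void insert(const std::string& key, const std::string& data) {
            data_[key] = data;
        }
    private:
        std::map<std::string, std::string> data_;
    };

    typedef boost::shared_ptr<Cache> CachePtr;

    // The "cookie" is just a second, private Cache instance travelling with
    // the request.  A lookup tries the shared cache first and falls back to
    // the cookie, so 0-TTL glue stored in the cookie survives for the life
    // of that request but is never visible elsewhere or subject to eviction.
    bool lookupWithCookie(const Cache& shared_cache, const CachePtr& cookie,
                          const std::string& key, std::string& data) {
        if (shared_cache.lookup(key, data)) {
            return true;
        }
        return cookie && cookie->lookup(key, data);
    }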

Stephen



