BIND 10 #408: General control logic of NSAS

Mon Dec 6 16:19:53 UTC 2010

#408: General control logic of NSAS
---------------------------+------------------------------------------------
      Reporter:  vorner    |        Owner:  stephen  
          Type:  task      |       Status:  reviewing
      Priority:  major     |    Milestone:           
     Component:  recurser  |   Resolution:           
      Keywords:            |    Sensitive:  0        
Estimatedhours:  0.0       |        Hours:  0        
      Billable:  1         |   Totalhours:  0        
      Internal:  0         |  
---------------------------+------------------------------------------------
Changes (by vorner):

  * owner:  vorner => stephen

Comment:

 Hello

 I have refactored (or rather rewritten) the original code. It should be more
 correct, since I now have a better idea of what I'm trying to accomplish.

 There are some TODO notes and the code could use some more tests, so it is
 not ready to be included in trunk yet. However, the changes tend to get
 large and I'm afraid Ocean and I are diverging with the #408 and #356
 branches. So, what I would like to do is put this through review and merge
 it into #356. Then I would either branch again (I want to merge, not only
 sync, so that Ocean has up-to-date code as well; I saw some commits in code
 that mostly disappeared in my branch) or, if he agrees, work on the same
 branch as him, add the tests and update the documentation on the wiki.

 I also propose to split these tasks off:
  * Optimise to run only 2 parallel queries at once instead of querying all
 zones at once (this should be an easy one).
  * Add a flag to the ResolverInterface to ask it for cache data only
 (something like NO_REMOTE/CACHE_ONLY) and use it to request whatever is in
 the cache before going to the 2 at a time.
  * Remove the LRU list from the nameserver entry hash table, use weak
 pointers there and drop them when no zone references them.
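
 To illustrate the last item, here is a minimal sketch of a table that holds
 only weak pointers. It assumes C++11 std::shared_ptr/std::weak_ptr (the
 Boost equivalents would work the same way). NameserverEntry,
 NameserverTable, getOrCreate() and purgeExpired() are hypothetical names
 made up for this example, not the actual NSAS classes, and locking is left
 out for brevity.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the real nameserver entry class.
struct NameserverEntry {
    explicit NameserverEntry(const std::string& n) : name(n) {}
    std::string name;
};

// Hypothetical table that does not own its entries. It keeps weak pointers
// only, so an entry is destroyed as soon as the last zone drops its
// shared_ptr. Not thread-safe; a real table would need a mutex.
class NameserverTable {
public:
    // Return the live entry for `name`, creating a fresh one if there is
    // none yet or the previous one has already expired.
    std::shared_ptr<NameserverEntry> getOrCreate(const std::string& name) {
        std::weak_ptr<NameserverEntry>& slot = table_[name];
        std::shared_ptr<NameserverEntry> entry = slot.lock();
        if (!entry) {
            entry = std::make_shared<NameserverEntry>(name);
            slot = entry;
        }
        return entry;
    }

    // Erase slots whose entry is no longer referenced by any zone and
    // return how many were removed.
    std::size_t purgeExpired() {
        std::size_t removed = 0;
        for (auto it = table_.begin(); it != table_.end();) {
            if (it->second.expired()) {
                it = table_.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    std::size_t size() const { return table_.size(); }

private:
    std::unordered_map<std::string, std::weak_ptr<NameserverEntry> > table_;
};
```

 With this layout no LRU list is needed to decide what to evict: the zones
 keep the entries alive, and the table only has to sweep out dead slots now
 and then.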

 Furthermore, I'd appreciate it if anyone had an idea how to dispatch
 callbacks more safely. Because a callback might call functions of the class
 that dispatches it, that class's mutex must not be locked during the call.
 But if I take all the callbacks out of the vector (so that I do not touch
 the vector while unlocked), unlock and dispatch them one by one, then I lose
 the rest of them if one throws an exception. I had two ideas how to solve
 it, but neither of them looks really good:
  * If I catch an exception (or, well, detect it from some guard object's
 destructor) and there are still some callbacks to go, lock again, put them
 back and then let the exception fly. This does not lose them, but there's a
 chance they will never be called again, and the locking and putting back
 might raise another exception, which is a problem (e.g. an uncatchable
 crash).
  * Modify the callback base class so that, when it is destroyed, it calls
 the failure() method. However, that is problematic, since it is called from
 the destructor and the subclass destructor has already run at that point.
 It could be wrapped somehow (one class holding the callback, calling its
 failure() and then releasing it). But that sounds a little bit complicated,
 and there's the problem of a lot of unknown code running from a destructor,
 which brings back the problem of two exceptions at once.
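
 The first idea could be sketched roughly as follows. This is only an
 illustration under assumptions: it uses C++11 std::mutex and std::function
 instead of whatever the branch really uses, and Dispatcher, Requeue, add(),
 dispatch() and pendingCount() are hypothetical names. The guard swallows
 any exception raised while putting the callbacks back, which avoids the
 crash but means the leftovers are silently lost in that case.

```cpp
#include <cstddef>
#include <functional>
#include <iterator>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical dispatcher: callbacks run with the mutex unlocked, and a
// guard object puts unconsumed callbacks back if one of them throws.
class Dispatcher {
public:
    typedef std::function<void()> Callback;

    void add(Callback cb) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(cb));
    }

    void dispatch() {
        std::vector<Callback> batch;
        {
            // Take everything out while locked, then release the mutex so
            // that callbacks may call add() on this object again.
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        Requeue guard(*this, batch);
        while (guard.next < batch.size()) {
            Callback& cb = batch[guard.next];
            ++guard.next;  // counted as consumed even if it throws
            cb();
        }
    }

    std::size_t pendingCount() {
        std::lock_guard<std::mutex> lock(mutex_);
        return pending_.size();
    }

private:
    // Guard whose destructor requeues the callbacks that were not reached.
    struct Requeue {
        Requeue(Dispatcher& o, std::vector<Callback>& b)
            : owner(o), batch(b), next(0) {}
        ~Requeue() {
            if (next >= batch.size()) {
                return;  // everything was dispatched
            }
            try {
                std::lock_guard<std::mutex> lock(owner.mutex_);
                // Put leftovers in front so the original order is kept.
                owner.pending_.insert(
                    owner.pending_.begin(),
                    std::make_move_iterator(
                        batch.begin() + static_cast<std::ptrdiff_t>(next)),
                    std::make_move_iterator(batch.end()));
            } catch (...) {
                // Nothing must escape a destructor; leftovers are lost.
            }
        }
        Dispatcher& owner;
        std::vector<Callback>& batch;
        std::size_t next;
    };

    std::mutex mutex_;
    std::vector<Callback> pending_;
};
```

 The throwing callback itself is not requeued, only the ones after it, and
 they are not run again until someone calls dispatch() once more, which is
 exactly the "might never be called again" weakness described above.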

 branches/trac408 is currently at r3734. I'd like to have it reviewed before
 I merge it back to #356. I think it might be easier to review the whole
 branch, that is the code itself rather than the changes, because a lot of
 the code was changed or deleted. However, I might add some tests in the
 meantime.

-- 
Ticket URL: <http://bind10.isc.org/ticket/408#comment:5>
BIND 10 Development <http://bind10.isc.org>