BIND 10 #397: Port generic red-black tree (RBT) data structure from BIND-9

Tue Dec 7 05:43:07 UTC 2010

#397: Port generic red-black tree (RBT) data structure from BIND-9
------------------------------+---------------------------------------------
      Reporter:  zzchen_pku   |        Owner:  jinmei   
          Type:  enhancement  |       Status:  reviewing
      Priority:  major        |    Milestone:           
     Component:  data source  |   Resolution:           
      Keywords:               |    Sensitive:  0        
Estimatedhours:  0.0          |        Hours:  0        
      Billable:  1            |   Totalhours:  0        
      Internal:  0            |  
------------------------------+---------------------------------------------

Comment(by jinmei):

 Here are some major outstanding issues.  Some of them are simply not
 addressed since the previous comments; some others are a revised
 discussion based on the response.

  - this stuff should go to src/lib/datasrc
  - class/module/file name
 > I do not totally understand it, if what we will provide the class
 > memory data base, we should add another abstraction, so user don't
 > need to know what concrete data struct we use. but this part code is
 > red black tree porting
 >
 Actually, there are two issues here.  First: Whether we want to make the
 rbt stuff externally public or we want to keep it a private helper
 used only inside our .cc's.  I originally thought the former, but
 maybe we should at least begin with the latter approach, in which
 case this point may be less critical.  The second issue is that it
 still affects our internal modules that use this stuff.  Consider
 the ZoneTable class.  It will contain the "RBTree" and calls its
 find() method.  If we change the internal data structure from the
 red black tree based one (e.g. to a B-tree like thing), the class
 name "RBTree" is not correct and we'd like to change the interface
 to "BTree" or something.  Then we'll need to change the caller side
 of the code even if the interface and the provided service
 (i.e. giving a longest match for a given name) don't change.  From
 the user's point of view, the important point is the interface, not
 the internal data structure.  When naming something that can be
 used by other things than that "something", we should focus on the
 interface, not the internal implementation.  Finally, in any case,
 "_datasrc" in "rbt_datasrc.h" doesn't make sense anymore because
 the defined stuff in the header file isn't actually a "data source"
 itself.
  - boost noncopyable
 > I think we should depend less on boost compile part, but if only needs
 head file, I don't see the difference between depending on one file and
 more files.
 >
 No, the point is that boost is a moving target, and using it in a
 public header file increases the risk of introducing binary backward
 incompatibility.  More boost header files to be included, more risks
 we have.  So, basically, unless it's very difficult to find a
 reasonable alternative to the feature that a particular boost module
 provides, we shouldn't use it in a public header file.  So far, the
 only exception is shared_ptr.  Its benefit is too big, and due to
 its complexity it's very difficult to have a non boost alternative.
 A relatively more subtle exception is boost/function, which is
 included from two of our public header files.  Hopefully we can
 remove the dependency on it eventually, but for now we have it as a
 compromise.  We don't use any other boost header files in our public
 header files.  IN the case of noncopyable, it's just a syntax sugar,
 and its alternative is just a few line of additional C++ code.  It's
 way below the level of inevitable dependency.
 Another issue is overall code consistency.  If, for some reason, we
 agree on using boost noncopyable, we should use it consistently
 throughout the code.  Using it only in a single header file is
 confusing.  We should either use it for all noncopyable classes or
 not use it anywhere.
  - exception safety: changing new to object_pool doesn't solve the
    essential problem.  See, e.g. nodeFission().  if the code following
    the createNode() call throws an exception (and it can throw), the
    allocated resource will still be hanging inside the pool, without
    being used by anyone, until we destruct the entire pool.  The new
    code indeed "fixes" memory leak in the long run, but in practice
    the memory is leaking from the user's point of view.  For the code
    to be really exception safe, we should clean up all newly allocated
    resources immediately after exception.  object_pool is primarily
    for performance improvement, and the use of it may or may not be a
    good idea (although I'd defer the tuning later - making such a
    drastic change to an already big patch will make review harder),
    but I cannot be free from the exception safety issue regardless of
    the use of the pool.
  - A related point: again, whether or not using object_pools,
    returning an error code for the "no memory" condition is IMO not a
    good idea because it's not consistent with our error handling
    convention.  I already pointed it out in my previous review, and
    it's still outstanding.
  - NXRRSET vs the notion of "shadow": I pointed this out in my
    previous review as part of comment on findHelper().  It's not
    addressed yet.
   - suggestion: drop the idea of shadow; require type T be comparable
     with NULL; replace the logic that checks whether the node is
     checked to be shadow with checking T == NULL.  So, for example,
     instead of
 {{{
             if (!node->is_shadow_) {
                 *target = node;
                 ret = EXACTMATCH;
             }
 }}}
     in findHelper(), do this:
 {{{
             if (node->data_ != NULL) {
                 *target = node;
                 ret = EXACTMATCH;
             }
 }}}
     Note: this doesn't yet solve the NXRRSET case, but let's
     resolve this matter later in a future sprint.
  - documentation is still quite insufficient.  quoting my previous
    comments: "documentation in general: we need two types of
    documentation: one for API users, and the other for developers who
    try to read/modify the code..."
  - existing (minor) point that isn't addressed:
   - The diagram after the (RBTree) class definition: this should be
     included in the doxygen document.

-- 
Ticket URL: <http://bind10.isc.org/ticket/397#comment:25>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development