BIND 10 #397: Port generic red-black tree (RBT) data structure from BIND-9

Fri Dec 10 06:54:08 UTC 2010

#397: Port generic red-black tree (RBT) data structure from BIND-9
------------------------------+---------------------------------------------
      Reporter:  zzchen_pku   |        Owner:  hanfeng  
          Type:  enhancement  |       Status:  reviewing
      Priority:  major        |    Milestone:           
     Component:  data source  |   Resolution:           
      Keywords:               |    Sensitive:  0        
Estimatedhours:  0.0          |        Hours:  0        
      Billable:  1            |   Totalhours:  0        
      Internal:  0            |  
------------------------------+---------------------------------------------
Changes (by jinmei):

  * owner:  jinmei => hanfeng

Comment:

 The latest code looks quite good, but I have still some substantial
 comments.  See below.

 I've made some minor editorial fixes to the branch (r3788).

 Documentation now looks much better, although I think we need some
 substantial cleanups and additions.  But for now let's focus on the
 code.

 '''namespace helper'''
  - this will cause duplicate definitions of the functions defined in
    the namespace, resulting in a linker error once more than one .cc
    includes this header file.
  - one possible approach would be to make these functions inline,
    and/or to only give declarations in .h and give the definitions in
    a separate .cc file.
  - in either case, I'm not very comfortable to make these functions
    publicly accessible.  putting them inside a separate namespace just
    obscures the existence and location of these essentially private
    functions.  It doesn't really prohibit user apps from using them.
    As a compromise, however, we limit the use of this header file for
    our internal purposes only (and document so), and keep "hiding"
    them by not installing the header file in a public space.  I'm okay
    with that approach for the purpose of this ticket.

 '''RBTreeColor'''
  - my previous comment hasn't been addressed:
    - RBTreeColor: a minor point, but why does it start with 1?

 '''RBNode class definition'''
  - shouldn't we make constructors private? (we previously did, and I
    think it made sense)
  - type of data_: I don't think it works well.  getting a pointer from
    a possibly different module and freeing it in the callee (=rbtree)
    is a bad practice, because the caller and callee may use different
    new/delete.  My suggestion is to impose requirement to T:
    - it's assignable and copy constructible (and should be exception free)
    - comparable to NULL
    This effectively means T must be a pointer(-like) type, and if it's
    a bare pointer the caller must be responsible for freeing it. (so,
    again, effectively it should be a smart pointer type)

 '''RBTree class definition'''
  - this comment is not really correct any more because findHelper()
    isn't called recursively:
 {{{
     /// Each public function has related recursive helper function
 }}}
  - why did you remove const from 'up'?  This essentially changed two
    things: changing the type for 'up' from RBTree to RBNode, and
    removing the const.  I see the reason for the former.  I don't for
    the latter.
  - Is getNameCount() (and name_count_) really necessary?  After all,
    the notion of number of "explicitly added names" is moot, because
    our interest is whether a name has data or not, rather than how it
    appears in the tree.  I suggest we remove name_count_ and
    getNameCount().

 '''RBTree<T>::findHelper'''
  - Note: up_node is currently unused.  I'm okay with keeping it for
    now though because we'll probably need it soon.  but please leave
    comments about that.
  - This use of 'using namespace' may be okay, but I'd like to be
    conservative and avoid taking a risk of using it in a header file.
    in any case note that we cannot use a namespace here (see above).
  - this assumption doesn't hold:
 {{{
                 if (node->isEmpty()) {
                     assert(node->down_ != NULL);
 }}}
   Consider the following simple case:
 {{{
     RBTree<int> rbtree;
     rbtree.insert(Name("a"), &rbtnode);
     rbtree.find(Name("b.a"), &rbtnode);
 }}}
   AFAICS we can (should) simply merge empty/non empty cases.
  - BTW, maybe shouldn't we use null node also for the down pointer?
  Then this code could be simplified
 {{{
                     if (node->down_ != NULL) {
                         node = node->down_;
                     } else {
                         break;
                     }
 }}}
   to this:
 {{{
                     node = node->down_;
 }}}

 '''RBTree<T>::insert'''
  - related to whether we need name_count_ (see above), I don't see the
    benefit of changing the behavior in the "already exist" case based
    on whether the node is "empty" or not:
 {{{
             if (current->isEmpty()) {
                 ++name_count_;
                 return (SUCCEED);
             } else {
                 return (ALREADYEXIST);
             }
 }}}
   I'd drop the notion of name_count_ and merge these cases:
 {{{
             return (ALREADYEXIST);
 }}}
   IMO this is much simpler and makes the code much robust (for cases
   like we explicitly inserted a node and then set/reset the data).
  - the purpose of using auto_ptr is probably not clear.  please add
    comments on why need to do this.

 '''nodeFission'''
  - down_node initialization could be simplified (and more efficient):
 {{{
     std::auto_ptr<RBNode<T> > down_node(new RBNode<T>(sub_name));
 }}}
  - do we need to do RBNode::swap()?  After swapping, 3 of the six
    member variables of node and down_node are replaced anyway, and we
    could do the same thing with the same number of lines without using
    RBNode::swap():
 {{{
     std::swap(node.data_, down_node->data_);
     down_node->down_ = node.down_;
     node.name_ = base_name;
     node.down_ = down_node.get();
 }}}
  - It seems we could reuse the most of the code logic of nodeFission()
    for the subdomain + current->down==NULL case at the cost of making
    it a bit more complicated.  specifically, we could modify it to:
 {{{
 template <typename T>
 RBNode<T>*
 RBTree<T>::nodeFission(RBNode<T>& node,
                        const isc::dns::Name& down_name,
                        const isc::dns::Name& base_name)
 {
     using namespace helper;
     const isc::dns::Name sub_name = down_name - base_name;
     std::auto_ptr<RBNode<T> > down_node(new RBNode<T>(sub_name));
     if (&node.name_ == &down_name) {
         std::swap(node.data_, down_node->data_);
         node.name_ = base_name;
     }
     down_node->down_ = node.down_;
     node.down_ = down_node.get();
     //root node of sub tree, the initial color is BLACK
     down_node->color_ = BLACK;
     ++node_count_;
     return (down_node.release());
 }
 }}}
   then we can reuse it for the other case as follows:
 {{{
                         RBNode<T>* new_down =
                             nodeFission(*current, name, current->name_);
                         if (new_node != NULL) {
                             *new_node = new_down;
                         }
                         ++name_count_; // note: I think we don't need this
 }}}

 '''dumpTree()'''
  - one of my previous currents still stand:
   - "tree has node XX" doesn't parse well.  I'd rephrase it "tree has
     XX node(s)"

 '''dumpTreeHelper()'''
  - one of my previous currents still stand:
    - at the end of the function, it seems we can avoid the if-else by
      handling null node case at the beginning of the function: [...]

 '''set_get_Data test'''
  - might better be named "setGetData" based on coding guideline.

 '''insertNames test'''
  - the beginning comment is not true any more "a node is
    considered..."  In fact, this test succeeds:
 {{{
     EXPECT_EQ(RBTree<int>::SUCCEED, rbtree.insert(Name("d.e.f"),
 &rbtnode));
     // re-add after "explicitly inserted"
     EXPECT_EQ(RBTree<int>::SUCCEED, rbtree.insert(Name("d.e.f"),
 &rbtnode));
 }}}
   see also a relevant comment about when to return SUCCEED and when
 ALREADYEXIST
  - same note applies to some other comments

-- 
Ticket URL: <http://bind10.isc.org/ticket/397#comment:34>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development