BIND 10 #2218: update InMemoryZoneFinder::findNSEC3 using memory-efficient version

Thu Sep 20 06:31:29 UTC 2012

#2218: update InMemoryZoneFinder::findNSEC3 using memory-efficient version
-------------------------------------+-------------------------------------
                   Reporter:         |                 Owner:  jinmei
  jinmei                             |                Status:  reviewing
                       Type:  task   |             Milestone:
                   Priority:         |  Sprint-20120925
  medium                             |            Resolution:
                  Component:  data   |             Sensitive:  0
  source                             |           Sub-Project:  DNS
                   Keywords:         |  Estimated Difficulty:  5
            Defect Severity:  N/A    |           Total Hours:  0
Feature Depending on Ticket:         |
  scalable inmemory                  |
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------

Comment (by jinmei):

 I've not completed full review, but found several substantial issues.
 So I'm dumping the current review comments and giving ticket back to
 mukund.  Please address (or discuss to resolve) these points first.

 '''general'''

 - According to the commit sequence, this branch doesn't seem to be
   developed "test driven".  It's not what we should do.
 - zone_finder_unittest and faked_nsec3 test have a lot of duplicates.
   Please unify them.

 '''memory_client.cc'''

 - In addNSEC3(), I believe we can (now) simply add the owner name of
   the NSEC3 RRset:
 {{{#!cpp
         ZoneNode* node;
         nsec3_data->insertName(mem_sgmt_, rrset->getName(), &node);
 }}}
   In fact, all tests pass with this version (and if this isn't
   supposed to work, it means we don't have sufficient tests).

 - I don't like to have something like iterateSHA1 in the zone_finder
   code.  The data source code is basically supposed to use a higher
   level abstraction, the `NSEC3Hash` class.  The current code doesn't
   only rely on the low level concept, but also rely on in-house SHA1
   implementation, which we'd probably replace with Botan.
 - The original code takes care of the case where there's no NSEC3 (but
   there maybe NSEC3PARAM).  Shouldn't we do the same?
 - maybe a matter of taste, but this expression doesn't look very
   readable (very long and with ?:):
 {{{#!cpp
         const std::string hlabel = (nsec3_calculate_)(
             ((labels == qlabels ?
               name : name.split(qlabels - labels, labels)),
              nsec3_data->iterations,
              nsec3_data->getSaltData(),
              nsec3_data->getSaltLen());
 }}}
   I'd introduce a variable to the main expression concise:
 {{{#!cpp
         const Name& hname = (labels == qlabels ?
                              name : name.split(qlabels - labels, labels));
         const std::string hlabel = (nsec3_calculate_)(
             hname, nsec3_data->iterations, nsec3_data->getSaltData(),
             nsec3_data->getSaltLen());
 }}}

 - also maybe a matter of taste, but maybe it's slightly more
   lightweight if define `chain` outside the for loop and clear it:
 {{{#!cpp
     ZoneChain chain;
     for (unsigned int labels = qlabels; labels >= olabels; --labels) {
 ...
         //ZoneChain chain; <= instead of this
         chain.clear();
 }}}

 - I'd move the definition of `tree` outside of the for loop:
 {{{#!cpp
     const ZoneTree& tree = nsec3_data->getNSEC3Tree();
     for (unsigned int labels = qlabels; labels >= olabels; --labels) {
 ...
         //const ZoneTree& tree = nsec3_data->getNSEC3Tree();
 }}}

 - this seems to be unnecessarily expensive:
 {{{#!cpp
         const ZoneTree::Result result =
             tree.find(Name(hlabel + "." + getOrigin().toText()), &node,
 chain);
 }}}
   due to the mix of toText(), string concatenation and name
   construction from string.  I'd do it:
 {{{#!cpp
         const ZoneTree::Result result =
             tree.find(Name(hlabel).concatenate(getOrigin()), &node,
 chain);
 }}}

 - I'd rename `set` to `rdataset` or `rdset`.  `set` sounds too
   generic.
 - couldn't the construction of `next` be simpler, like this?
 {{{#!cpp
             ConstRRsetPtr next = (covering_node == NULL) ? ConstRRsetPtr()
 :
                 createTreeNodeRRset(covering_node,
 covering_node->getData(),
                                     getClass());
 }}}
 - this implementation assumes `ZoneNode->getData()` is non NULL and is
   the NSEC3 RRset.  That should be correct, but could be fragile as it
   heavily relies on how the NSEC3 is constructed.  I'd be a bit more
   proactive, e.g. by asserting the type is NSEC3.

 - these comments don't match the actual code any more.  please update:
 {{{#!cpp
             // If the given hash is larger than the largest stored hash or
             // the first label doesn't match the target, identify the
             // "previous" hash value and remember it as the candidate next
             // closer proof.
 ...
                 // Otherwise, H(found_entry-1) < given_hash <
 H(found_entry).
                 // The covering proof is the first one (and it's valid
                 // because found is neither begin nor end)
 }}}
   This type of mismatch seems to be commonly observed in your
   branches.  If this is something you often forget to check, please
   include in your personal check list for future tasks if that is
   mainly updating existing code.

 - the non-exact match case seems to have many issues.

   - first off, it doesn't seem correct.  Consider a name whose hash is
     02UDEMVP1J2F7EG6JEBPS17VP3N8I58H.  If we do non recursive search
     for this name, the covering hash should be
     01UDEMVP1J2F7EG6JEBPS17VP3N8I58H, but this implementation actually
     returns some bogus (not just incorrect) result.
   - secondly, I don't like to expose too much of `DomainTreeNode`
     internals.  Specifically, I don't like to make predecessor() and
     successor() just for this purpose.  Likewise, I don't like to
     introduce things like getLargestInSubTree() as a public method of
     `DomainTreeNode`.  In general, I prefer extending the node chain
     interfaces for operations that require moving inside the tree.
     Also, due to the special structure of the NSEC3 tree, "in sub
     tree" property is not really necessary.  While exploiting this
     fact itself may be controversial, I personally like to exploit it
     than tweaking the node class (and this code already exploits some
     of the special nature of the NSEC3 anyway).
   - Even if this code were accurate, it seems to be unnecessarily
     expensive.  It always compute predecessor() and successor(), which
     are not super cheap.  In the vast majority cases we'd have a non
     NULL previous node, and we should simply be able to use it in this
     case.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2218#comment:8>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development