BIND 10 #2218: update InMemoryZoneFinder::findNSEC3 using memory-efficient version

Fri Sep 21 11:27:27 UTC 2012

#2218: update InMemoryZoneFinder::findNSEC3 using memory-efficient version
-------------------------------------+-------------------------------------
                   Reporter:         |                 Owner:  jinmei
  jinmei                             |                Status:  reviewing
                       Type:  task   |             Milestone:
                   Priority:         |  Sprint-20120925
  medium                             |            Resolution:
                  Component:  data   |             Sensitive:  0
  source                             |           Sub-Project:  DNS
                   Keywords:         |  Estimated Difficulty:  5
            Defect Severity:  N/A    |           Total Hours:  0
Feature Depending on Ticket:         |
  scalable inmemory                  |
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------
Changes (by muks):

 * owner:  muks => jinmei

Comment:

 Hi Jinmei

 Replying to [comment:8 jinmei]:
 > I've not completed full review, but found several substantial issues.
 > So I'm dumping the current review comments and giving ticket back to
 > mukund.  Please address (or discuss to resolve) these points first.
 >
 > '''general'''
 >
 > - According to the commit sequence, this branch doesn't seem to be
 >   developed "test driven".  It's not what we should do.

 Sorry about that. In many cases I was working on the problem at hand
 (which used the method being added), and had to come back to properly add
 testcases.

 > - zone_finder_unittest and faked_nsec3 test have a lot of duplicates.
 >   Please unify them.

 Done.

 >
 > '''memory_client.cc'''
 >
 > - In addNSEC3(), I believe we can (now) simply add the owner name of
 >   the NSEC3 RRset:
 > {{{#!cpp
 >         ZoneNode* node;
 >         nsec3_data->insertName(mem_sgmt_, rrset->getName(), &node);
 > }}}
 >   In fact, all tests pass with this version (and if this isn't
 >   supposed to work, it means we don't have sufficient tests).

 You are right. As we use `ZoneTree::find()` now, there's no need to change
 the first label to uppercase. Many of the review issues are because this
 was a direct translation of the map based `findNSEC3()`, but I've updated
 it now.

 > - I don't like to have something like iterateSHA1 in the zone_finder
 >   code.  The data source code is basically supposed to use a higher
 >   level abstraction, the `NSEC3Hash` class.  The current code doesn't
 >   only rely on the low level concept, but also rely on in-house SHA1
 >   implementation, which we'd probably replace with Botan.

 These have been moved to the `NSEC3Hash` class. I can't think of a better
 way to do it than have a static function there as we deal with raw data.

 > - The original code takes care of the case where there's no NSEC3 (but
 >   there maybe NSEC3PARAM).  Shouldn't we do the same?

 We do in the loader. I am not following where in the original code that
 you are referring to.

 > - maybe a matter of taste, but this expression doesn't look very
 >   readable (very long and with ?:):
 > {{{#!cpp
 >         const std::string hlabel = (nsec3_calculate_)(
 >             ((labels == qlabels ?
 >               name : name.split(qlabels - labels, labels)),
 >              nsec3_data->iterations,
 >              nsec3_data->getSaltData(),
 >              nsec3_data->getSaltLen());
 > }}}
 >   I'd introduce a variable to the main expression concise:
 > {{{#!cpp
 >         const Name& hname = (labels == qlabels ?
 >                              name : name.split(qlabels - labels,
 labels));
 >         const std::string hlabel = (nsec3_calculate_)(
 >             hname, nsec3_data->iterations, nsec3_data->getSaltData(),
 >             nsec3_data->getSaltLen());
 > }}}

 Updated.

 > - also maybe a matter of taste, but maybe it's slightly more
 >   lightweight if define `chain` outside the for loop and clear it:
 > {{{#!cpp
 >     ZoneChain chain;
 >     for (unsigned int labels = qlabels; labels >= olabels; --labels) {
 > ...
 >         //ZoneChain chain; <= instead of this
 >         chain.clear();
 > }}}

 Updated.

 > - I'd move the definition of `tree` outside of the for loop:
 > {{{#!cpp
 >     const ZoneTree& tree = nsec3_data->getNSEC3Tree();
 >     for (unsigned int labels = qlabels; labels >= olabels; --labels) {
 > ...
 >         //const ZoneTree& tree = nsec3_data->getNSEC3Tree();
 > }}}
 >

 Updated.

 > - this seems to be unnecessarily expensive:
 > {{{#!cpp
 >         const ZoneTree::Result result =
 >             tree.find(Name(hlabel + "." + getOrigin().toText()), &node,
 chain);
 > }}}
 >   due to the mix of toText(), string concatenation and name
 >   construction from string.  I'd do it:
 > {{{#!cpp
 >         const ZoneTree::Result result =
 >             tree.find(Name(hlabel).concatenate(getOrigin()), &node,
 chain);
 > }}}

 Updated.

 > - I'd rename `set` to `rdataset` or `rdset`.  `set` sounds too
 >   generic.

 Done.

 > - couldn't the construction of `next` be simpler, like this?
 > {{{#!cpp
 >             ConstRRsetPtr next = (covering_node == NULL) ?
 ConstRRsetPtr() :
 >                 createTreeNodeRRset(covering_node,
 covering_node->getData(),
 >                                     getClass());
 > }}}

 Updated.

 > - this implementation assumes `ZoneNode->getData()` is non NULL and is
 >   the NSEC3 RRset.  That should be correct, but could be fragile as it
 >   heavily relies on how the NSEC3 is constructed.  I'd be a bit more
 >   proactive, e.g. by asserting the type is NSEC3.

 Added.

 > - these comments don't match the actual code any more.  please update:
 > {{{#!cpp
 >             // If the given hash is larger than the largest stored hash
 or
 >             // the first label doesn't match the target, identify the
 >             // "previous" hash value and remember it as the candidate
 next
 >             // closer proof.
 > ...
 >                 // Otherwise, H(found_entry-1) < given_hash <
 H(found_entry).
 >                 // The covering proof is the first one (and it's valid
 >                 // because found is neither begin nor end)
 > }}}
 >   This type of mismatch seems to be commonly observed in your
 >   branches.  If this is something you often forget to check, please
 >   include in your personal check list for future tasks if that is
 >   mainly updating existing code.
 >
 > - the non-exact match case seems to have many issues.
 >
 >   - first off, it doesn't seem correct.  Consider a name whose hash is
 >     02UDEMVP1J2F7EG6JEBPS17VP3N8I58H.  If we do non recursive search
 >     for this name, the covering hash should be
 >     01UDEMVP1J2F7EG6JEBPS17VP3N8I58H, but this implementation actually
 >     returns some bogus (not just incorrect) result.
 >   - Even if this code were accurate, it seems to be unnecessarily
 >     expensive.  It always compute predecessor() and successor(), which
 >     are not super cheap.  In the vast majority cases we'd have a non
 >     NULL previous node, and we should simply be able to use it in this
 >     case.

 The code was indeed buggy and has been rewritten. A tree walk test has
 also been added which looks for hashes on both sides of each node, and
 also existing nodes (exact matches), both recursive and non-recursive. The
 old comments are also gone.

 >   - secondly, I don't like to expose too much of `DomainTreeNode`
 >     internals.  Specifically, I don't like to make predecessor() and
 >     successor() just for this purpose.  Likewise, I don't like to
 >     introduce things like getLargestInSubTree() as a public method of
 >     `DomainTreeNode`.  In general, I prefer extending the node chain
 >     interfaces for operations that require moving inside the tree.

 For this case, I want to disagree. `DomainTree` is more or less a BST with
 walking capability. `predecessor()` and `successor()` are basic operations
 on such a data structure, and if there is the use-case, I don't think we
 should force to keep these hidden. The added methods are operations on
 nodes, and are basic operations. I feel adding them in the node chain
 would not be ideal. I thought of adding a method like
 `getNSEC3CoverNode()` to the node chain, but such things seem to be
 misplaced in the `DomainTree`.

 >     Also, due to the special structure of the NSEC3 tree, "in sub
 >     tree" property is not really necessary.  While exploiting this
 >     fact itself may be controversial, I personally like to exploit it
 >     than tweaking the node class (and this code already exploits some
 >     of the special nature of the NSEC3 anyway).

 I don't follow what you mean by exploit here. Do you mean not have the
 tree-of-tree structure? `TreeNodeRRset` gets its name directly from the
 node in the tree-of-trees.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2218#comment:10>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development