BIND 10 #2218: update InMemoryZoneFinder::findNSEC3 using memory-efficient version

Tue Sep 25 07:27:00 UTC 2012

#2218: update InMemoryZoneFinder::findNSEC3 using memory-efficient version
-------------------------------------+-------------------------------------
                   Reporter:         |                 Owner:  jinmei
  jinmei                             |                Status:  reviewing
                       Type:  task   |             Milestone:
                   Priority:         |  Sprint-20120925
  medium                             |            Resolution:
                  Component:  data   |             Sensitive:  0
  source                             |           Sub-Project:  DNS
                   Keywords:         |  Estimated Difficulty:  5
            Defect Severity:  N/A    |           Total Hours:  0
Feature Depending on Ticket:         |
  scalable inmemory                  |
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------
Changes (by muks):

 * owner:  muks => jinmei

Comment:

 Hi Jinmei

 Replying to [comment:11 jinmei]:
 [snip]
 > First, regarding `successor()`: at least your revised code doesn't use
 > `successor()`, so it can (and I would say should) be private again.
 > Next, as for `predecessor()`: you almost reimplemented the logic that
 > is already available `DomainTree::previousNode` as a result of using
 > `predecessor()`.  Apart from whether it makes sense to expose
 > `predecessor()`, the resulting code is duplicate and IMO much more
 > unreadable.  The whole re-invented logic can be much more simplified
 > if you use previousNode:
 >
 > {{{#!cpp
 >         } else {
 >             const ZoneNode* last_node = chain.getLastComparedNode();
 >             while ((covering_node = tree.previousNode(chain)) != NULL &&
 >                    covering_node->isEmpty()) {
 >                 ;
 >             }
 >             if (covering_node == NULL) {
 >                 covering_node = last_node->getLargestInSubTree();
 >             }
 >
 >             if (!recursive) {   // in non recursive mode, we are done.
 > ...
 > }}}
 >
 > If you agree that this one is simpler and more readable, then we also
 > don't need to make `predecessor()` public either, and I'd suggest
 > keeping it private.

 [snip]

 > > I don't follow what you mean by exploit here. Do you mean not have the
 tree-of-tree structure? `TreeNodeRRset` gets its name directly from the
 node in the tree-of-trees.
 >
 > What I meant (re-reading it now, I admit the original comment wasn't
 > clear) is that I'd rather define and use something like
 > `TreeNode* DomainTree::largestNode()`, which returns the largest node
 > in the tree of tree.  In the case of the NSEC3 tree, this should be
 > the same as "largest in subtree" (in the subtree of NSEC3 hash names).
 >
 > I admit the encapsulation of `DomainTree` isn't really clean, and we
 > sometimes allow the applications to use the knowledge of its internal
 > structure.  But when possible, I'd like to hide such details as it's a
 > "tree of tree" as much as possible, and have applications use
 > `DomainTree` just like some kind of abstract container.  The concept
 > of "getting the largest entry" can exist for any type of container
 > that has the notion of ordering, but `getLargestInSubTree` looks too
 > much rely on specific internal details of it.  It might be less likely
 > that we'll fundamentally revise the internal details (e.g., like
 > changing tree to hash), but if we have choices for some purpose that
 > have comparable costs, I'd like to choose the one that can hide
 > internal details more.

 I follow what you're trying to do, i.e., keep `DomainTree` encapsulated as
 much as possible. In the first case, I used `predecessor()` because
 `previousNode()` can exit the sub-tree and go to the upper level. I've
 changed it so that it uses the code above now.

 Replying to [comment:12 jinmei]:
 > Replying to [comment:10 muks]:
 >
 > > > - The original code takes care of the case where there's no NSEC3
 (but
 > > >   there maybe NSEC3PARAM).  Shouldn't we do the same?
 > >
 > > We do in the loader. I am not following where in the original code
 that you are referring to.
 >
 > I meant this by "original":
 > {{{#!cpp
 >     const NSEC3Map& map = impl_->zone_data_->nsec3_data_->map_;
 >     if (map.empty()) {
 >         isc_throw(DataSourceError,
 >                   "findNSEC3 attempt but zone has no NSEC3 RR: " <<
 >                   impl_->origin_ << "/" << impl_->zone_class_);
 >     }
 > }}}
 > and, as far as I can see, loader doesn't ensure we can skip the same
 > check.  See relevant comments for the tests below.

 Such a check was added after the last review (to see if the tree is
 empty).

 >
 > Rest of my code comments follow::
 >
 > '''domaintree.h'''
 >
 > - I don't see the need for `getSubTreeRoot()`.  Why was this added?
 >   (I've not reviewed corresponding tests as I was not sure if we
 >   really need this method).

 The `getSubTreeRoot()` was a utility used by both `getUpperNode()` and
 `getLargestInSubTree()`. I've made these functions private now.

 > - Why did you change the type of node_count_ to size_t?  I suspect
 >   sizeof(size_t) can often be 8 for 64-bit machines, and with the
 >   existence of `needsReturnEmptyNode_` it would make the
 >   `sizeof(DomainTree)` unnecessarily large.  It can be a waste if we
 >   support millions of zones.  I admit 'unsigned int' might not be a
 >   good choice either, though.  Maybe the best one is uint32_t, and
 >   explain why we specifically choose it.

 This was changed because `size_t` is the appropriate type for memory sizes
 and memory-sized counters (like number of items of something, unless there
 is known to be a smaller hard-limit). Before, there was a mix of `int` and
 `unsigned int`. I've changed it to `uint32_t` now.

 >
 > '''memory_client.h/cc'''
 >
 > - maybe not really related to this branch, but I suspect we shouldn't
 >   make findZone2() a const member function, because it effectively
 >   returns a mutable internal state.  Also, it would better to be not
 >   virtual (so it won't be misleading).  I'd probably choose a
 >   different name.  And, if it's mainly for tests I'd note that in the
 >   documentation (if it's expected to be used for general use, I think
 >   we should reconsider it - returning internal zone data seems to be a
 >   risky interface).

 I've renamed it to `findZoneData()` and made it non-virtual, non-const,
 and added a comment about its use in testing.

 > - This now-removed comment makes me nervous
 > {{{#!cpp
 > -    // This variant of findZone() is not implemented and should be
 > -    // removed eventually. It currently throws an exception. It is
 > -    // required right now to derive from DataSourceClient.
 > -    virtual isc::datasrc::DataSourceClient::FindResult
 > -    findZone(const isc::dns::Name& name) const;
 > }}}
 >   It said "should be removed eventually", but this branch did the exact
 >   opposite.  What happened?

 That comment was added by me in #2108 as I thought we'd be going towards a
 different interface for `findZone()`. But we kept the same interface, so
 the comment was removed.

 > '''treenode_rrset'''
 >
 > - I don't see why we need to implement getRRsig() in this branch.  But
 >   if you wanted to implement it, please be responsible for providing
 >   tests.  The documentation needed to be updated too. (But I did both
 >   in #2219, so you don't have to address these in this branch).

 It was used in the tests inside `rrsetsCheck()`. The `TreeNodeRRset` tests
 were missed, but this needed to work for the zone finder tests to pass.
 I'll skip adding tests in this branch as #2219 has them and we'll get them
 when these merge. Same for adding `getTTL()` too, but that was indeed
 tested.

 >
 > '''zone_finder_unittest.cc'''
 >
 > - I'd avoid initializing nsec3_hash_map in the namespace level.  I've
 >   committed my suggested change to address this (2fdacd8).

 This whole thing is gone now, as `NSEC3Hash` is now used and
 `faked_nsec3.cc` is mostly untouched except addition of some data.

 > - should this be removed?
 > {{{#!cpp
 >     // This needs to be updated and checked when implementing #2118
 > }}}

 Gone now.

 > - this is invalid if salt_data_2 is empty:
 > {{{#!cpp
 >                  (std::memcmp(&salt_data_2[0], salt_data, salt_len) !=
 0)) {
 > }}}

 These are tested for length now.

 > - I don't understand this comment:
 > {{{#!cpp
 >         // We assume that rrsig has already been checked to match rrset
 >         // by the caller.
 > }}}

 It was an error in pasting some `RdataSet::create()` code from elsewhere.
 Removed.

 > - I don't think this test really does what it claims:
 > {{{#!cpp
 >     // Only having NSEC3PARAM isn't enough
 >     addZoneData(textToRRset("example.org. 300 IN NSEC3PARAM "
 >                             "1 0 12 aabbccdd"));
 > }}}
 >   because in the actual implementation, "only having NSEC3PARAM"
 >   enables `NSEC3Data`.  So findNSEC3 actually wouldn't throw.

 I have checked this. It indeed throws in the tree node count test at the
 top of `findNSEC3()`. If that test is removed, it asserts at the bottom of
 that method.

 > - the comment doesn't really seem to be 100% accurate.  It's indeed
 >   for `InMemoryZoneFinder`, but in this context it's NSEC3 specific.
 > {{{#!cpp
 > /// \brief Test fixture for the InMemoryZoneFinder class
 > class InMemoryZoneFinderNSEC3Test : public InMemoryZoneFinderTest {
 > }}}

 Updated.

 > - can you add some comments on findNSEC3Walk and its the data?  What's
 >   the point of the test, meaning of `TestData` member variables, how
 >   the hash values are chosen (whether they are randomly chosen or
 >   intended to test some specific case - I suspect the latter, but it's
 >   not immediately obvious from the definitions).

 Added.

 >
 > '''nsec3hash'''
 >
 > Frankly, I don't like this hack.  It breaks some design principles of
 > the class (e.g., reusing internal resources for multiple hash
 > calculations - for that matter, this branch doesn't update the
 > documentation on this).  Also, defining `NSEC3Hash::calculate` for
 > SHA1 only seems ad hoc.
 >
 > To me, a better approach is to extend the factory interface so that we
 > can create an `NSEC3Hash` instance from the algorithm, iterations and
 > salt.
 >
 > the in-memory implementation would have a function pointer to a
 > specific factory instance for the normal and test cases, not the
 > calculation function itself.

 This hack was removed and replaced with another `NSEC3Hash::create()`
 variant which is much cleaner. I wish I had thought of doing this before.

 The in-memory implementation does not keep any function pointers now, as
 the code in `NSEC3Hash` itself has a way to provide fake hash calculation.

 >
 > '''zone_finder.h'''
 >
 > nsec3_calculate_: In general, I don't like to introduce a public or
 > protected member variable, especially if it's mutable, as it can
 > easily break class integrity due to a careless application or derived
 > class implementation.  In this case, if there's no better way to allow
 > tests to use a faked calculator, please at least leave comment that
 > this protected member variable is only intended for test purposes
 > (isn't it?), and other derived class shouldn't try to tweak it.  Also,
 > I suspect the in-memory finder class isn't really expected to be
 > further derived except for testing.  If that's the case, please also
 > document that.

 This code is gone now due to the use of `NSEC3Hash`.

 >
 > '''zone_finder.cc'''
 >
 > In addition to other points that were already discussed, I've noticed
 > the same patter like this repeated:
 > {{{#!cpp
 >             const RdataSet* rdataset = node->getData();
 >             assert(rdataset != NULL);
 >             assert(rdataset->type == RRType::NSEC3());
 >
 >             ConstRRsetPtr closest = createTreeNodeRRset(node, rdataset,
 >                                                         getClass());
 > }}}
 > and it makes the method longer and less readable.  I'd introduce a
 > helper function to unify the logic:
 > {{{#!cpp
 > ConstRRsetPtr
 > createNSEC3RRset(const ZoneNode* node, const RRClass& rrclass) {
 >     const RdataSet* rdataset = node->getData();
 >     // add comment why we can assert these
 >     assert(rdataset != NULL);
 >     assert(rdataset->type == RRType::NSEC3());
 >
 >     return (createTreeNodeRRset(node, rdataset, rrclass));
 > }
 > }}}
 > and replace the above with this:
 > {{{#!cpp
 >             ConstRRsetPtr closest = createNSEC3RRset(node, getClass());
 > }}}
 >

 Done. :)

-- 
Ticket URL: <http://bind10.isc.org/ticket/2218#comment:14>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development