BIND 10 #1688: duplicate RR suprression in auth::Query

Thu Mar 22 07:14:25 UTC 2012

#1688: duplicate RR suprression in auth::Query
-------------------------------------+-------------------------------------
                   Reporter:         |                 Owner:  jinmei
  jinmei                             |                Status:  reviewing
                       Type:  task   |             Milestone:
                   Priority:         |  Sprint-20120403
  medium                             |            Resolution:
                  Component:         |             Sensitive:  0
  b10-auth                           |           Sub-Project:  DNS
                   Keywords:         |  Estimated Difficulty:  5
            Defect Severity:  N/A    |           Total Hours:  0
Feature Depending on Ticket:         |
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------

Comment (by jinmei):

 Replying to [comment:12 stephen]:
 > > Is it really a better choice to use a std::set? While it'd be more
 efficient for the search itself, it needs to allocate new resource every
 time we insert an entry (and we need to release the resource on
 destruction).
 > That's a fair point.  I've altered the RRsetInserter code to use a (pre-
 allocated) vector instead of a set and this is searched sequentially (so
 O(n^2^) comparisons).  To avoid constant construction and destruction,
 that is a member of the Query class and is reset prior to each query.
 This change addresses all the other points.

 The basic idea looks okay.  I originally meant performing duplicate
 check on the answers_/authorities_/additionals_ vectors directly
 rather than introducing a separate vector.  But after experimenting
 this approach I didn't see much difference in performance (and in some
 cases the current branch version is faster), so, in the end, I'm okay
 with that.

 But, I have some non trivial comments on the latest version.  As
 you'll be off for development for a while, I took over it and
 addressed my own comments.  I'll ask someone else for the changes I
 made.

 I've also checked performance implication.  While the test cases are
 limited, the revised version is actually generally faster than the
 previous one, using std::set, despite the O(n^2) search.  The test
 case includes a relatively "large scale" case of a delegation from
 root to com (without DNSSEC), which involves 15 RRsets.

 Now, my comments follow:

 '''Major point'''

 - I'd like to avoid exposing member variables directly, even as
   protected, and even if it's intended only for tests.  It's so
   especially when they are mutable.  I've self-addressed this pint at
   4ad96ab.  As a result, the `Inserter` class was renamed to
   `ResponseCreator` and publicly visible (for the test).

 '''Relatively minor points'''

 - I think we need a changelog for this change.  This is my proposed
   entry:
 {{{
 407.?   [bug]           stephen, jinmei
         b10-auth didn't remove duplicate RRsets from responses.
         (Trac #1688, git TBD)
 }}}

 - I'd suggest using vector::insert instead of copy:
 {{{#!c++
 #if 0
     copy(rrset_vector.begin() + 2, rrset_vector.begin() + 4,
          back_inserter(answer));
 #endif
     answer.insert(answer.end(), rrset_vector.begin() + 2,
                   rrset_vector.begin() + 4);
 }}}
   My understanding is that it's generally advisable because it's a
   vector's "native" method.  See also the book "Effective STL" if you
   have it.  I've self-addressed this at fd5a4de.

 - Also, I guess it's safer to use rrset_vector.end() instead of '+ 9'
 {{{#!c++
     copy(rrset_vector.begin() + 7, rrset_vector.begin() + 9,
          back_inserter(additional));
 }}}
   because '+9' would exceeds the valid range.  I've not confirmed what
   the STL standard says on this point, and in event, it wouldn't cause
   a disruption in the case of vector iterators (in many cases they are
   just a plain old pointer), but at least using end() shouldn't do any
   harm.  I self-addresses these at fd5a4de.

 - I'd suggest testing the case where duplicate happens in the same
   section (this can happen in practice), and I self-addressed this
   at 847525d.

 - I'd suggest using BOOST_FOREACH instead of code like this:
 {{{#!c++
     for (i = answers.begin(); i != answers.end(); ++i) {
         addRRset(response, Message::SECTION_ANSWER, *i, dnssec);
     }
 }}}
   I self-addressed this point at a6d9e2b.  See the commit log.

 - We should now be able to skip the following duplicate checks:
 {{{#!c++
     // Add the (no-) wildcard proof only when it's different from the NSEC
     // that proves NXDOMAIN; sometimes they can be the same.
     // Note: name comparison is relatively expensive.  When we are at the
     // stage of performance optimization, we should consider optimizing
 this
     // for some optimized data source implementations.
     if (nsec->getName() != fcontext->rrset->getName()) {
         authorities_.push_back(fcontext->rrset);
     }
 ...
     if (nsec->getName() != fcontext->rrset->getName()) {
         // one NSEC RR proves wildcard_nxrrset that no matched QNAME.
         authorities_.push_back(fcontext->rrset);
     }
 }}}
   I've addressed this at e70bd3b.

 - I can now re-enable the nsec3 lettuce scenarios that were disabled
   due to the missing suppression of duplicate RRsets.  I did it at
   6f8e187.

-- 
Ticket URL: <http://bind10.isc.org/ticket/1688#comment:14>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development