BIND 10 #1177: NSEC support in new data source

Thu Sep 22 22:39:57 UTC 2011

#1177: NSEC support in new data source
-------------------------------------+-------------------------------------
                   Reporter:         |                 Owner:  jinmei
  jinmei                             |                Status:  reviewing
                       Type:  task   |             Milestone:
                   Priority:  major  |  Sprint-20110927
                  Component:  data   |            Resolution:
  source                             |             Sensitive:  0
                   Keywords:         |           Sub-Project:  DNS
            Defect Severity:  N/A    |  Estimated Difficulty:  5
Feature Depending on Ticket:         |           Total Hours:  0
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------

Comment (by jinmei):

 Replying to [comment:11 vorner]:

 > > - I'm not sure if this comment is relevant, but at least NSEC should
 > >   be able to coexist with NS/CNAME (and as commented DS should be okay
 > >   with NS, too):
 >
 > I added the NSEC and NSEC3 (I expect that one is there too). The DS
 needs separate
 > bool variable, as it is not allowed with CNAME.

 Hmm, that's true for NSEC3, but I see some subtle difference for that
 case.  NSEC3 RRs effectively belong to a separate name space of the
 zone, so the rationale why the owner name of an NSEC3 can have other
 types is different from the reason for RRSIG and NSEC.  In that sense
 I'd update the code comment to reflect the subtlety:
 {{{
                 // NSEC and RRSIG can coexist with anything, otherwise
                 // we've seen something that can't live together with
 potential
                 // CNAME or NS
 }}}
 (maybe it's sufficient to add NSEC3 in addition to NSEC though)

 > > - operator+: I personally don't think this definition is intuitive.
 > >   I'd expect an 'operator+' (with two params) to take the same types
 > >   of arguments and to meet other natural expectations derived from the
 > >   arithmetic '+' operator (such as the commutative law).  Also, while
 > >   admittedly this may purely be a matter of taste, I generally agree
 > >   with the concerns documented in the google's guideline:
 > >   http://google-
 styleguide.googlecode.com/svn/trunk/cppguide.xml#Operator_Overloading
 >
 > Actually, I don't agree there at all with the google styleguide. Then
 we're getting to the java world, where you have to write
 string_a.equals(string_b), because it can't overload operators. That is
 much less readable code than with ==.

 Actually, I said "*generally* agree" due to such examples.  My opinion
 is that there are cases where operator overloading contributes to
 better readability, and if the benefit outweighs the pitfalls I'd
 rather prefer using them than dogmatically rejecting them.  So it's
 case by case basis, and I said "generally" because IMO different
 people tend to have different views on clarity or intuitiveness and a
 simple operator can often cause more confusion than clarity.

 Now, for this particular case...

 > As with this concrete operator ‒ adding an element to a set is something
 I did see written this way in mathematics, mainly in graph theory and
 combinatorics (and even substracting an element from set).

 ...I've not seen textbooks in these theoretical areas for more than a
 decade so I don't quite remember the exact notation in such cases.  I
 can see it if the operator + is used exactly as the operation of
 union, i.e., setA + { element_being_added }, and I can imagine that
 the literature may introduce a convenient abbreviation of allowing
 "{}" to be omitted.

 But when I first saw the code, it was just confusing; I first thought
 it was one of the magical class methods of std::map and went to see an
 online manual (unsuccessfully), then I found the private definition in
 database.cc with my natural expectation that it should be a symmetric
 operator, then found it was not, surprised, read the definition more
 closely, then finally understood it.

 Once I followed all these steps, I saw the code was correct, and once
 I understood it, it was not difficult to read the rest of the code.
 But I needed to spend a certain amount of time and brain cycles until
 I reached that point.  In such a case I wouldn't consider the use of
 operator overloading a reasonable abstraction.  Note specifically that
 I had to go to cplusplus.com before knowing this operator is actually
 our own invention.

 > So, to me, the use of the operator for this is much more intuitive than
 + with strings. You have a valid point with the commutative thing, but it
 isn't used the other way around (in the code, nor in mathematics), even if
 it could be, so I just saved myself writing another copy with switched
 order of parameters.
 > Even if the definition might be less intuitive, I think the intention is
 clear where used and probably can't be confused with any other possible
 use of the + operator.

 As I said above, the intention is understandable (not sure if it's
 "clear") once you follow the non trivial steps.  But the question is
 how intuitive and easy to understand the intent is to the majority of
 the readers of the code who tries to understand and possibly update
 it, compared to other, possibly more verbose ways.  Due to the hassles
 I explain, it's not so obvious to me at best.

 Citing another reference as you don't seem to like the google
 guideline, I also "generally" with what is written in the "C++ Coding
 Standards" book on this matter.  If you don't have it, go to
 http://www.amazon.com/exec/obidos/ASIN/0321113586,
 click on "look inside" and search for "Tensor".  "Programmers hate
 surprises" (I was surprised when I first saw the implementation using
 the operator+).  "Programmers expect operators to come in bundles" (I
 did when I first saw the implementation, and I was confused as I found
 the expectation didn't hold).  "Named functions ... therefore should
 be preferred for clearer code if there can be any doubt about
 semantics" (at least I had a doubt about it when I first saw it,
 although I can't speak for many others).

 > > - Considering these, I'd suggest something like this:
 > >   - Define nsec_types etc via a proxy function:
 > > {{{
 > > const set<RRType>&
 > > NSEC_TYPES() {
 > >     static set<RRType>* types;
 > >     if (types == NULL) {
 > >         auto_ptr<set<RRType> > types_ptr(new set<RRType>());
 > >         types_ptr->insert(RRType::NSEC());
 > >         types = types_ptr.release();
 > >     }
 > >     return (*types);
 > > }
 >
 > That's what I tried before introducing the operator and it turned out to
 be a lot of typing and quite a lot of code, which was much less readable
 than simple inline list of types separated by + signs.

 (I just noticed we could actually replace new + auto_ptr with another
 static object, but that probably doesn't make it so different in this
 context)

 I admit this approach looks less stylish and requires longer lines of
 code (which I generally don't like either).  But regarding the
 readability I don't necessarily think it's worse.  If I encounter
 code:
 {{{
                                     found = getRRsets(wildcard,
 NSEC_TYPES(),
                                                       true);
 }}}

 I'd assume NSEC_TYPES would be effectively a constant object just
 idiomatically returned by a proxy function.  So I'd then search for
 the definition of it, and will find it:
 {{{
 // Convenient shortcut for a set of { NSEC }
 const set<RRType>&
 NSEC_TYPES() {
     static set<RRType> types;
     static bool initialized(false);

     if (!initialized) {
         types.insert(RRType::NSEC());
         initialized = true;
     }
     return (types);
 }
 }}}

 it doesn't contradict my expectation, and at least to me pretty easy
 to understand without any surprise, unlike the case with operator+.

 We'll need to do the same thing for DELEGATION_TYPES(), FINAL_TYPES()
 and WILDCARD_TYPES(), and I admit repeating the same pattern is dull,
 but it seems unlikely that we have to continue this practice, so it's
 basically one time effort.  The only other places where we need to
 perform insert() are these:

 {{{
         found = getRRsets(name.toText(), final_types + type, name !=
 origin);
 ...
                     found = getRRsets(wildcard, wildcard_types + type,
 true,
                                       &construct_name);
 }}}

 which would look like

 {{{
         WantedTypes wanted_types(FINAL_TYPES());
         wanted_types.insert(type);
         found = getRRsets(name.toText(), wanted_types, name != origin);
 ...
                     WantedTypes wanted_types(WILDCARD_TYPES());
                     wanted_types.insert(type);
                     found = getRRsets(wildcard, wanted_types, true,
                                       &construct_name);
 }}}

 and while it results in 4 more lines which is not good, I can
 personally accept that.

 > So, I'd rather like to keep the operator there. It's pity we can't
 overload the , operator, because that one would be even better for the
 list. But this is C++, not perl6.

 (Not sure what you mean by we can't - C++ allows overloading
 "operator," but that doesn't matter much in this context anyway) I
 don't claim the proxy function approach is obviously better.  While
 personally I prefer that because it's more straightforward and can
 understand it without a surprise, I also see its drawbacks.  And
 although I less prefer the overloading approach, I don't see a
 technical flaw in it (unlike statically initialized namespace-scope
 objects).  If you still believe the approach with an overloaded
 operator with taking account into all I explained above, especially
 that different people have different senses of clarity and it may not
 be so obvious as the original code author tends to think (to whom it
 tends to be so obvious because it's based on his/her own sense), I
 won't fight against that further.  But I have two final
 suggestions/comments in that case:

 - If possible I'd hope using something different from operator+
   because this one is quite confusing at least to me.  I once thought
   about operator<<, but probably you might not like that either
   because it may look modifying the left operand.
 - Because of the point that clarity/readability is in the eye of the
   beholder, I'd suggest revising this comment:
 {{{
 // This arguably produces more readable code than having bunch of proxy
 // functions and set.insert calls scattered through the code.
 }}}
   I'd simply say, following the previous paragraph, something like
   "this will help keep the lines of code shorter".

 > > - the changes in the first for loop is not a straightforward
 > >   refactoring of the original code, and it's difficult to be sure if
 > >   it really preserves the original behavior.  For example, is the
 > >   behavior same when DNAME exists and the first 'if' conditions are
 met?
 >
 > Well, as I get it, that is the exact reason why we have tests. All the
 previous tests did pass, so the behaviour we are interested in should be
 the same. And even when I decided the original code was too complicated
 and I just rewrote this bit and it potentially acted differently in some
 situation than the original, if this one looks correct (and it does to
 me), it should be OK, because we did not try the original in anything but
 the unittests anyway.
 >
 > So, is there a reason to stick to the exact original code?

 In general, I don't necessarily oppose to piggy-backing improvements
 (changing the behavior) as part of "refactoring".  In fact I often do
 it myself.  I see two issues in this particular context, however.
 First, tests only provide necessary checks, not sufficient ones.  The
 fact that the new code doesn't break existing tests means it doesn't
 break the original behavior only covered by the tests.  And, as I
 noted in the subsequent review comments, there were in fact uncovered
 part of the code.  Second, I wouldn't be too much worry about
 piggy-backing if the original code were reasonably simple and easy to
 understand as I could be quite confident that the behavior change is
 safe.  In the case of this code, however, as we both seem to have
 agreed the code is quite complicated, and I cannot be sure that the
 behavior change doesn't have other adverse effects I'm missing.

 So, in this specific case, I'd be much happier if we could make it
 more gradually, i.e., fast refactor the code in its strictness sense
 without introducing behavior changes, then improve the behavior based
 on the (hopefully) now more understandable code.

 That said, rewinding everything at this point may not be an effective
 way of development.  If you can list up things you changed with why,
 and if it's reasonable, that's probably sufficient.

-- 
Ticket URL: <http://bind10.isc.org/ticket/1177#comment:13>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development