BIND 10 #2096: Define and implement "RdataReader" class

Thu Aug 23 14:57:52 UTC 2012

#2096: Define and implement "RdataReader" class
-------------------------------------+-------------------------------------
                   Reporter:         |                 Owner:  jinmei
  jinmei                             |                Status:  reviewing
                       Type:  task   |             Milestone:
                   Priority:         |  Sprint-20120904
  medium                             |            Resolution:
                  Component:  data   |             Sensitive:  0
  source                             |           Sub-Project:  DNS
                   Keywords:         |  Estimated Difficulty:  5
            Defect Severity:  N/A    |           Total Hours:  0
Feature Depending on Ticket:         |
  scalable inmemory                  |
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------
Changes (by vorner):

 * owner:  vorner => jinmei

Comment:

 Thank you for the thorough review.

 Replying to [comment:15 jinmei]:
 > '''substantial design issue'''
 >
 > If I understand the implementation correctly, it lacks one important
 > functionality: ability to specify RDATA boundaries.  Without this, we
 > cannot use this class to implement RRset::toWire.

 I didn't really know they were needed. But I added them. The next now
 returns information if it's at the end of a Rdata. Also, convenience
 methods to iterate over single signature and single rdata were added
 around it.

 > The "next" version is also less efficient.  It's inherently so,
 > because we need to examine the type within the class implementation
 > and its application.  Also, at least in the current implementation the
 > construction overhead of the `Result` object is not negligible (for
 > example, it involves the construction of `LabelSequence`, which,
 > although designed to be lightweight, still requires some amount of
 > work).  This happens not only for usual data/name, but also for
 > indicating the end of data.  The suggested usage using the type
 > conversion operator makes it worse:
 > {{{#!cpp
 > /// RdataReader::Result data;
 > /// while (data = reader.next()) {
 > ///     switch (data.type()) {
 > }}}
 > this involves a call to the default constructor, and then to the
 > assignment operator.
 >
 > To confirm this concern, I've written a micro benchmark test and
 > committed it to the branch.  In my environment (with the hardcoded
 > test scenario), the "next" version is more than 10% slower than the
 > "iterator" version.  If we try to avoid the overhead of duplicate
 > construction like the quoted part above, the "next" version was
 > a bit faster, but it's still slower than 5-8% than the "iterator"
 > version.  (I guess the benchmark program will be useful even after
 > completing this branch, so we should probably keep it with necessary
 > cleanup).

 I'm really surprised about the 10%. Did you use a function, or a
 functor/boost::bind object?

 Also, I didn't find the benchmark in the branch. Did you push it?

 Anyway, I switched to the callback only.

 > I admit the indirect usage via the callback is a drawback of the
 > "iterator" version, but after seeing code examples in the unit tests
 > and the benchmark program, the "next" version with a loop and case
 > analysis doesn't really look so readable either (although you may
 > simply disagree on this).

 If C++ could do lambda functions in some reasonable way (and, no, functors
 are not reasonable), I wouldn't hesitate a moment and used the callback
 version. But if you look at the code in the tests, especially the ones
 where I need to switch the renderer in the middle of the use of a reader,
 I'm still not sure this is readable.

 > - I'm not really sure about the  meaning of 'private':
 > {{{#!cpp
 > /// This header should be considered private to the implementation and
 should
 > /// not be used from outside.
 > }}}

 I meant people should not include it directly. Another option would be
 placing both the reader and encoder in the same file and hide it
 completely. Should I do it?

 >   if it means it's private to `RdataEncoder` and `RdataReader` only, we
 should
 >   at least extract `RdataNameAttributes` to somewhere else because
 >   it's expected to be used by other classes via the reader's name
 >   action.  Or, if it's about naming (rdata_encoder.h looks like
 >   encoder specific), maybe we should introduce more generic name like
 >   "rdata_serialization.h/cc" and define related things there?
 >   (and still hide implementation details in .cc).  On a related point,
 >   we might want to rename `RdataEncoder` to `RdataSerializer`?

 The attributes are available, as the header is included indirectly. Is
 that a problem?

 Also, what is the advantage of Serializer over Encoder?

 > - what does "it's not unfinished" mean?
 > {{{#!cpp
 >     // Do nothing here. On purpose, it's not unfinished.
 > }}}

 Well, if I see an empty function, I stop and think if the function really
 should be empty or someone left it for later time and forgot about it. I
 tried to explicitly clarify it is empty on purpose.

 [ Bunch of Result related stuff that is irrelevant now ]

 > - `RdataReader` constructor: as suggested above, I'd change the type
 >   of 'data' to `const void*`.  Then, I'd use reinterpret_cast for
 >   lengths_:
 > {{{#!cpp
 >     lengths_(reinterpret_cast<const uint16_t*>(data)),
 > }}}
 >   In this case, we are in fact doing dangerous conversion from
 >   an arbitrary pointer to uint16_t* (when it's used in `RdataSet`, we
 >   use the region immediately after the object for the encoded data
 >   by reinterpreting `RdataSet*` to, e.g., uint16_t*).  The caller may
 >   even be careless about data alignment and the `data` pointer can
 >   indeed be bogus as a uint16_t pointer depending on the architecture.
 >   We know all this, and still deliberately decided to use the risky
 >   conversion trading the safety with efficiency.  For such cases, IMO
 >   we should be honest about what we are doing, and by doing so, we can
 >   indicate readers of the code that it should be handled with an extra
 >   care.

 Changed, but I'm not sure about the reinterpret_cast. I believe that one
 is one step more dangerous than we do. While we look at the data pointed
 to as a different type, the reinterpret_cast looks at the pointer itself
 as a different type.

 > - Same for `data_`.

 You mean the conversion, or that `data_` itself should be void*? The
 second would be very inconvenient, since we need to move the offset in it.

 > - It's not really clear to me why we need to keep
 >   `ManualDecoderStyle`.  Is there anything essentially missing that
 >   can only be tested here?

 Strictly speaking, we don't. But I thought that if the code was already
 written, it's pity to remove it, because it tests something more. The
 reader tests check that the same data as went in get out too. That is the
 important part. The manual decoder also checks the encoded form looks as
 documented ‒ which is not important, but may be handy, if the other tests
 start to fail, we can know if it is because the encoder or the reader.

 Replying to [comment:17 jinmei]:
 > - rdata_reader.h: this file can (possibly) be included outside of this
 >   directly (without explicit "-I"), rdata_field.h should be specified
 >   as a relative path:
 > {{{#!cpp
 > //#include "rdata_field.h"
 > #include <datasrc/memory/rdata_field.h>
 > }}}

 In fact, that is the exact reason why the quoted version is correct. The
 pointed bracket version uses the include path to search for first matching
 file and takes the `-I` switches into account. This is potentially
 dangerous, because there may be a matching file sooner in the includes.
 The quoted version is always relative to the file where the `#include`
 statement is written and -I are used when it is not found in the correct
 directory. In this case, this is relative to the `rdata_reader.h` file, no
 matter where that one is included from.

 I believe the dislike for the correct quoted version in ISC is due to some
 long forgotten bug in some very ancient compiler that did it wrong and
 either included relatively to the .c file being compiled or used -I paths
 for the quoted ones too. It looks like similar relict as comparing `x$var
 = x` in shell instead of `"$var" = ""`. But if anybody sees such compiler,
 please, let me know, so I know who to blame for so much damage to proper
 style.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2096#comment:18>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development