BIND 10 #404: serialize RDATA for in memory data source

Tue Apr 12 00:39:10 UTC 2011

#404: serialize RDATA for in memory data source
-------------------------------------+-------------------------------------
                 Reporter:  jinmei   |                Owner:  jinmei
                     Type:  task     |               Status:  accepted
                 Priority:  major    |            Milestone:
                Component:  data     |  Sprint-20110419
  source                             |           Resolution:
                 Keywords:           |            Sensitive:  0
Estimated Number of Hours:  0.0      |  Add Hours to Ticket:  0
                Billable?:  1        |          Total Hours:  0
                Internal?:  0        |
-------------------------------------+-------------------------------------

Comment (by jinmei):

 (I'll first provide some responses to the original review)

 Replying to [comment:8 vorner]:

 > In the benchmark, using vector<uint8> as a buffer and using casted
 pointer to &buffer[0] seems kludge. First of, the fact that it should work
 is not so apparent from the vector interface, one needs to know the
 guarantee that the elements must be stored in continuous array of memory.

 I admit this syntax may look tricky.  But at the same time I thought this
 was a quite well-known and commoly used idiom for decent level of C++/STL
 programmers, and I would say we should allow this usage in our code when
 need conversion between the STL world and lower-level pure C interfaces.
 I noticed you made explict comments about this point in your revised
 branch, to which I wouldn't necessarily make an objection.

 >  Second, sizeof returns the size in chars and the standard guarantees
 aliasing compatibility with unsigned char pointers only, not with uint8 *.
 So, the compiler is free to deduce these two pointers can never point to
 the same memory and the cast breaks that. So, if you want to use vector
 and not bare pointer to array, could you please use unsigned chars?

 I'm okay with changing uint8 to char (noticing you didn't change that
 part).  Frankly, though, if we worry about possible incompatibility
 between uint8 and unsigned char, I'm afraid we'll find much more serious
 cases in our main code (rather than in a benchmark tool).

 > In the AbstractMessageRenderer, all the writeUint8 and alike methods are
 really small inside and I can't really imagine any way they would do
 anything else than just write the data into some buffer (not counting
 writeName). The fact you made them virtual, I guess you made the call to
 them much more costly than the code inside them and made them impossible
 to inline. Isn't there a way to have only one implementation for them? The
 dividing into fields could be done in other functions, like writeName.

 Personally I'd think it's premature to worry about the overhead of the
 virtual functions at this moment (I wouldn't deny the possibility of
 finding it to be a critical performance bottleneck when we reach the
 stage of optimization).  But I also see most of the writeXXX() methods
 would be a duplicate of the base class version, so, as a conclusion
 I'm okay with defining them non virtual at least until we see the need
 for making them virtual.  I have some other comments regarding this
 point for the actual patch, which I'll explain separately.

 > Is std::vector really not portable? We use it in headers all the time.
 Is the hiding in pimpl really worth the prize of the overhead and more
 complicated implementation?

 It's quite likely to be non portable (or non compatible) at the binary
 level when we change the STL implementation (even the debug/normal
 modes of STLPort are incompatible; for example, we'd have to build and
 use gtest with the same mode of STLPort).  Basically, I want to
 minimize the risk of breaking the compatibility unless there's a
 strong need to expose it.  I admit we are already breaking this level
 of compatibility in various places of our code, but if we use the fact
 as an excuse to break it further, it will be more and more difficult
 to restore the compatibility when if and we want to do that.

 Performance reasons can be a "strong need" in future, so, once we are
 in the optimization phase and find the overhead of indirection is a
 critical bottleneck, I'm okay with exposing the vector definition.
 Until then, I'd rather hide it.

 > RdataFieldComposer has many small methods. Is it worth writing them into
 the class header and then putting them outside? Isn't it nice to put their
 bodies inside the class and hint the compiler these methods are worth
 inlining (though the compiler does mostly what it wants today anyway).

 I suspect this comment doesn't apply to your own patch, which
 eliminated many of the RdataFieldComposer::writeXXX methods, so I'll
 skip this point.

 > The RdataFields::toWire has a weard statement: condition ? true : false.
 Wouldn't condition (without the ?) be enough?

 Ah, this seems to me a common bad practice of mine.  I'm not sure why
 I tend to make this type of expression unnecessarily complicated, but,
 you're right; we don't (shouldn't) need ?:.

 > Looking at the FieldSpec ‒ is it probable we use all 16 bits of the
 length field? Surely not for a name and if it is possible to have such
 long binary fields (I doubt there are any in the wild), we could split it
 into more consecutive fields. If we spared 2 bits from the length, we
 could put the type of field to the same 2-byte word (either by manually
 masking & shifting) or by field width specification (but I fear it's
 feature of C99). Since now it takes 3 bytes and another byte will be
 padding, because the uint16 can't live on odd address, this would make the
 size half.

 I agree it makes sense to consider compressing the fields further, but
 that would probably be a next step, probably after measuring the
 actual memory footprint with real world example (big) zones.

-- 
Ticket URL: <http://bind10.isc.org/ticket/404#comment:17>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development