BIND 10 #404: serialize RDATA for in memory data source

Tue Apr 12 18:53:01 UTC 2011

#404: serialize RDATA for in memory data source
-------------------------------------+-------------------------------------
                 Reporter:  jinmei   |                Owner:  jinmei
                     Type:  task     |               Status:  reviewing
                 Priority:  major    |            Milestone:
                Component:  data     |  Sprint-20110419
  source                             |           Resolution:
                 Keywords:           |            Sensitive:  0
Estimated Number of Hours:  0.0      |  Add Hours to Ticket:  0
                Billable?:  1        |          Total Hours:  0
                Internal?:  0        |
-------------------------------------+-------------------------------------
Changes (by vorner):

 * owner:  vorner => jinmei

Comment:

 Hello

 Replying to [comment:17 jinmei]:
 > I admit this syntax may look tricky.  But at the same time I thought
 this was a quite well-known and commoly used idiom for decent level of
 C++/STL programmers, and I would say we should allow this usage in our
 code when need conversion between the STL world and lower-level pure C
 interfaces.  I noticed you made explict comments about this point in your
 revised branch, to which I wouldn't necessarily make an objection.

 Actually, I wasn't worried about the syntax. It was that this kind of
 thing wouldn't work if it was, let's say, a list. It actually uses an
 implementation detail of vector to work, which is not advertised trough
 its API, but only guaranteed somewhere in the documentation. Such things
 are generally used when coding in perl, but I consider them hard to read,
 as it needs nontrivial joining of information to understand what is
 actually happening.

 So I'm not really opposing the use, because it has some advantages (like
 the size can grow by itself), but I think these parts of code really do
 deserve a comment whenever used to show that the author did it in
 conscious way, not naively guessing.

 > I'm okay with changing uint8 to char (noticing you didn't change that
 part).  Frankly, though, if we worry about possible incompatibility
 between uint8 and unsigned char, I'm afraid we'll find much more serious
 cases in our main code (rather than in a benchmark tool).

 Well, the worry wasn't about incompatibility of their binary
 representation (if there's, lets say, 9bit char, the whole code is, I
 guess, unusable), but the original worry was about aliasing, that if
 uint8_t and unsigned char were declared as different types. But then I
 decided that it makes no sense to declare new types when an typedef alias
 is enough, so we shouldn't meet this in practice and saved myself some
 work.

 > > Is std::vector really not portable? We use it in headers all the time.
 Is the hiding in pimpl really worth the prize of the overhead and more
 complicated implementation?
 >
 > It's quite likely to be non portable (or non compatible) at the binary
 > level when we change the STL implementation (even the debug/normal
 > modes of STLPort are incompatible; for example, we'd have to build and
 > use gtest with the same mode of STLPort).  Basically, I want to
 > minimize the risk of breaking the compatibility unless there's a
 > strong need to expose it.  I admit we are already breaking this level
 > of compatibility in various places of our code, but if we use the fact
 > as an excuse to break it further, it will be more and more difficult
 > to restore the compatibility when if and we want to do that.

 I'd really be worried to go that way, it really complicates matters a lot
 ‒ we wouldn't be able to put anything except basic types into the headers,
 not even a shared pointer. This kind of thing is usually solved by distro
 maintainers anyway. So, maybe it's a thing we should bring up to some
 call? Or, was it already discussed sometime?

 > > Looking at the FieldSpec ‒ is it probable we use all 16 bits of the
 length field? Surely not for a name and if it is possible to have such
 long binary fields (I doubt there are any in the wild), we could split it
 into more consecutive fields. If we spared 2 bits from the length, we
 could put the type of field to the same 2-byte word (either by manually
 masking & shifting) or by field width specification (but I fear it's
 feature of C99). Since now it takes 3 bytes and another byte will be
 padding, because the uint16 can't live on odd address, this would make the
 size half.
 >
 > I agree it makes sense to consider compressing the fields further, but
 > that would probably be a next step, probably after measuring the
 > actual memory footprint with real world example (big) zones.

 Yes, we seem to agree on that one then.

 Replying to [comment:18 jinmei]:
 > Replying to [comment:15 vorner]:
 > - Also as I said, I'd still hide the buffer using pimpl at the moment.

 Well, there are actually two issues on that. One is it brings quite non-
 trivial overhead (which might or might not be important, but it's not only
 the instructions of calling a functions, it's that the compiler loses most
 of its invariants across a call to unknown function, because it must be in
 a different file). The second, probably more important one, is code
 readability. Too little of structure/objects make the code hard to follow,
 leading to spaghetti code. But the opposite, too many layers of
 indirection, make the code equally hard to follow. If you look at the
 writeUint8 function, for example, it is already only a wrapper function to
 call the same on a buffer, which is, in turn, a single-line function with
 assignment. If we add a pimpl there, there's another indirection that
 brings no simplification whatsoever, only adding yet another file that
 must be opened (bringing it up to 3) to find out there's nothing fancy
 happening at all.

 So I'm little bit reluctant to do that. Is it about the binary
 compatibility as well?

 > - In general I don't like to have a non private member variable unless
 it's absolutely necessary (just as many C++ books suggest), and in that
 sense I don't like to make AbstractMessageRenderer::buffer_ protected.
 IMO in this case it's not "absolute necessary", and I'd make it private
 (whether or not via pimpl) and have derived classes access it through
 public writeXXX interfaces. (I'm noticing that in order to do that we'd
 also need some extensions to AbstractMessageRenderer)

 Again, what good does it bring? Because anybody is able to fully overwrite
 the buffer with the write* methods, extend it, shrink it, etc. So it
 doesn't bring any protection of integrity. It only complicates matters for
 whoever is writing the writeName() function. The rule in whatever book is
 rule of thumb, that you don't want just anybody being able to dig in your
 guts. This doesn't apply here, since everybody is already able to do so
 (and, by definition of what the class does, must be). Or do I miss some
 benefit here?

 > - [comment] This is an off topic for this ticket, but I've been thinking
 it makes more sense to hide the output buffer inside the
 (Abstract)MessageRenderer (again, whether or not pimpl) rather than
 passing it via the constructor.  Previously the renderer class didn't have
 getData() and getDataLength(), so the caller needs to get access to the
 row data via the OutputBuffer interfaces.  But now the render also
 provides the way to get access to the raw data via its own
 getData()/getDataLength(), it makes the caller simpler if the renderer
 internally constructs the OutputBuffer (or even some other specialized
 data structure).  Now we are touching the renderer class, this may be a
 good opportunity to make this cleanup.  But since it will affect many
 other parts of the entire BIND 10 code (we'll also need to update the
 python binding and its users, for example), it may better be deffered to a
 separate ticket.  (If this change makes sense) either is okay for me.

 It does make sense in the terms of clean design. On the other side, it
 might make one optimisation a little bit harder ‒ currently, it is
 possible to reuse the same buffer to render many messages. Maybe it would
 be possible to reuse the whole renderer, though.

 Anyway, could I propose that we take the lazy approach, make a ticket or
 comment at the interface and leave it as it is until the change brings any
 advantage?

 > '''rdatafields.h'''
 > - I don't understand the point of changing the return type of
 getFieldSpecData().  Which specific alias problem does the change try to
 solve?

 Thinking about it once again, I was wrong. The problem would be if anybody
 tried to access the data by a pointer to something different than `char`,
 `FieldSpec` or `Type/uint16_t` (depending on the part of data accessed).
 So, in that sense, `void *` is incompatible with the data, meaning that
 you can't access the `void` pointed to because of aliasing rules. However,
 no one can really access `void`, so it's not an issue.

 So I might return it back to `void *`. However, thinking about it, why we
 returned  `void *` in the first place, when the docs say it can be type-
 casted to `FieldSpec`? In that sense, the API seems inconsistent, since
 `void *` usually describe „typeless unknown data“. Wouldn't returning a
 `FieldSpec *` make more sense?

 > - In any case, I'd avoid offending comments like this:
 > {{{
 > +        // Nasty cast, because C++ authors didn't read the C
 specification
 > +        // about pointers. We can't return void* from it, that would
 break
 > }}}
 >   It's an unfounded accusation (how could we know whether C++ authors
 read the specification?), and even if it's the fact, this type of comment
 only makes us look vulgar and doesn't help anything IMO.  Let's be
 professional and just provide technical explanation.

 ACK. Anyway, it seems strange that using an unsafe cast to `void *` is
 allowed (this exact cast can be used to break the aliasing rules), but
 cast to `char *` that is completely safe regarding these is not.

 So, I'm deffering some implementation until we get to some consensus
 there.

 Sorry for disagreeing so much, but I can't help myself feeling it
 differently.

 With regards

-- 
Ticket URL: <http://bind10.isc.org/ticket/404#comment:20>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development