BIND 10 #404: serialize RDATA for in memory data source
BIND 10 Development
do-not-reply at isc.org
Tue Apr 12 18:53:01 UTC 2011
#404: serialize RDATA for in memory data source
-------------------------------------+-------------------------------------
Reporter: jinmei | Owner: jinmei
Type: task | Status: reviewing
Priority: major | Milestone:
Component: data | Sprint-20110419
source | Resolution:
Keywords: | Sensitive: 0
Estimated Number of Hours: 0.0 | Add Hours to Ticket: 0
Billable?: 1 | Total Hours: 0
Internal?: 0 |
-------------------------------------+-------------------------------------
Changes (by vorner):
* owner: vorner => jinmei
Comment:
Hello
Replying to [comment:17 jinmei]:
> I admit this syntax may look tricky. But at the same time I thought
this was a quite well-known and commoly used idiom for decent level of
C++/STL programmers, and I would say we should allow this usage in our
code when need conversion between the STL world and lower-level pure C
interfaces. I noticed you made explict comments about this point in your
revised branch, to which I wouldn't necessarily make an objection.
Actually, I wasn't worried about the syntax. It was that this kind of
thing wouldn't work if it was, let's say, a list. It actually uses an
implementation detail of vector to work, which is not advertised trough
its API, but only guaranteed somewhere in the documentation. Such things
are generally used when coding in perl, but I consider them hard to read,
as it needs nontrivial joining of information to understand what is
actually happening.
So I'm not really opposing the use, because it has some advantages (like
the size can grow by itself), but I think these parts of code really do
deserve a comment whenever used to show that the author did it in
conscious way, not naively guessing.
> I'm okay with changing uint8 to char (noticing you didn't change that
part). Frankly, though, if we worry about possible incompatibility
between uint8 and unsigned char, I'm afraid we'll find much more serious
cases in our main code (rather than in a benchmark tool).
Well, the worry wasn't about incompatibility of their binary
representation (if there's, lets say, 9bit char, the whole code is, I
guess, unusable), but the original worry was about aliasing, that if
uint8_t and unsigned char were declared as different types. But then I
decided that it makes no sense to declare new types when an typedef alias
is enough, so we shouldn't meet this in practice and saved myself some
work.
> > Is std::vector really not portable? We use it in headers all the time.
Is the hiding in pimpl really worth the prize of the overhead and more
complicated implementation?
>
> It's quite likely to be non portable (or non compatible) at the binary
> level when we change the STL implementation (even the debug/normal
> modes of STLPort are incompatible; for example, we'd have to build and
> use gtest with the same mode of STLPort). Basically, I want to
> minimize the risk of breaking the compatibility unless there's a
> strong need to expose it. I admit we are already breaking this level
> of compatibility in various places of our code, but if we use the fact
> as an excuse to break it further, it will be more and more difficult
> to restore the compatibility when if and we want to do that.
I'd really be worried to go that way, it really complicates matters a lot
‒ we wouldn't be able to put anything except basic types into the headers,
not even a shared pointer. This kind of thing is usually solved by distro
maintainers anyway. So, maybe it's a thing we should bring up to some
call? Or, was it already discussed sometime?
> > Looking at the FieldSpec ‒ is it probable we use all 16 bits of the
length field? Surely not for a name and if it is possible to have such
long binary fields (I doubt there are any in the wild), we could split it
into more consecutive fields. If we spared 2 bits from the length, we
could put the type of field to the same 2-byte word (either by manually
masking & shifting) or by field width specification (but I fear it's
feature of C99). Since now it takes 3 bytes and another byte will be
padding, because the uint16 can't live on odd address, this would make the
size half.
>
> I agree it makes sense to consider compressing the fields further, but
> that would probably be a next step, probably after measuring the
> actual memory footprint with real world example (big) zones.
Yes, we seem to agree on that one then.
Replying to [comment:18 jinmei]:
> Replying to [comment:15 vorner]:
> - Also as I said, I'd still hide the buffer using pimpl at the moment.
Well, there are actually two issues on that. One is it brings quite non-
trivial overhead (which might or might not be important, but it's not only
the instructions of calling a functions, it's that the compiler loses most
of its invariants across a call to unknown function, because it must be in
a different file). The second, probably more important one, is code
readability. Too little of structure/objects make the code hard to follow,
leading to spaghetti code. But the opposite, too many layers of
indirection, make the code equally hard to follow. If you look at the
writeUint8 function, for example, it is already only a wrapper function to
call the same on a buffer, which is, in turn, a single-line function with
assignment. If we add a pimpl there, there's another indirection that
brings no simplification whatsoever, only adding yet another file that
must be opened (bringing it up to 3) to find out there's nothing fancy
happening at all.
So I'm little bit reluctant to do that. Is it about the binary
compatibility as well?
> - In general I don't like to have a non private member variable unless
it's absolutely necessary (just as many C++ books suggest), and in that
sense I don't like to make AbstractMessageRenderer::buffer_ protected.
IMO in this case it's not "absolute necessary", and I'd make it private
(whether or not via pimpl) and have derived classes access it through
public writeXXX interfaces. (I'm noticing that in order to do that we'd
also need some extensions to AbstractMessageRenderer)
Again, what good does it bring? Because anybody is able to fully overwrite
the buffer with the write* methods, extend it, shrink it, etc. So it
doesn't bring any protection of integrity. It only complicates matters for
whoever is writing the writeName() function. The rule in whatever book is
rule of thumb, that you don't want just anybody being able to dig in your
guts. This doesn't apply here, since everybody is already able to do so
(and, by definition of what the class does, must be). Or do I miss some
benefit here?
> - [comment] This is an off topic for this ticket, but I've been thinking
it makes more sense to hide the output buffer inside the
(Abstract)MessageRenderer (again, whether or not pimpl) rather than
passing it via the constructor. Previously the renderer class didn't have
getData() and getDataLength(), so the caller needs to get access to the
row data via the OutputBuffer interfaces. But now the render also
provides the way to get access to the raw data via its own
getData()/getDataLength(), it makes the caller simpler if the renderer
internally constructs the OutputBuffer (or even some other specialized
data structure). Now we are touching the renderer class, this may be a
good opportunity to make this cleanup. But since it will affect many
other parts of the entire BIND 10 code (we'll also need to update the
python binding and its users, for example), it may better be deffered to a
separate ticket. (If this change makes sense) either is okay for me.
It does make sense in the terms of clean design. On the other side, it
might make one optimisation a little bit harder ‒ currently, it is
possible to reuse the same buffer to render many messages. Maybe it would
be possible to reuse the whole renderer, though.
Anyway, could I propose that we take the lazy approach, make a ticket or
comment at the interface and leave it as it is until the change brings any
advantage?
> '''rdatafields.h'''
> - I don't understand the point of changing the return type of
getFieldSpecData(). Which specific alias problem does the change try to
solve?
Thinking about it once again, I was wrong. The problem would be if anybody
tried to access the data by a pointer to something different than `char`,
`FieldSpec` or `Type/uint16_t` (depending on the part of data accessed).
So, in that sense, `void *` is incompatible with the data, meaning that
you can't access the `void` pointed to because of aliasing rules. However,
no one can really access `void`, so it's not an issue.
So I might return it back to `void *`. However, thinking about it, why we
returned `void *` in the first place, when the docs say it can be type-
casted to `FieldSpec`? In that sense, the API seems inconsistent, since
`void *` usually describe „typeless unknown data“. Wouldn't returning a
`FieldSpec *` make more sense?
> - In any case, I'd avoid offending comments like this:
> {{{
> + // Nasty cast, because C++ authors didn't read the C
specification
> + // about pointers. We can't return void* from it, that would
break
> }}}
> It's an unfounded accusation (how could we know whether C++ authors
read the specification?), and even if it's the fact, this type of comment
only makes us look vulgar and doesn't help anything IMO. Let's be
professional and just provide technical explanation.
ACK. Anyway, it seems strange that using an unsafe cast to `void *` is
allowed (this exact cast can be used to break the aliasing rules), but
cast to `char *` that is completely safe regarding these is not.
So, I'm deffering some implementation until we get to some consensus
there.
Sorry for disagreeing so much, but I can't help myself feeling it
differently.
With regards
--
Ticket URL: <http://bind10.isc.org/ticket/404#comment:20>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list