BIND 10 #2377: define dns::MasterLoad class
BIND 10 Development
do-not-reply at isc.org
Mon Dec 10 14:47:27 UTC 2012
#2377: define dns::MasterLoad class
-------------------------------------+-------------------------------------
Reporter: jinmei | Owner:
Type: task | jinmei
Priority: medium | Status:
Component: libdns++ | reviewing
Keywords: | Milestone:
Sensitive: 0 | Sprint-20121218
Sub-Project: DNS | Resolution:
Estimated Difficulty: 7 | CVSS Scoring:
Total Hours: 0 | Defect Severity: N/A
| Feature Depending on Ticket:
| loadzone-ng
| Add Hours to Ticket: 0
| Internal?: 0
-------------------------------------+-------------------------------------
Changes (by vorner):
* owner: vorner => jinmei
Comment:
Hello
Replying to [comment:19 jinmei]:
> Fair enough, but in that case I'd suggest explicitly rejecting 0 as an
> invalid parameter. Otherwise, a careless application like this would
> result in an infinite loop:
>
> {{{#!cpp
> bool done = false
> while (!done) {
> done = loader.loadIncremental(0);
> }
> }}}
While it probably is a good idea (and I adopted the proposal), I'd like to
point out that it is always easy to screw something up, no matter how we
try to
prevent it. There's always one more way to do something wrong and nobody
would
write such loop in real application with even tiny bit of thinking.
> Since I don't know the original rationale of the BIND 9 rule and I can
> only guess it, I cannot be sure, but at least I see some cases where the
> implicit style could cause real harm which could be avoided with the
> explicit style: http://www.cve.mitre.org/cgi-
bin/cvename.cgi?name=CVE-2012-2122
> And, in that sense, the shared pointer usage is totally different;
> our code itself doesn't play the risky business (depending on how the
> type-conversion parameter is implemented, of course). So we could say
> the policy is consistent: we prefer safer (less error prone) way; if
> it's safe enough we prefer intuitive syntax.
>
> [...]
>
> Maybe the bikeshed of the week for the next team call?
Maybe. I'll try to add it to the list tomorrow, and if there's no time,
I'll
try to get the rationale behind the bare pointers (there probably should
be no
auto-conversion trick that could get us, like with ints of various sizes).
> > Here, in the ticket, or somewhere in the code?
>
> In this ticket, in the sample code snippet of the task description:
No, I mean where should I leave the note.
> > I believe most or all of these check should actually go to some
`validate()`
> > function we want to do later.
>
> Right, but...
>
> > I'm not sure the loader itself is the best place
> > to be crowded by such checks (expecting it to get even more
complicated with
> > all the $<something> handling and stuff).
>
> for this particular case it will be much difficult to check later,
> because we need to go through the entire zone to see if there's any
> SOA RR. I guess that's why BIND 9 does this check in the loader code.
Well, that is not entirely true. For one, the check could be done in the
addRR callback. And, won't validate iterate the whole zone because of some
other checks anyway?
Actually, I'd very like the MasterLoader to be able to load anything that
is at
least slightly syntactically correct, even if it makes no sense as a zone.
What
if someone wanted to store a DNS cache content to a disk for some
debugging
purposes? Or something like that? Doing it in the master file format could
be
pretty natural. Then, it would be nice to use MasterLoader for whatever
tool would
inspect the content. And that would be better without having bonus
warnings and more
complicated code.
> - stream version of `MasterLoader` constructor doesn't seem to be
> exception safe. We need something like:
> {{{#!cpp
> auto_ptr<MasterLoaderImpl> impl(new MasterLoaderImpl...);
> impl->pushStreamSource(stream);
> impl_ = impl.release();
> }}}
:-| That is what I don't like about C++.
> - About this point:
> >> Especially when `MANY_ERRORS` isn't specified, I think we should tell
> >> the caller it fails when it fails, [...]
> >> the `MasterLoader` class. I prefer throwing an exception
> I'd also like to somehow indicate the error to the caller when it
> happens in the "lenient (MANY_ERRORS)" mode. Maybe record the
> first or last error message in the class so the caller can inspect
> it?
I think storing first or last error is philosophically wrong. If you want
to
see the content of the error and you want MANY_ERRORS, you want all the
errors,
not just some.
But I added a method to check if the loading was successful or not.
> - I was not sure about the status/plan on this point: "It doesn't seem
> to implement this part of BIND 9's implementation:"
> {{{#!c
> 99if (token.type == isc_tokentype_eof) {
> 999if (read_till_eol)
> 9999WARNUNEXPECTEDEOF(lctx->lex);
> ...
> 99}
> }}}
> I saw some comment in the test that may be related to this topic,
> but cannot find a change on the main code or mention in your
> response.
It was one of the comments I didn't have time for on Friday, maybe I
didn't say
it explicitly enough. It is handled now.
And, about this one:
> - This block of code seems to behave differently from BIND 9. Is that
> intentional?
> {{{#!cpp
> bool empty = true;
> do {
> const MasterToken&
empty_token(lexer_.getNextToken());
> if (empty_token.getType() ==
MasterToken::END_OF_FILE) {
> // TODO: Check if this is the last source,
possibly pop
> return (true);
> }
> empty = empty_token.getType() ==
MasterToken::END_OF_LINE;
> } while (empty);
> // Return the last token, as it was not empty
> lexer_.ungetToken();
> }}}
> I guess a port of the corresponding part of the BIND 9 version would
> look like this:
> {{{#!cpp
> // We should also specify MasterLexer::INITIAL_WS (see my previous
comment
> // for the ticket)
> const MasterToken& token =
lexer_.getNextToken(MasterLexer::QSTRING);
> switch (token.getType()) {
> case MasterToken::END_OF_FILE:
> if (read_till_eol) {
> callbacks_.warn(...);
> }
> // TODO, possibly pop source
> return (true);
> case MasterToken::END_OF_FILE:
> read_till_eol = true;
> continue;
> case MasterToken::ERROR:
> if ((options_ & MANY_ERRORS) == 0) {
> return (true);
> }
> callbacks_.error(...);
> read_till_eol = true;
> continue;
> case MasterToken::STRING:
> case MasterToken::QSTRING:
> break;
> default: // buggy case
> assert(false); // or throw UnExpected or something
> }
> }}}
> As we discussed in a different ticket, I don't necessarily insist we
> should do the same, but I also believe it's generally wise to follow
> the existing practice unless we have a specific reason not to do
> so. In this specific case, at least I see one advantage in the
> latter version: we don't have to do ungetToken() and re-get it.
> re-getting a string token is a bit more expensive because we go
> through the string characters and copy them twice.
I spent some time reading your ported code and trying to understand the
original bind9 code.
First, the `read_till_error` is set only in the case it tries to recover
from
EOF, which is handled separately (in the catch branch) in the current
code. I
believe that is better, since it separates the exceptional case and
doesn't
clutter the main code.
And I believe the rest is equivalent (except for the performance, see
below).
In the current code, it first finds the first token that is not newline
(handling EOF, of course). Then it tries to use it and in case it is the
wrong
one, it throws (which is better, makes the code more readable). The bind9
code
does both phases at once ‒ if it is newline, it continues, if it is
something
else, it has cases for all the possibilities.
I also believe the current code is easier to read, as it pushes the error
handling outside of the main flow.
The only disadvantage is the fact that we unget a token and get it again,
so
the errors are handled by the correct getNextToken() method. But I believe
that
concern is not only premature optimisation, but the overhead is actually
negligible. It is not copying the string, it is only scanning the
characters once
again in the buffer in source. And as the strings are expected to be small
in
real life, they'll fit into the L1 cache without any problems. This should
be
comparably fast as calling a virtual method ‒ which we do a lot. And, this
happens
once per RR ‒ we then need to handle it, putting it into some data source.
And
we read the data from disk at speed around 100MB/s at most.
So, putting it together, I'd like to keep the code this way, as I believe
the
better readability overweights the disadvantages.
--
Ticket URL: <http://bind10.isc.org/ticket/2377#comment:21>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list