BIND 10 #2377: define dns::MasterLoad class

Mon Dec 10 14:47:27 UTC 2012

#2377: define dns::MasterLoad class
-------------------------------------+-------------------------------------
            Reporter:  jinmei        |                        Owner:
                Type:  task          |  jinmei
            Priority:  medium        |                       Status:
           Component:  libdns++      |  reviewing
            Keywords:                |                    Milestone:
           Sensitive:  0             |  Sprint-20121218
         Sub-Project:  DNS           |                   Resolution:
Estimated Difficulty:  7             |                 CVSS Scoring:
         Total Hours:  0             |              Defect Severity:  N/A
                                     |  Feature Depending on Ticket:
                                     |  loadzone-ng
                                     |          Add Hours to Ticket:  0
                                     |                    Internal?:  0
-------------------------------------+-------------------------------------
Changes (by vorner):

 * owner:  vorner => jinmei

Comment:

 Hello

 Replying to [comment:19 jinmei]:
 > Fair enough, but in that case I'd suggest explicitly rejecting 0 as an
 > invalid parameter.  Otherwise, a careless application like this would
 > result in an infinite loop:
 >
 > {{{#!cpp
 >    bool done = false
 >    while (!done) {
 >        done = loader.loadIncremental(0);
 >    }
 > }}}

 While it probably is a good idea (and I adopted the proposal), I'd like to
 point out that it is always easy to screw something up, no matter how we
 try to
 prevent it. There's always one more way to do something wrong and nobody
 would
 write such loop in real application with even tiny bit of thinking.

 > Since I don't know the original rationale of the BIND 9 rule and I can
 > only guess it, I cannot be sure, but at least I see some cases where the
 > implicit style could cause real harm which could be avoided with the
 > explicit style: http://www.cve.mitre.org/cgi-
 bin/cvename.cgi?name=CVE-2012-2122
 > And, in that sense, the shared pointer usage is totally different;
 > our code itself doesn't play the risky business (depending on how the
 > type-conversion parameter is implemented, of course).  So we could say
 > the policy is consistent: we prefer safer (less error prone) way; if
 > it's safe enough we prefer intuitive syntax.
 >
 > [...]
 >
 > Maybe the bikeshed of the week for the next team call?

 Maybe. I'll try to add it to the list tomorrow, and if there's no time,
 I'll
 try to get the rationale behind the bare pointers (there probably should
 be no
 auto-conversion trick that could get us, like with ints of various sizes).

 > > Here, in the ticket, or somewhere in the code?
 >
 > In this ticket, in the sample code snippet of the task description:

 No, I mean where should I leave the note.

 > > I believe most or all of these check should actually go to some
 `validate()`
 > > function we want to do later.
 >
 > Right, but...
 >
 > > I'm not sure the loader itself is the best place
 > > to be crowded by such checks (expecting it to get even more
 complicated with
 > > all the $<something> handling and stuff).
 >
 > for this particular case it will be much difficult to check later,
 > because we need to go through the entire zone to see if there's any
 > SOA RR.  I guess that's why BIND 9 does this check in the loader code.

 Well, that is not entirely true. For one, the check could be done in the
 addRR callback. And, won't validate iterate the whole zone because of some
 other checks anyway?

 Actually, I'd very like the MasterLoader to be able to load anything that
 is at
 least slightly syntactically correct, even if it makes no sense as a zone.
 What
 if someone wanted to store a DNS cache content to a disk for some
 debugging
 purposes? Or something like that? Doing it in the master file format could
 be
 pretty natural. Then, it would be nice to use MasterLoader for whatever
 tool would
 inspect the content. And that would be better without having bonus
 warnings and more
 complicated code.

 > - stream version of `MasterLoader` constructor doesn't seem to be
 >   exception safe.  We need something like:
 > {{{#!cpp
 >     auto_ptr<MasterLoaderImpl> impl(new MasterLoaderImpl...);
 >     impl->pushStreamSource(stream);
 >     impl_ = impl.release();
 > }}}

 :-| That is what I don't like about C++.

 > - About this point:
 > >> Especially when `MANY_ERRORS` isn't specified, I think we should tell
 > >> the caller it fails when it fails, [...]
 > >> the `MasterLoader` class.  I prefer throwing an exception
 >   I'd also like to somehow indicate the error to the caller when it
 >   happens in the "lenient (MANY_ERRORS)" mode.  Maybe record the
 >   first or last error message in the class so the caller can inspect
 >   it?

 I think storing first or last error is philosophically wrong. If you want
 to
 see the content of the error and you want MANY_ERRORS, you want all the
 errors,
 not just some.

 But I added a method to check if the loading was successful or not.

 > - I was not sure about the status/plan on this point: "It doesn't seem
 >   to implement this part of BIND 9's implementation:"
 > {{{#!c
 > 99if (token.type == isc_tokentype_eof) {
 > 999if (read_till_eol)
 > 9999WARNUNEXPECTEDEOF(lctx->lex);
 > ...
 > 99}
 > }}}
 >   I saw some comment in the test that may be related to this topic,
 >   but cannot find a change on the main code or mention in your
 >   response.

 It was one of the comments I didn't have time for on Friday, maybe I
 didn't say
 it explicitly enough. It is handled now.

 And, about this one:

 > - This block of code seems to behave differently from BIND 9.  Is that
 >   intentional?
 > {{{#!cpp
 >                 bool empty = true;
 >                 do {
 >                     const MasterToken&
 empty_token(lexer_.getNextToken());
 >                     if (empty_token.getType() ==
 MasterToken::END_OF_FILE) {
 >                         // TODO: Check if this is the last source,
 possibly pop
 >                         return (true);
 >                     }
 >                     empty = empty_token.getType() ==
 MasterToken::END_OF_LINE;
 >                 } while (empty);
 >                 // Return the last token, as it was not empty
 >                 lexer_.ungetToken();
 > }}}
 >   I guess a port of the corresponding part of the BIND 9 version would
 >   look like this:
 > {{{#!cpp
 >     // We should also specify MasterLexer::INITIAL_WS (see my previous
 comment
 >     // for the ticket)
 >     const MasterToken& token =
 lexer_.getNextToken(MasterLexer::QSTRING);
 >     switch (token.getType()) {
 >     case MasterToken::END_OF_FILE:
 >         if (read_till_eol) {
 >             callbacks_.warn(...);
 >         }
 >         // TODO, possibly pop source
 >         return (true);
 >     case MasterToken::END_OF_FILE:
 >         read_till_eol = true;
 >         continue;
 >     case MasterToken::ERROR:
 >         if ((options_ & MANY_ERRORS) == 0) {
 >             return (true);
 >         }
 >         callbacks_.error(...);
 >         read_till_eol = true;
 >         continue;
 >     case MasterToken::STRING:
 >     case MasterToken::QSTRING:
 >         break;
 >     default:                    // buggy case
 >         assert(false); // or throw UnExpected or something
 >     }
 > }}}
 >   As we discussed in a different ticket, I don't necessarily insist we
 >   should do the same, but I also believe it's generally wise to follow
 >   the existing practice unless we have a specific reason not to do
 >   so.  In this specific case, at least I see one advantage in the
 >   latter version: we don't have to do ungetToken() and re-get it.
 >   re-getting a string token is a bit more expensive because we go
 >   through the string characters and copy them twice.

 I spent some time reading your ported code and trying to understand the
 original bind9 code.

 First, the `read_till_error` is set only in the case it tries to recover
 from
 EOF, which is handled separately (in the catch branch) in the current
 code. I
 believe that is better, since it separates the exceptional case and
 doesn't
 clutter the main code.

 And I believe the rest is equivalent (except for the performance, see
 below).
 In the current code, it first finds the first token that is not newline
 (handling EOF, of course). Then it tries to use it and in case it is the
 wrong
 one, it throws (which is better, makes the code more readable). The bind9
 code
 does both phases at once ‒ if it is newline, it continues, if it is
 something
 else, it has cases for all the possibilities.

 I also believe the current code is easier to read, as it pushes the error
 handling outside of the main flow.

 The only disadvantage is the fact that we unget a token and get it again,
 so
 the errors are handled by the correct getNextToken() method. But I believe
 that
 concern is not only premature optimisation, but the overhead is actually
 negligible. It is not copying the string, it is only scanning the
 characters once
 again in the buffer in source. And as the strings are expected to be small
 in
 real life, they'll fit into the L1 cache without any problems. This should
 be
 comparably fast as calling a virtual method ‒ which we do a lot. And, this
 happens
 once per RR ‒ we then need to handle it, putting it into some data source.
 And
 we read the data from disk at speed around 100MB/s at most.

 So, putting it together, I'd like to keep the code this way, as I believe
 the
 better readability overweights the disadvantages.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2377#comment:21>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development