BIND 10 #2375: implement MasterLexer getNextToken() and ungetToken()

Mon Nov 19 17:04:48 UTC 2012

#2375: implement MasterLexer getNextToken() and ungetToken()
-------------------------------------+-------------------------------------
                   Reporter:         |                 Owner:  jinmei
  jinmei                             |                Status:  reviewing
                       Type:  task   |             Milestone:
                   Priority:         |  Sprint-20121120
  medium                             |            Resolution:
                  Component:         |             Sensitive:  0
  libdns++                           |           Sub-Project:  DNS
                   Keywords:         |  Estimated Difficulty:  5
            Defect Severity:  N/A    |           Total Hours:  0
Feature Depending on Ticket:         |
  loadzone-ng                        |
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------
Changes (by vorner):

 * owner:  vorner => jinmei

Comment:

 Hello

 Sometimes, I wonder if you write more code or more trac comments :-P

 Anyway, on a more serious note. I rebased the branch once more on top of
 master
 (to get the string states in, which I thought I need, but they didn't
 provide
 what I wanted anyway). As you already have the branch checked out, I
 pushed it
 as trac2375-r1. The commits up to 6131e5b395558dd51a63deb4565e2b9bc253c704
 (included, „Provide weak exception guarantee“) are the same modulo some
 trivial
 merge resolution and don't need to be reviewed again.

 Replying to [comment:7 jinmei]:
 > '''missing other signature of getNextToken'''
 >
 > It's probably not clear from the ticket's description, but it was
 > actually intended to implement another version of getNextToken
 > (I thought it was mentioned in the main ticket of `MasterLexer`).
 > It's equivalent of BIND 9's isc_lex_getmastertoken().  This can wait,
 > however, until we implement rdata extensions to support the generic
 > lexer.  In that case we need to create a ticket for it.

 To clarify, so it should be a thin wrapper around the current
 getNextToken?

 > First, I suspect we cannot feasibly revert to the previous token.  In
 > the case of string-variant token, the token refers to an internal
 > placeholder vector of `MasterLexer` (see #2373).  It's always cleared
 > once a new session for getNextToken() (of string variants) starts,
 > so the content of `previous_token_` can be already invalid when it's
 > used in ungetToken().

 Looking at it again, the content of token_ should not be observable from
 outside (except from the test getter, but that doesn't count), so I simply
 removed it.

 > Secondly, I don't think we need to catch exceptions (and recover from
 > it) in `getNextToken`.  The only possible exception of our own in this
 > method is `ReadError` (from the input source), and other possible
 > exceptions are from the standard library (and possibly from boost)
 > such as std::bad_alloc if it's not the only one.

 I'm not sure. I could remove it, but I think it is better this way for two
 reasons:
  * The general approach of trying to be safe.
  * In case a big master file includes another one and the other one is not
    readable for some reason, it might be good to continue parsing (and
    reporting errors) at least for the big one. It is probably not that
    important, because it would be rather rare occasion, but I don't think
 it
    would be that hard to implement either.

 So, I'd ask. Is there a disadvantage to handling the exceptions?

 > Thirdly (probably a minor point), the BIND 9 version of ungettoken
 > doesn't reset last_was_eol.  I was not sure whether it was
 > intentional, or a some kind of bug of BIND 9, or whether it simply
 > doesn't matter, but please check if the difference causes any trouble.
 > You may also want to ask this at bind10-dev.

 I don't know if it is a bug on bind9 side, or this case was just never
 needed.
 But from the logical point of view, it should be reset, at least if the
 job
 description for ungetToken is to return to the state before the last
 token. I
 think that „peeking“ at a token should be safe and sideeffectless
 operation. If
 there's something that depends on the fact that ungetToken does not return
 the
 last-eof condition back in bind9, I think it is time to rethink the
 approach,
 because that's hard to understand and read. Such code is expected in perl,
 not
 in C++ ;-).

 So, in short, I think the returning one in the current code is the more
 correct.

 > First off, I don't see the need for such a tricky setup to test
 > `getNextToken` and `ungetToken`.  In particular, `getNextToken` is
 > basically a simple method; we only check if it calls the state class
 > objects from start to end and returns the token that is set
 > in `impl_->token_` at the time of the end of the method.  Its
 > interaction with the input source is hidden within the state classes
 > and I don't see the need for testing it (in fact, such separation is
 > one major reason for extracting the state transition into a separate
 > class).

 Well, if I omit the fact that I can check what string I get from the
 source and
 can manipulate the paren count and throw from inside, I want to check one
 thing
 first, before I spend nontrivial amount of time trying to do the tests
 without it
 or think of a way to push the fake ones inside by some trick.

 I looked at the current master, which already contains the string states.
 There
 doesn't seem to be any state that returns something other than NULL from
 its
 handle. The first question therefore is, is this OK for tests or is it
 incomplete then? The second, if there's no such state, will there be? If
 not,
 could we simplify the design by a lot?

 > So, I'd first like to suggest considering test the lexer class using
 > the existing state classes and already available inspection hooks.  If
 > you still want to fully control the state transition in tests, I'd
 > like to make it happen without introducing a derived class of lexer or
 > putting fake definitions in the production code.  For example, extend
 > the constructor of `MasterLexer` so it can take a pointer to the
 > "start" function
 > {{{#!cpp
 >     typedef const master_lexer_internal::State* (start_fn_t)(Options);
 >     // if non NULL, use it, otherwise we use the builtin state classes.
 >     // clarify that the non NULL usage is only for tests.
 >     MasterLexer(start_fn_t start_fn = NULL);
 > }}}
 > and define necessary fake classes in the test code (that only uses
 > read-only accessors for the lexer).

 I'm not sure it can be reasonably tested by read-only accessors. The goal
 of
 the fake state is to change the internal state and mess around with it.

 > - we should probably unify NOT_STARTED and NO_TOKEN_PRODUCED.  The
 former
 >   is only used as an initial placeholder value.

 I'm not sure. I was thinking about that, but then I decided not to. The
 situations
 are really different. The NOT_STARTED is part of normal lifetime of the
 lexer.
 The NO_TOKEN_PRODUCED should actually never ever happen except for the
 short time
 in which the new token is being computed. Considering the enum values are
 cheap
 (the enum already takes a whole byte, so adding few more changes nothing),
 I don't
 see compelling reason to have them unified. I think that could be more
 confusing
 than having them separated.

 > - getNextToken(): the behavior on atEOF() is different from that of
 >   the BIND 9 version:
 > {{{#!c
 >  if (isc_buffer_remaininglength(source->pushback) == 0 &&
 >      source->at_eof)
 >  {
 >   if ((options & ISC_LEXOPT_DNSMULTILINE) != 0 &&
 >       lex->paren_count != 0) {
 >    lex->paren_count = 0;
 >    return (ISC_R_UNBALANCED);
 >   }
 >   if ((options & ISC_LEXOPT_EOF) != 0) {
 >    tokenp->type = isc_tokentype_eof;
 >    return (ISC_R_SUCCESS);
 >   }
 >   return (ISC_R_EOF);
 >  }
 > }}}
 >   (Note: in our implementation we can assume LEXOPT_DNSMULTILINE and
 >   LEXOPT_EOF).

 I'm probably lost here. I added a test to see the unbalanced parenthesis
 is
 reported (and it is). So, what other difference is there? I'm not really
 confident
 in bind9 code, I didn't write any bit of it O:-).

 > - this might be safe depending on what happens in handle(), but I'd
 >   protect state in scoped_ptr:
 > {{{#!cpp
 >             const State* state(State::getFakeState(NULL, 0, &token_));
 >             state->handle(*this);
 >             delete state;
 > }}}

 I addressed this. But I didn't notice having `const` in my code. That
 could
 suggest you did your usual constify run through the code, but forgot to
 push.
 If it is the case, could you push to the original branch and I'd cherry-
 pick it
 onto the current branch?

-- 
Ticket URL: <http://bind10.isc.org/ticket/2375#comment:12>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development