BIND 10 #2375: implement MasterLexer getNextToken() and ungetToken()
BIND 10 Development
do-not-reply at isc.org
Fri Nov 16 14:27:41 UTC 2012
#2375: implement MasterLexer getNextToken() and ungetToken()
------------------------------------------+---------------------------------------
                   Reporter:  jinmei      |                Owner:  jelte
                       Type:  task        |               Status:  reviewing
                   Priority:  medium      |            Milestone:  Sprint-20121120
                  Component:  libdns++    |           Resolution:
                   Keywords:              |            Sensitive:  0
            Defect Severity:  N/A         |          Sub-Project:  DNS
Feature Depending on Ticket:  loadzone-ng |  Estimated Difficulty:  5
        Add Hours to Ticket:  0           |          Total Hours:  0
                  Internal?:  0           |
------------------------------------------+---------------------------------------
Comment (by jinmei):
Hmm, I just noticed Jelte picked up the review task, but I've
completed my own review (I'm not sure why I didn't own it for the
review...), so I'm dumping it:
I have a few major issues and some relatively minor comments on the
actual implementation.
'''missing other signature of getNextToken'''
It's probably not clear from the ticket's description, but it was
actually intended to implement another version of getNextToken
(I thought it was mentioned in the main ticket of `MasterLexer`).
It's the equivalent of BIND 9's isc_lex_getmastertoken(). This can wait,
however, until we implement the rdata extensions to support the generic
lexer; in that case we need to create a ticket for it.
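For reference, a possible shape for that other variant, modeled on
isc_lex_getmastertoken() (a sketch only; the token class name and the exact
parameter list are assumptions and would be settled in that follow-up ticket):
{{{#!cpp
// Sketch: get the next token and require it to be of the given type,
// signaling an error otherwise.  eol_ok would control whether
// end-of-line/end-of-file is acceptable in place of the expected type.
const Token& getNextToken(Token::Type expect, bool eol_ok = false);
}}}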
'''about error/exception handling'''
First, I suspect we cannot feasibly revert to the previous token. In
the case of a string-variant token, the token refers to an internal
placeholder vector of `MasterLexer` (see #2373). That vector is always
cleared when a new getNextToken() session (for the string variants)
starts, so the content of `previous_token_` may already be invalid by
the time it's used in ungetToken().
We might even be able to make a deep copy in `previous_token_`, but I
personally don't see the need for it. At least BIND 9's ungettoken()
doesn't provide this feature, so we won't need it, at least for our
work on `MasterLoader`. Personally, I would simply say "the returned
token is only valid until the next call to `getNextToken()`; if the
caller needs to refer to older token values, it must make its own copy
of the token value".
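In that case, the caller-side contract would look something like this (a
sketch only; the getString() accessor and the exact getNextToken() signature
are assumptions based on the interface from #2373):
{{{#!cpp
#include <dns/master_lexer.h>
#include <sstream>
#include <string>

// Sketch: copy whatever you need out of the token before the next
// getNextToken() call; the token object itself may then become invalid.
std::stringstream ss("example.org 3600");
isc::dns::MasterLexer lexer;
lexer.pushSource(ss);

const std::string saved_text = lexer.getNextToken().getString(); // own copy
lexer.getNextToken();   // may invalidate the previously returned token,
                        // but 'saved_text' remains usable
}}}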
Secondly, I don't think we need to catch exceptions (and recover from
them) in `getNextToken`. The only exception of our own that this method
can throw is `ReadError` (from the input source); the other possible
exceptions come from the standard library (and possibly from Boost),
such as std::bad_alloc, if that's not the only one.
I suspect these are sufficiently critical that it wouldn't make sense
to try to continue parsing after seeing them. In fact, BIND 9's
equivalent of `ReadError` is treated as a fatal error, and it
terminates the entire parsing procedure even in the "lenient" mode.
As far as I can see, this still provides the basic ("weak") exception
guarantee in that it won't leak any resources or cause catastrophic
disruption such as a segfault in subsequent use (including a call to
the destructor) of the class.
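If we go that way, the contract could simply be documented on getNextToken()
itself; a sketch only (the exact signature, the default option value, and the
wording are assumptions, not a proposed final text):
{{{#!cpp
/// \brief Get the next token from the input source.
///
/// \throw ReadError when reading from the input source fails.  Only the
///     basic ("weak") guarantee is provided: nothing leaks and the object
///     can still be destroyed safely, but parsing should not be continued.
/// \throw std::bad_alloc when internal memory allocation fails.
const Token& getNextToken(Options options = NONE);
}}}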
Thirdly (probably a minor point), the BIND 9 version of ungettoken
doesn't reset last_was_eol. I'm not sure whether that is intentional,
some kind of bug in BIND 9, or whether it simply doesn't matter, but
please check whether the difference causes any trouble.
You may also want to ask about this on bind10-dev.
'''on the use of fake state class'''
First off, I don't see the need for such a tricky setup to test
`getNextToken` and `ungetToken`. In particular, `getNextToken` is
basically a simple method; we only need to check that it runs the state
class objects from start to finish and returns the token that is set
in `impl_->token_` when the method ends. Its interaction with the input
source is hidden within the state classes, and I don't see the need for
testing it here (in fact, such separation is one major reason for
extracting the state transitions into separate classes).
In the case of `ungetToken`, it may be a bit trickier due to the call
to `impl_->source_->ungetAll()`, but we should be able to gain
reasonable confidence by checking that we get the same token again
after doing `ungetToken` and then `getNextToken`. And, actually, the
current tests using the fake states don't seem to benefit from anything
specific to the fakes on this point anyway.
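For example, something along these lines should give reasonable confidence
(a sketch only; the exact getNextToken() signature and the token's
getString() accessor are assumptions):
{{{#!cpp
#include <dns/master_lexer.h>
#include <gtest/gtest.h>
#include <sstream>
#include <string>

TEST(MasterLexerTest, ungetThenGetAgain) {
    isc::dns::MasterLexer lexer;
    std::stringstream ss("example.org 3600 IN A 192.0.2.1");
    lexer.pushSource(ss);

    // Read one token, push it back, and confirm we get the same one again,
    // which also implies the input source was rewound by ungetAll().
    const std::string first = lexer.getNextToken().getString();
    lexer.ungetToken();
    EXPECT_EQ(first, lexer.getNextToken().getString());
}
}}}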
We provide a way for tests to peek into some of the internal states of
`MasterLexer`, so even without a fake state class we should be able to
test the higher level behavior of `MasterLexer` sufficiently.
I'm also concerned about the way we introduce the fake state classes:
adding them to the production implementation file (master_lexer.cc) and
the public header file (master_lexer.h). Admittedly, there are already
some interfaces that are only supposed to be used by tests, but those
are basically simple read-only getters, and their existence is hidden
in master_lexer_state.h (which is not supposed to be installed or
visible to library users). Defining a protected method on a public
class could send the wrong (or at least not originally intended) message
that the class is conceptually meant to be inherited from, and by
defining the fakes in a production .cc we have to build a non-trivial
amount of code that is never used in production.
So, I'd first like to suggest considering testing the lexer class using
the existing state classes and the already available inspection hooks. If
you still want to fully control the state transitions in tests, I'd
like to make that happen without introducing a derived class of the lexer
or putting fake definitions in the production code. For example, we could
extend the constructor of `MasterLexer` so it can take a pointer to the
"start" function:
{{{#!cpp
// A pointer to the "start" function; if non-NULL the lexer uses it,
// otherwise it uses the built-in state classes.  Clarify in the
// documentation that the non-NULL usage is intended for tests only.
typedef const master_lexer_internal::State* (*start_fn_t)(Options);
MasterLexer(start_fn_t start_fn = NULL);
}}}
and define the necessary fake classes in the test code (which would only
use read-only accessors of the lexer).
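As a rough illustration of that idea (a sketch only; the State interface
details, such as the exact handle() signature and the header layout, are
assumptions), the test file could then define something like:
{{{#!cpp
#include <dns/master_lexer.h>
#include <dns/master_lexer_state.h>   // not installed, but visible to tests

using isc::dns::MasterLexer;
using isc::dns::master_lexer_internal::State;

namespace {
// A fake "start" state defined entirely in the unit test file.
class FakeStart : public State {
public:
    virtual void handle(MasterLexer& /*lexer*/) const {
        // drive the lexer to a predetermined token here, e.g. via the
        // read-only test hooks
    }
};
const FakeStart fake_start;

// The "start" function passed to the MasterLexer constructor in tests.
const State*
fakeStart(MasterLexer::Options /*options*/) {
    return (&fake_start);
}
}

// In a test body:
//     MasterLexer lexer(fakeStart);
//     lexer.getNextToken();   // would run FakeStart::handle() instead of
//                             // the real states
}}}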
Below are specific comments on the actual implementation. Some may
become irrelevant depending on how we deal with the design issues, but
I'm dumping them anyway.
'''master_lexer.h/cc'''
- we should probably unify NOT_STARTED and NO_TOKEN_PRODUCED. The former
is only used as an initial placeholder value.
- getNextToken(): the behavior on atEOF() is different from that of
the BIND 9 version:
{{{#!c
if (isc_buffer_remaininglength(source->pushback) == 0 &&
    source->at_eof)
{
        if ((options & ISC_LEXOPT_DNSMULTILINE) != 0 &&
            lex->paren_count != 0) {
                lex->paren_count = 0;
                return (ISC_R_UNBALANCED);
        }
        if ((options & ISC_LEXOPT_EOF) != 0) {
                tokenp->type = isc_tokentype_eof;
                return (ISC_R_SUCCESS);
        }
        return (ISC_R_EOF);
}
}}}
(Note: in our implementation we can assume LEXOPT_DNSMULTILINE and
LEXOPT_EOF are effectively always enabled.)
- what if (or what should happen if) we call ungetToken() after
reaching EOF? Is this case tested?
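For instance, a small test could pin that behavior down, whatever we decide
it should be (a sketch only; the token type and accessor names are
assumptions, and the final expectation is a placeholder):
{{{#!cpp
#include <dns/master_lexer.h>
#include <gtest/gtest.h>
#include <sstream>

TEST(MasterLexerTest, ungetAfterEOF) {
    isc::dns::MasterLexer lexer;
    std::stringstream ss("example.org");
    lexer.pushSource(ss);

    // Consume everything up to and including the EOF token.
    while (lexer.getNextToken().getType() !=
           isc::dns::MasterLexer::Token::END_OF_FILE) {
    }
    // Assert whatever behavior we choose here, e.g.
    // EXPECT_NO_THROW(lexer.ungetToken()); or an expected exception.
    lexer.ungetToken();
}
}}}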
'''master_lexer_state.h'''
- this doesn't parse:
{{{#!cpp
/// parentheses count is changed accordingly a the last EOL condition
/// set if provided.
}}}
maybe "and the last EOL condition is set if provided"?
'''master_lexer_unittest.cc'''
- this might be safe depending on what happens in handle(), but I'd
protect `state` with a `scoped_ptr`:
{{{#!cpp
const State* state(State::getFakeState(NULL, 0, &token_));
state->handle(*this);
delete state;
}}}
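That is, something along these lines (assuming getFakeState() allocates the
state with new, which the existing delete implies):
{{{#!cpp
#include <boost/scoped_ptr.hpp>

// The scoped_ptr releases the fake state even if handle() throws.
boost::scoped_ptr<const State> state(State::getFakeState(NULL, 0, &token_));
state->handle(*this);
}}}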
- ungetSimple: it's not clear to me how this verifies "even the source
got back to original":
{{{#!cpp
// By calling getToken again, we verify even the source got back to
// original. We must push it as a fake start again so it is picked.
}}}
--
Ticket URL: <http://bind10.isc.org/ticket/2375#comment:7>
BIND 10 Development <http://bind10.isc.org>