BIND 10 #2375: implement MasterLexer getNextToken() and ungetToken()
BIND 10 Development
do-not-reply at isc.org
Mon Nov 19 17:04:48 UTC 2012
#2375: implement MasterLexer getNextToken() and ungetToken()
-------------------------------------+-------------------------------------
Reporter: | Owner: jinmei
jinmei | Status: reviewing
Type: task | Milestone:
Priority: | Sprint-20121120
medium | Resolution:
Component: | Sensitive: 0
libdns++ | Sub-Project: DNS
Keywords: | Estimated Difficulty: 5
Defect Severity: N/A | Total Hours: 0
Feature Depending on Ticket: |
loadzone-ng |
Add Hours to Ticket: 0 |
Internal?: 0 |
-------------------------------------+-------------------------------------
Changes (by vorner):
* owner: vorner => jinmei
Comment:
Hello
Sometimes, I wonder if you write more code or more trac comments :-P
Anyway, on a more serious note. I rebased the branch once more on top of
master
(to get the string states in, which I thought I need, but they didn't
provide
what I wanted anyway). As you already have the branch checked out, I
pushed it
as trac2375-r1. The commits up to 6131e5b395558dd51a63deb4565e2b9bc253c704
(included, „Provide weak exception guarantee“) are the same modulo some
trivial
merge resolution and don't need to be reviewed again.
Replying to [comment:7 jinmei]:
> '''missing other signature of getNextToken'''
>
> It's probably not clear from the ticket's description, but it was
> actually intended to implement another version of getNextToken
> (I thought it was mentioned in the main ticket of `MasterLexer`).
> It's equivalent of BIND 9's isc_lex_getmastertoken(). This can wait,
> however, until we implement rdata extensions to support the generic
> lexer. In that case we need to create a ticket for it.
To clarify, so it should be a thin wrapper around the current
getNextToken?
> First, I suspect we cannot feasibly revert to the previous token. In
> the case of string-variant token, the token refers to an internal
> placeholder vector of `MasterLexer` (see #2373). It's always cleared
> once a new session for getNextToken() (of string variants) starts,
> so the content of `previous_token_` can be already invalid when it's
> used in ungetToken().
Looking at it again, the content of token_ should not be observable from
outside (except from the test getter, but that doesn't count), so I simply
removed it.
> Secondly, I don't think we need to catch exceptions (and recover from
> it) in `getNextToken`. The only possible exception of our own in this
> method is `ReadError` (from the input source), and other possible
> exceptions are from the standard library (and possibly from boost)
> such as std::bad_alloc if it's not the only one.
I'm not sure. I could remove it, but I think it is better this way for two
reasons:
* The general approach of trying to be safe.
* In case a big master file includes another one and the other one is not
readable for some reason, it might be good to continue parsing (and
reporting errors) at least for the big one. It is probably not that
important, because it would be rather rare occasion, but I don't think
it
would be that hard to implement either.
So, I'd ask. Is there a disadvantage to handling the exceptions?
> Thirdly (probably a minor point), the BIND 9 version of ungettoken
> doesn't reset last_was_eol. I was not sure whether it was
> intentional, or a some kind of bug of BIND 9, or whether it simply
> doesn't matter, but please check if the difference causes any trouble.
> You may also want to ask this at bind10-dev.
I don't know if it is a bug on bind9 side, or this case was just never
needed.
But from the logical point of view, it should be reset, at least if the
job
description for ungetToken is to return to the state before the last
token. I
think that „peeking“ at a token should be safe and sideeffectless
operation. If
there's something that depends on the fact that ungetToken does not return
the
last-eof condition back in bind9, I think it is time to rethink the
approach,
because that's hard to understand and read. Such code is expected in perl,
not
in C++ ;-).
So, in short, I think the returning one in the current code is the more
correct.
> First off, I don't see the need for such a tricky setup to test
> `getNextToken` and `ungetToken`. In particular, `getNextToken` is
> basically a simple method; we only check if it calls the state class
> objects from start to end and returns the token that is set
> in `impl_->token_` at the time of the end of the method. Its
> interaction with the input source is hidden within the state classes
> and I don't see the need for testing it (in fact, such separation is
> one major reason for extracting the state transition into a separate
> class).
Well, if I omit the fact that I can check what string I get from the
source and
can manipulate the paren count and throw from inside, I want to check one
thing
first, before I spend nontrivial amount of time trying to do the tests
without it
or think of a way to push the fake ones inside by some trick.
I looked at the current master, which already contains the string states.
There
doesn't seem to be any state that returns something other than NULL from
its
handle. The first question therefore is, is this OK for tests or is it
incomplete then? The second, if there's no such state, will there be? If
not,
could we simplify the design by a lot?
> So, I'd first like to suggest considering test the lexer class using
> the existing state classes and already available inspection hooks. If
> you still want to fully control the state transition in tests, I'd
> like to make it happen without introducing a derived class of lexer or
> putting fake definitions in the production code. For example, extend
> the constructor of `MasterLexer` so it can take a pointer to the
> "start" function
> {{{#!cpp
> typedef const master_lexer_internal::State* (start_fn_t)(Options);
> // if non NULL, use it, otherwise we use the builtin state classes.
> // clarify that the non NULL usage is only for tests.
> MasterLexer(start_fn_t start_fn = NULL);
> }}}
> and define necessary fake classes in the test code (that only uses
> read-only accessors for the lexer).
I'm not sure it can be reasonably tested by read-only accessors. The goal
of
the fake state is to change the internal state and mess around with it.
> - we should probably unify NOT_STARTED and NO_TOKEN_PRODUCED. The
former
> is only used as an initial placeholder value.
I'm not sure. I was thinking about that, but then I decided not to. The
situations
are really different. The NOT_STARTED is part of normal lifetime of the
lexer.
The NO_TOKEN_PRODUCED should actually never ever happen except for the
short time
in which the new token is being computed. Considering the enum values are
cheap
(the enum already takes a whole byte, so adding few more changes nothing),
I don't
see compelling reason to have them unified. I think that could be more
confusing
than having them separated.
> - getNextToken(): the behavior on atEOF() is different from that of
> the BIND 9 version:
> {{{#!c
> if (isc_buffer_remaininglength(source->pushback) == 0 &&
> source->at_eof)
> {
> if ((options & ISC_LEXOPT_DNSMULTILINE) != 0 &&
> lex->paren_count != 0) {
> lex->paren_count = 0;
> return (ISC_R_UNBALANCED);
> }
> if ((options & ISC_LEXOPT_EOF) != 0) {
> tokenp->type = isc_tokentype_eof;
> return (ISC_R_SUCCESS);
> }
> return (ISC_R_EOF);
> }
> }}}
> (Note: in our implementation we can assume LEXOPT_DNSMULTILINE and
> LEXOPT_EOF).
I'm probably lost here. I added a test to see the unbalanced parenthesis
is
reported (and it is). So, what other difference is there? I'm not really
confident
in bind9 code, I didn't write any bit of it O:-).
> - this might be safe depending on what happens in handle(), but I'd
> protect state in scoped_ptr:
> {{{#!cpp
> const State* state(State::getFakeState(NULL, 0, &token_));
> state->handle(*this);
> delete state;
> }}}
I addressed this. But I didn't notice having `const` in my code. That
could
suggest you did your usual constify run through the code, but forgot to
push.
If it is the case, could you push to the original branch and I'd cherry-
pick it
onto the current branch?
--
Ticket URL: <http://bind10.isc.org/ticket/2375#comment:12>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list