BIND 10 #2372: define State class for MasterLexer, base part
BIND 10 Development
do-not-reply at isc.org
Mon Nov 12 20:45:05 UTC 2012
#2372: define State class for MasterLexer, base part
-------------------------------------+-------------------------------------
Reporter: | Owner: jinmei
jinmei | Status: reviewing
Type: task | Milestone:
Priority: | Sprint-20121120
medium | Resolution:
Component: | Sensitive: 0
libdns++ | Sub-Project: DNS
Keywords: | Estimated Difficulty: 6
Defect Severity: N/A | Total Hours: 0
Feature Depending on Ticket: |
loadzone-ng |
Add Hours to Ticket: 0 |
Internal?: 0 |
-------------------------------------+-------------------------------------
Comment (by jinmei):
Replying to [comment:8 vorner]:
> First, it does not compile for me:
>
> {{{
> ../../../../src/lib/dns/master_lexer.cc: In static member function
‘static const isc::dns::master_lexer_internal::State&
isc::dns::master_lexer_internal::State::getInstance(isc::dns::master_lexer_internal::State::ID)’:
> ../../../../src/lib/dns/master_lexer.cc:221:1: error: control reaches
end of non-void function [-Werror=return-type]
> }}}
Ah, okay, I believe I fixed it.l
> Also, I'd like to clarify ‒ does the single transition mean producing
one
> token? So you start after each new token?
No, a transition that ends with NULL produces one token. If a
transition results in a non NULL state pointer, the token isn't
identified yet and you need to call handle() again.
> Looking at this enum, I know it is correct code, but it looks unusual
and
> possibly fragile to me (if it is re-ordered, it will do strange things).
Would
> it be better to do it like:
> {{{#!c++
> enum Options {
> NONE = 0, //< No option
> INITIAL_WS = 1 << 0, ///< recognize begin-of-line spaces
> QSTRING = 1 << 1, ///< recognize quoted string
> NUMBER = 1 << 2 ///< recognize numeric text as integer
> };
> }}}
Hmm, in that case, I'd simply use bare constants, and I've changed the
code that way. I also noticed the enum type didn't have
documentation, so I added it too.
> The comment seems to be missing „in“:
> {{{
> The initial transition takes place a static method of the base class,
> }}}
Ah, okay, fixed.
> I think the concrete state classes can be inside the anonymous namespace
as
> well. They are defined inside the C++ file and don't escape outside, so
there's
> no need to pollute the namespace. It also should allow the compiler to
do more
> optimisations, knowing the code won't be used from other module
directly.
You're right, fixed.
> Some sessions seem to produce no tokens at all (and return NULL). Is the
old
> token reset, so the caller knows there's no token? Or how the caller
knows not
> to look into the token?
As far as I can see the only exception is `String::handle()`, which is
currently incomplete and only used in tests in an incomplete way.
This will be completed in #2373.
It would be better if we can make it sure in the code flow (i.e,
ensuring token is always set if handle() returns NULL), but I don't
have a good idea on how to do it.
> The `start` method contains a loop with huge if-else-if-else sequence.
It could
> be written as a switch, which could be more readable. Is there a reason
to use
> the if-else way? Maybe some extension in future?
When we support number token, we'll have a condition like this:
{{{#!c
} else if (isdigit((unsigned char)c) &&
(options & ISC_LEXOPT_NUMBER) != 0) {
}}}
so we won't simply be able to use switch-case. We could separate this
if condition as a special case and use switch for others, but IMO the
real problem is not the use of the chain of if-else, but the long case
analysis per se. I was aware of the readability issue of this method,
but right now my conclusion is that it's a borderline case and
probably too early to introduce refactoring for better readability.
At the time we complete the string and number cases we can check this
again, and if it exceeds a threshold we can think about refactoring.
In that case I'd probably introduce helper methods to keep the main
code flow concise, even if we don't reduce the number of cases
directly.
> In the tests, you seem to re-fill the string stream after reading from
it. Is
> it OK? Can it be used in the way that you read and write from the same
> stringstream? What happens if you get to the end while reading (so it
returns
> EOF) and then refill it? Is it allowed by the stream? Couldn't the lexer
get
> confused?
Good point, I didn't really think about that, but re-reading iostream
and stringbuf (used in sstream) APIs, I think this is actually safe.
Both input and output streams share the same conceptual array, and
that condition must be kept when an output operation makes a
modification to it. But, on further thought, it's probably still
safer to make it clear that we are done with the stream before we pass
it to `InputSource`, because, even if `InputSource` is basically
private (so we know its implementation and we can control it) it's
still a different class and it should be safer if we have fewer
assumptions on how it uses the given stream. So I changed the test
case so all test data are prepared before the stream is passed to the
input source.
--
Ticket URL: <http://bind10.isc.org/ticket/2372#comment:9>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list