BIND 10 #2372: define State class for MasterLexer, base part

Mon Nov 12 20:45:05 UTC 2012

#2372: define State class for MasterLexer, base part
-------------------------------------+-------------------------------------
                   Reporter:         |                 Owner:  jinmei
  jinmei                             |                Status:  reviewing
                       Type:  task   |             Milestone:
                   Priority:         |  Sprint-20121120
  medium                             |            Resolution:
                  Component:         |             Sensitive:  0
  libdns++                           |           Sub-Project:  DNS
                   Keywords:         |  Estimated Difficulty:  6
            Defect Severity:  N/A    |           Total Hours:  0
Feature Depending on Ticket:         |
  loadzone-ng                        |
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------

Comment (by jinmei):

 Replying to [comment:8 vorner]:

 > First, it does not compile for me:
 >
 > {{{
 > ../../../../src/lib/dns/master_lexer.cc: In static member function
 ‘static const isc::dns::master_lexer_internal::State&
 isc::dns::master_lexer_internal::State::getInstance(isc::dns::master_lexer_internal::State::ID)’:
 > ../../../../src/lib/dns/master_lexer.cc:221:1: error: control reaches
 end of non-void function [-Werror=return-type]
 > }}}

 Ah, okay, I believe I fixed it.l

 > Also, I'd like to clarify ‒ does the single transition mean producing
 one
 > token? So you start after each new token?

 No, a transition that ends with NULL produces one token.  If a
 transition results in a non NULL state pointer, the token isn't
 identified yet and you need to call handle() again.

 > Looking at this enum, I know it is correct code, but it looks unusual
 and
 > possibly fragile to me (if it is re-ordered, it will do strange things).
 Would
 > it be better to do it like:
 > {{{#!c++
 >     enum Options {
 >         NONE = 0, //< No option
 >         INITIAL_WS = 1 << 0, ///< recognize begin-of-line spaces
 >         QSTRING = 1 << 1, ///< recognize quoted string
 >         NUMBER = 1 << 2 ///< recognize numeric text as integer
 >     };
 > }}}

 Hmm, in that case, I'd simply use bare constants, and I've changed the
 code that way.  I also noticed the enum type didn't have
 documentation, so I added it too.

 > The comment seems to be missing „in“:
 > {{{
 > The initial transition takes place a static method of the base class,
 > }}}

 Ah, okay, fixed.

 > I think the concrete state classes can be inside the anonymous namespace
 as
 > well. They are defined inside the C++ file and don't escape outside, so
 there's
 > no need to pollute the namespace. It also should allow the compiler to
 do more
 > optimisations, knowing the code won't be used from other module
 directly.

 You're right, fixed.

 > Some sessions seem to produce no tokens at all (and return NULL). Is the
 old
 > token reset, so the caller knows there's no token? Or how the caller
 knows not
 > to look into the token?

 As far as I can see the only exception is `String::handle()`, which is
 currently incomplete and only used in tests in an incomplete way.
 This will be completed in #2373.

 It would be better if we can make it sure in the code flow (i.e,
 ensuring token is always set if handle() returns NULL), but I don't
 have a good idea on how to do it.

 > The `start` method contains a loop with huge if-else-if-else sequence.
 It could
 > be written as a switch, which could be more readable. Is there a reason
 to use
 > the if-else way? Maybe some extension in future?

 When we support number token, we'll have a condition like this:
 {{{#!c
                         } else if (isdigit((unsigned char)c) &&
                                    (options & ISC_LEXOPT_NUMBER) != 0) {
 }}}
 so we won't simply be able to use switch-case.  We could separate this
 if condition as a special case and use switch for others, but IMO the
 real problem is not the use of the chain of if-else, but the long case
 analysis per se.  I was aware of the readability issue of this method,
 but right now my conclusion is that it's a borderline case and
 probably too early to introduce refactoring for better readability.
 At the time we complete the string and number cases we can check this
 again, and if it exceeds a threshold we can think about refactoring.
 In that case I'd probably introduce helper methods to keep the main
 code flow concise, even if we don't reduce the number of cases
 directly.

 > In the tests, you seem to re-fill the string stream after reading from
 it. Is
 > it OK? Can it be used in the way that you read and write from the same
 > stringstream? What happens if you get to the end while reading (so it
 returns
 > EOF) and then refill it? Is it allowed by the stream? Couldn't the lexer
 get
 > confused?

 Good point, I didn't really think about that, but re-reading iostream
 and stringbuf (used in sstream) APIs, I think this is actually safe.
 Both input and output streams share the same conceptual array, and
 that condition must be kept when an output operation makes a
 modification to it.  But, on further thought, it's probably still
 safer to make it clear that we are done with the stream before we pass
 it to `InputSource`, because, even if `InputSource` is basically
 private (so we know its implementation and we can control it) it's
 still a different class and it should be safer if we have fewer
 assumptions on how it uses the given stream.  So I changed the test
 case so all test data are prepared before the stream is passed to the
 input source.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2372#comment:9>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development