BIND 10 #2371: define dns::MasterLexer class

BIND 10 Development do-not-reply at isc.org
Sat Oct 27 06:38:03 UTC 2012


-------------------------------------------+-------------------------------------
                   Reporter:  jinmei       |                 Owner:  jinmei
                       Type:  task         |                Status:  accepted
                   Priority:  medium       |             Milestone:  Sprint-20121106
                  Component:  libdns++     |            Resolution:
                   Keywords:               |             Sensitive:  0
            Defect Severity:  N/A          |           Sub-Project:  DNS
Feature Depending on Ticket:  loadzone-ng  |  Estimated Difficulty:  5
        Add Hours to Ticket:  0            |           Total Hours:  0
                  Internal?:  0            |
-------------------------------------------+-------------------------------------
Description changed by jinmei:

Old description:

> Subtask of #2368.  It depends slightly on #2369 (but the two could be
> done in parallel).
>
> This is the main class for our lexer of master zone files.  It's a
> port of BIND 9's lib/isc/lex.c:isc_lex, but is much simplified so
> that it only handles DNS master files (isc_lex is more generic, and
> supports, e.g., C-style comments).
>
> The main method is getNextToken(), which is a port of
> isc_lex_gettoken().  The BIND 9 version, however, is a big, monolithic
> function built around a very long loop with a switch-case and gotos.
> I'd like to make it more readable with a more object-oriented design.
> Specifically, I propose using the state design pattern to implement
> the internal state transitions.  That part will go to separate
> subtasks.
>
> At the moment, a rough sketch of the main class is as follows:
> {{{#!cpp
> class MasterLexer {
> private:
>     friend class master_lexer_internal::State; // for the state DP
> public:
>     enum Options { // see also below about options
>         INITIAL_WS, // begin-of-line spaces are okay
>         QSTRING,    // quoted string okay (otherwise '"' would be part
>                     // of string)
>         NUMBER      // numeric is recognized as integer (otherwise it's
>                     // considered a string)
>     };
>
>     // This should go to a separate task
>     const MasterToken& getNextToken(Options options);
>
>     // Similar to getNextToken(), but only accept specified type of token
>     // or EOL/EOF (if eol_ok is true).  no option.
>     const MasterToken& getNextToken(TokenType expect_type, bool eol_ok);
>
>     // source->ungetAll(), reset paren_count_
>     void ungetToken();
>
>     // These simple ones will be in a single task
>     std::string getSourceName() const { return (sources_.top()->name_); }
>     size_t getSourceLine() const {
>         return (sources_.top()->getCurrentLine());
>     }
>     // create new source and push it to sources_
>     open(const char* filename);
>     open(std::istream&);
>     close();                    // close current "source"
>
> private:
>     bool last_was_eol_;         // true if we just passed a new line
>     bool no_comments_;          // true if we are now in a comment
>     bool escaped_;              // true if we just ate '\'
>     size_t paren_count_;        // nest level of unclosed '('
>     size_t saved_paren_count_;  // used in ungetToken
>     master_lexer_internal::State* saved_state_;
>     stack<master_lexer_internal::InputSource> sources_;
>     MasterToken token_; // used as a return value of getNextToken
>
>     // helper method, used by states, detect if it's the beginning of
>     // a comment.
>     bool isCommentStart(int c, State* current_state);
> };
> }}}
>
> Regarding options, the complete list for the BIND 9 implementation can
> be found in lib/isc/include/isc/lex.h.  Because of the simplification,
> we can ignore some of them, and we can assume some others are always
> specified (so they don't need to be options at all).  From a quick
> look (but check it yourself), the ones we need to import are:
>
> {{{#!c
> #define ISC_LEXOPT_INITIALWS            0x04    /*%< Want initial whitespace. */
> #define ISC_LEXOPT_NUMBER               0x08    /*%< Recognize numbers. */
> #define ISC_LEXOPT_QSTRING              0x10    /*%< Recognize qstrings. */
> }}}
>
> The ones we can assume are always set are:
> {{{#!c
> #define ISC_LEXOPT_EOL                  0x01    /*%< Want end-of-line token. */
> #define ISC_LEXOPT_EOF                  0x02    /*%< Want end-of-file token. */
> #define ISC_LEXOPT_DNSMULTILINE         0x20    /*%< Handle '(' and ')'. */
> #define ISC_LEXOPT_ESCAPE               0x100   /*%< Recognize escapes. */
> }}}
>
> And the ones we can just ignore are:
> {{{#!c
> #define ISC_LEXOPT_NOMORE               0x40    /*%< Want "no more" token. */
> #define ISC_LEXOPT_CNUMBER              0x80    /*%< Recognize octal and hex. */
> #define ISC_LEXOPT_QSTRINGMULTILINE     0x200   /*%< Allow multiline "" strings */
> #define ISC_LEXOPT_OCTAL                0x400   /*%< Expect a octal number. */
> }}}
>
> In this task, we just define the class and its methods, excluding
> getNextToken() and ungetToken().  There is no need to add member
> variables or private methods that those remaining methods don't use
> yet.

New description:

 Subtask of #2368.  It depends slightly on #2369 (but the two could be
 done in parallel).

 This is the main class for our lexer of master zone files.  It's a
 port of BIND 9's lib/isc/lex.c:isc_lex, but is much simplified so
 that it only handles DNS master files (isc_lex is more generic, and
 supports, e.g., C-style comments).

 The main method is getNextToken(), which is a port of
 isc_lex_gettoken().  The BIND 9 version, however, is a big, monolithic
 function built around a very long loop with a switch-case and gotos.
 I'd like to make it more readable with a more object-oriented design.
 Specifically, I propose using the state design pattern to implement
 the internal state transitions.  That part will go to separate
 subtasks.
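
To make the state-pattern idea a bit more concrete, here is a minimal, self-contained sketch.  It is only an illustration, not the proposed design: ToyLexer, State, Start, InString and tokenize() are hypothetical names, and the toy only splits input on whitespace (no comments, parentheses, escapes or quoted strings).  The point is that each state is a small class whose handle() method consumes some input and returns the next state, so the driver loop in getNextToken() stays trivial.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// All names here are hypothetical; this is not the proposed MasterLexer API.
class ToyLexer;

// One subclass per lexer state.  handle() consumes some input and returns
// the next state, or NULL once the current token is complete.
class State {
public:
    virtual ~State() {}
    virtual const State* handle(ToyLexer& lexer) const = 0;
};

class ToyLexer {
public:
    explicit ToyLexer(const std::string& input) : input_(input), pos_(0) {}

    // Drive the state machine until one token is complete.
    // Returns an empty string at end of input.
    std::string getNextToken();

    // Narrow interface used by the state classes.
    bool atEnd() const { return (pos_ >= input_.size()); }
    char peek() const { assert(!atEnd()); return (input_[pos_]); }
    void skip() { assert(!atEnd()); ++pos_; }
    void consume() { token_ += peek(); skip(); }

private:
    std::string input_;
    std::size_t pos_;
    std::string token_;
};

bool isSpace(char c) {
    return (std::isspace(static_cast<unsigned char>(c)) != 0);
}

// Collects characters up to the next whitespace or end of input.
class InString : public State {
public:
    virtual const State* handle(ToyLexer& lexer) const {
        while (!lexer.atEnd() && !isSpace(lexer.peek())) {
            lexer.consume();
        }
        return (NULL);          // token complete
    }
};

// Skips leading whitespace, then hands over to InString if input remains.
class Start : public State {
public:
    virtual const State* handle(ToyLexer& lexer) const {
        static InString in_string;
        while (!lexer.atEnd() && isSpace(lexer.peek())) {
            lexer.skip();
        }
        if (lexer.atEnd()) {
            return (NULL);      // nothing left; token stays empty
        }
        return (&in_string);
    }
};

std::string ToyLexer::getNextToken() {
    static Start start;
    token_.clear();
    const State* state = &start;
    while (state != NULL) {
        state = state->handle(*this);
    }
    return (token_);
}

// Convenience wrapper: split a whole string into tokens.
std::vector<std::string> tokenize(const std::string& input) {
    ToyLexer lexer(input);
    std::vector<std::string> tokens;
    for (std::string t = lexer.getNextToken(); !t.empty();
         t = lexer.getNextToken()) {
        tokens.push_back(t);
    }
    return (tokens);
}
```

In a sketch like this, adding support for, say, comments or quoted strings would mean adding a state class and a transition to it, rather than another case in one long switch.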

 At the moment, a rough sketch of the main class is as follows:
 {{{#!cpp
 class MasterLexer {
 private:
     friend class master_lexer_internal::State; // for the state DP
 public:
     enum Options { // see also below about options
         INITIAL_WS, // begin-of-line spaces are okay
         QSTRING,    // quoted string okay (otherwise '"' would be part
                     // of string)
         NUMBER      // numeric is recognized as integer (otherwise it's
                     // considered a string)
     };

     // This should go to a separate task
     const MasterToken& getNextToken(Options options);

     // Similar to getNextToken(), but only accept specified type of token
     // or EOL/EOF (if eol_ok is true).  no option.
     const MasterToken& getNextToken(TokenType expect_type, bool eol_ok);

     // source->ungetAll(), reset paren_count_
     // this is a port of BIND 9's isc_lex_ungettoken().
     void ungetToken();

     // These simple ones will be in a single task
     std::string getSourceName() const { return (sources_.top()->name_); }
     size_t getSourceLine() const {
         return (sources_.top()->getCurrentLine());
     }
     // create new source and push it to sources_
     open(const char* filename);
     open(std::istream&);
     close();                    // close current "source"

 private:
     bool last_was_eol_;         // true if we just passed a new line
     bool no_comments_;          // true if we are now in a comment
     bool escaped_;              // true if we just ate '\'
     size_t paren_count_;        // nest level of unclosed '('
     size_t saved_paren_count_;  // used in ungetToken
     master_lexer_internal::State* saved_state_;
     stack<master_lexer_internal::InputSource> sources_;
     MasterToken token_; // used as a return value of getNextToken

     // helper method, used by states, detect if it's the beginning of
     // a comment.
     bool isCommentStart(int c, State* current_state);
 };
 }}}

 Regarding options, the complete list for the BIND 9 implementation can
 be found in lib/isc/include/isc/lex.h.  Because of the simplification,
 we can ignore some of them, and we can assume some others are always
 specified (so they don't need to be options at all).  From a quick
 look (but check it yourself), the ones we need to import are:

 {{{#!c
 #define ISC_LEXOPT_INITIALWS            0x04    /*%< Want initial whitespace. */
 #define ISC_LEXOPT_NUMBER               0x08    /*%< Recognize numbers. */
 #define ISC_LEXOPT_QSTRING              0x10    /*%< Recognize qstrings. */
 }}}

 The ones we can assume are always set are:
 {{{#!c
 #define ISC_LEXOPT_EOL                  0x01    /*%< Want end-of-line token. */
 #define ISC_LEXOPT_EOF                  0x02    /*%< Want end-of-file token. */
 #define ISC_LEXOPT_DNSMULTILINE         0x20    /*%< Handle '(' and ')'. */
 #define ISC_LEXOPT_ESCAPE               0x100   /*%< Recognize escapes. */
 }}}

 And the ones we can just ignore are:
 {{{#!c
 #define ISC_LEXOPT_NOMORE               0x40    /*%< Want "no more" token. */
 #define ISC_LEXOPT_CNUMBER              0x80    /*%< Recognize octal and hex. */
 #define ISC_LEXOPT_QSTRINGMULTILINE     0x200   /*%< Allow multiline "" strings */
 #define ISC_LEXOPT_OCTAL                0x400   /*%< Expect a octal number. */
 }}}
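
Since the three imported BIND 9 options are bit flags that callers combine, the Options enum would probably need distinct bit values too if more than one option is to be passed at once.  The following is only a sketch under that assumption: the numeric values, the NONE enumerator, operator| and hasOption() are hypothetical and not part of the class sketch above.

```cpp
#include <cassert>

// Hypothetical values; they only need to be distinct bits.
enum Options {
    NONE = 0,
    INITIAL_WS = 1, // begin-of-line spaces are okay
    QSTRING = 2,    // quoted string okay
    NUMBER = 4      // numeric is recognized as integer
};

// Combine two option sets.  The result stays within the value range of
// the enum because every combination of the bits above fits in 0..7.
inline Options operator|(Options a, Options b) {
    return (static_cast<Options>(static_cast<int>(a) |
                                 static_cast<int>(b)));
}

// True if the single option 'flag' is set in 'options'.
inline bool hasOption(Options options, Options flag) {
    assert(flag != NONE);
    return ((options & flag) != 0);
}
```

With something like this, a caller could write getNextToken(QSTRING | NUMBER), and a state class could test hasOption(options, QSTRING) when it sees a double quote.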

 In this task, we just define the class and its methods, excluding
 getNextToken() and ungetToken().  There is no need to add member
 variables or private methods that those remaining methods don't use
 yet.
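
As a rough illustration of what remains in scope here (open, close, and the two getters over a stack of sources), something like the following might do.  This is a sketch under assumptions, not the intended implementation: SourceStack and ToySource are hypothetical stand-ins for MasterLexer and master_lexer_internal::InputSource, the source name is passed in explicitly, and getChar() and finalLineOf() exist only to show the line counter moving.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stack>
#include <string>

// Hypothetical stand-in for master_lexer_internal::InputSource.
struct ToySource {
    ToySource(const std::string& name, std::istream& input) :
        name_(name), input_(&input), line_(1)
    {}
    std::string name_;
    std::istream* input_;   // not owned
    std::size_t line_;      // current line, starting at 1
};

// Hypothetical stand-in for the source-handling part of MasterLexer.
class SourceStack {
public:
    // Create a new source and push it; it becomes the current source.
    void open(const std::string& name, std::istream& input) {
        sources_.push(ToySource(name, input));
    }
    // Close the current source; the previous one becomes current again.
    void close() {
        assert(!sources_.empty());
        sources_.pop();
    }
    std::string getSourceName() const {
        assert(!sources_.empty());
        return (sources_.top().name_);
    }
    std::size_t getSourceLine() const {
        assert(!sources_.empty());
        return (sources_.top().line_);
    }
    // Read one character from the current source, counting newlines.
    // Returns the stream's EOF value at end of input.
    int getChar() {
        assert(!sources_.empty());
        ToySource& src = sources_.top();
        const int c = src.input_->get();
        if (c == '\n') {
            ++src.line_;
        }
        return (c);
    }
private:
    std::stack<ToySource> sources_;
};

// Usage sketch: read a whole in-memory "file" and report the line number
// reached at end of input.
std::size_t finalLineOf(const std::string& text) {
    std::istringstream input(text);
    SourceStack sources;
    sources.open("<string>", input);
    while (sources.getChar() != std::char_traits<char>::eof()) {
    }
    return (sources.getSourceLine());
}
```

Keeping the line counter inside each source means that closing a nested source restores the outer source's name and line without any extra bookkeeping.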

-- 
Ticket URL: <http://bind10.isc.org/ticket/2371#comment:4>
BIND 10 Development <http://bind10.isc.org>