BIND 10 #2371: define dns::MasterLexer class
BIND 10 Development
do-not-reply at isc.org
Sat Oct 27 06:38:03 UTC 2012
#2371: define dns::MasterLexer class
-------------------------------------+-------------------------------------
Reporter: jinmei                     | Owner: jinmei
Type: task                           | Status: accepted
Priority: medium                     | Milestone: Sprint-20121106
Component: libdns++                  | Resolution:
Keywords:                            | Sensitive: 0
Defect Severity: N/A                 | Sub-Project: DNS
Feature Depending on Ticket:         | Estimated Difficulty: 5
  loadzone-ng                        |
Add Hours to Ticket: 0               | Total Hours: 0
Internal?: 0                         |
-------------------------------------+-------------------------------------
Description changed by jinmei:
Old description:
> Subtask of #2368. Slightly depends on #2369 (but could be done
> in parallel).
>
> This is the main class for our lexer of master zone files. It's a
> port of BIND 9's lib/isc/lex.c:isc_lex, but is much simplified so
> that it only handles DNS master files (isc_lex is more generic, and
> supports, e.g., C-style comments).
>
> The main method is getNextToken(), which is a port of
> isc_lex_gettoken(). But the BIND 9 version is a big, monolithic,
> complicated function containing a very long loop and switch-case with
> goto. I'd like to make it more readable with some object-oriented
> flavor. Specifically, I propose using the state design pattern to
> implement the internal state transitions. This part will go to
> separate subtasks.
>
> At the moment, a rough sketch of the main class is as follows:
> {{{#!cpp
> class MasterLexer {
> private:
>     friend class master_lexer_internal::State; // for the state DP
> public:
>     enum Options { // see also below about options
>         INITIAL_WS, // begin-of-line spaces are okay
>         QSTRING,    // quoted string okay (otherwise '"' would be part of
>                     // string)
>         NUMBER      // numeric is recognized as integer (otherwise it's
>                     // considered a string)
>     };
>
>     // This should go to a separate task
>     const MasterToken& getNextToken(Options options);
>
>     // Similar to getNextToken(), but only accept specified type of token
>     // or EOL/EOF (if eol_ok is true). no option.
>     const MasterToken& getNextToken(TokenType expect_type, bool eol_ok);
>
>     // source->ungetAll(), reset paren_count_
>     void ungetToken();
>
>     // These simple ones will be in a single task
>     std::string getSourceName() const { return (sources_.top()->name_); }
>     size_t getSourceLine() const {
>         return (sources_.top()->getCurrentLine());
>     }
>     open(const char* filename); // create new source and push it to sources_
>     open(std::istream&);
>     close(); // close current "source"
>
> private:
>     bool last_was_eol_; // true if we just passed a new line
>     bool no_comments_; // true if we are now in a comment
>     bool escaped_; // true if we just ate '\'
>     size_t paren_count_; // nest level of unclosed '('
>     size_t saved_paren_count_; // used in ungetToken
>     master_lexer_internal::State* saved_state_;
>     stack<master_lexer_internal::InputSource> sources_;
>     MasterToken token_; // used as a return value of getNextToken
>
>     // helper method, used by states, detect if it's the beginning of
>     // a comment.
>     bool isCommentStart(int c, State* current_state);
> }
> }}}
>
> Regarding options, the complete list of options in the BIND 9
> implementation can be found in lib/isc/include/isc/lex.h. Due to the
> simplification, we can ignore some of them, and we can assume some
> others are always specified (and therefore need not be options). From
> a quick look (but check it yourself), the ones we need to import are:
>
> {{{#!c
> #define ISC_LEXOPT_INITIALWS 0x04 /*%< Want initial whitespace. */
> #define ISC_LEXOPT_NUMBER 0x08 /*%< Recognize numbers. */
> #define ISC_LEXOPT_QSTRING 0x10 /*%< Recognize qstrings. */
> }}}
>
> The ones we can assume are always set are:
> {{{#!c
> #define ISC_LEXOPT_EOL 0x01 /*%< Want end-of-line token. */
> #define ISC_LEXOPT_EOF 0x02 /*%< Want end-of-file token. */
> #define ISC_LEXOPT_DNSMULTILINE 0x20 /*%< Handle '(' and ')'. */
> #define ISC_LEXOPT_ESCAPE 0x100 /*%< Recognize escapes. */
> }}}
>
> And the ones we can just ignore are:
> {{{#!c
> #define ISC_LEXOPT_NOMORE 0x40 /*%< Want "no more" token. */
> #define ISC_LEXOPT_CNUMBER 0x80 /*%< Recognize octal and hex. */
> #define ISC_LEXOPT_QSTRINGMULTILINE 0x200 /*%< Allow multiline "" strings */
> #define ISC_LEXOPT_OCTAL 0x400 /*%< Expect a octal number. */
> }}}
>
> In this task, we just define the class and its methods, excluding
> getNextToken() and ungetToken(). Member variables and private methods
> that are not needed yet can be left out.
New description:
Subtask of #2368. Slightly depends on #2369 (but could be done
in parallel).
This is the main class for our lexer of master zone files. It's a
port of BIND 9's lib/isc/lex.c:isc_lex, but is much simplified so
that it only handles DNS master files (isc_lex is more generic, and
supports, e.g., C-style comments).
The main method is getNextToken(), which is a port of
isc_lex_gettoken(). But the BIND 9 version is a big, monolithic,
complicated function containing a very long loop and switch-case with
goto. I'd like to make it more readable with some object-oriented
flavor. Specifically, I propose using the state design pattern to
implement the internal state transitions. This part will go to
separate subtasks.
At the moment, a rough sketch of the main class is as follows:
{{{#!cpp
class MasterLexer {
private:
    friend class master_lexer_internal::State; // for the state DP
public:
    enum Options { // see also below about options
        INITIAL_WS, // begin-of-line spaces are okay
        QSTRING,    // quoted string okay (otherwise '"' would be part of
                    // string)
        NUMBER      // numeric is recognized as integer (otherwise it's
                    // considered a string)
    };

    // This should go to a separate task
    const MasterToken& getNextToken(Options options);

    // Similar to getNextToken(), but only accept specified type of token
    // or EOL/EOF (if eol_ok is true). no option.
    const MasterToken& getNextToken(TokenType expect_type, bool eol_ok);

    // source->ungetAll(), reset paren_count_
    // this is a port of BIND 9's isc_lex_ungettoken().
    void ungetToken();

    // These simple ones will be in a single task
    std::string getSourceName() const { return (sources_.top()->name_); }
    size_t getSourceLine() const {
        return (sources_.top()->getCurrentLine());
    }
    open(const char* filename); // create new source and push it to sources_
    open(std::istream&);
    close(); // close current "source"

private:
    bool last_was_eol_; // true if we just passed a new line
    bool no_comments_; // true if we are now in a comment
    bool escaped_; // true if we just ate '\'
    size_t paren_count_; // nest level of unclosed '('
    size_t saved_paren_count_; // used in ungetToken
    master_lexer_internal::State* saved_state_;
    stack<master_lexer_internal::InputSource> sources_;
    MasterToken token_; // used as a return value of getNextToken

    // helper method, used by states, detect if it's the beginning of
    // a comment.
    bool isCommentStart(int c, State* current_state);
}
}}}
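To make the state design pattern idea more concrete, here is a minimal, self-contained sketch. It is only an illustration under simplifying assumptions: all names in it (Context, StartState, StringState, CommentState, tokenize) are hypothetical and are not the classes planned for master_lexer_internal, and it only knows about whitespace-separated strings and ';' comments (no quoting, escapes, parentheses or EOL tokens). Each state consumes some input and returns the next state, or NULL once a token is complete.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, heavily simplified names; not the planned BIND 10 classes.
namespace sketch {

// Data shared by all states while one input is being read.
struct Context {
    std::istream& input;
    std::string text;   // text of the token being built
    bool at_eof;        // set when no further token is available
    explicit Context(std::istream& in) : input(in), at_eof(false) {}
};

class State {
public:
    virtual ~State() {}
    // Consume input; return the next state, or NULL once a token is
    // complete or the end of input has been reached.
    virtual const State* handle(Context& ctx) const = 0;
};

const State& startState();
const State& stringState();
const State& commentState();

// Skips whitespace and decides which state handles what follows.
class StartState : public State {
public:
    virtual const State* handle(Context& ctx) const {
        char c;
        while (ctx.input.get(c)) {
            if (c == ';') {
                return (&commentState());
            }
            if (!std::isspace(static_cast<unsigned char>(c))) {
                ctx.text.push_back(c);
                return (&stringState());
            }
        }
        ctx.at_eof = true;
        return (NULL);
    }
};

// Collects a plain string up to whitespace, a comment, or end of input.
class StringState : public State {
public:
    virtual const State* handle(Context& ctx) const {
        char c;
        while (ctx.input.get(c)) {
            if (c == ';') {
                ctx.input.unget();  // leave the comment for the next call
                break;
            }
            if (std::isspace(static_cast<unsigned char>(c))) {
                break;
            }
            ctx.text.push_back(c);
        }
        return (NULL);
    }
};

// Discards everything up to the end of the line.
class CommentState : public State {
public:
    virtual const State* handle(Context& ctx) const {
        char c;
        while (ctx.input.get(c) && c != '\n') {}
        return (&startState());
    }
};

const State& startState() { static StartState s; return (s); }
const State& stringState() { static StringState s; return (s); }
const State& commentState() { static CommentState s; return (s); }

// Runs the state machine until one token is complete.
// Returns false when the input is exhausted.
bool getNextToken(Context& ctx, std::string& token) {
    ctx.text.clear();
    for (const State* s = &startState(); s != NULL; s = s->handle(ctx)) {}
    if (ctx.at_eof) {
        return (false);
    }
    token = ctx.text;
    return (true);
}

// Convenience wrapper: split a whole text into tokens.
std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream in(text);
    Context ctx(in);
    std::vector<std::string> tokens;
    std::string token;
    while (getNextToken(ctx, token)) {
        tokens.push_back(token);
    }
    return (tokens);
}

} // namespace sketch
```

With this sketch, sketch::tokenize() skips ';' comments and splits the rest on whitespace. The point is only that the loop in getNextToken() stays trivial, and each kind of lexical context gets its own small handle() instead of a case in one big switch.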
Regarding options, the complete list of options in the BIND 9
implementation can be found in lib/isc/include/isc/lex.h. Due to the
simplification, we can ignore some of them, and we can assume some
others are always specified (and therefore need not be options). From
a quick look (but check it yourself), the ones we need to import are:
{{{#!c
#define ISC_LEXOPT_INITIALWS 0x04 /*%< Want initial whitespace. */
#define ISC_LEXOPT_NUMBER 0x08 /*%< Recognize numbers. */
#define ISC_LEXOPT_QSTRING 0x10 /*%< Recognize qstrings. */
}}}
The ones we can assume are always set are:
{{{#!c
#define ISC_LEXOPT_EOL 0x01 /*%< Want end-of-line token. */
#define ISC_LEXOPT_EOF 0x02 /*%< Want end-of-file token. */
#define ISC_LEXOPT_DNSMULTILINE 0x20 /*%< Handle '(' and ')'. */
#define ISC_LEXOPT_ESCAPE 0x100 /*%< Recognize escapes. */
}}}
And the ones we can just ignore are:
{{{#!c
#define ISC_LEXOPT_NOMORE 0x40 /*%< Want "no more" token. */
#define ISC_LEXOPT_CNUMBER 0x80 /*%< Recognize octal and hex. */
#define ISC_LEXOPT_QSTRINGMULTILINE 0x200 /*%< Allow multiline "" strings */
#define ISC_LEXOPT_OCTAL 0x400 /*%< Expect a octal number. */
}}}
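The BIND 9 options are bit flags that callers OR together. If the three imported options should be combinable in the same way, the enum would need distinct bit values. The following is a sketch under that assumption only: the numeric values, the NONE enumerator, and the helpers operator|, isSet() and classify() are hypothetical and not part of the proposed interface.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical illustration; values and helper names are assumptions.
namespace options_sketch {

enum Options {
    NONE = 0,
    INITIAL_WS = 1,  // begin-of-line spaces are okay
    QSTRING = 2,     // quoted string okay
    NUMBER = 4       // numeric is recognized as integer
};

// Combine two option sets.
inline Options operator|(Options a, Options b) {
    return (static_cast<Options>(static_cast<unsigned int>(a) |
                                 static_cast<unsigned int>(b)));
}

// Check whether a particular flag is part of an option set.
inline bool isSet(Options options, Options flag) {
    return ((static_cast<unsigned int>(options) &
             static_cast<unsigned int>(flag)) != 0);
}

enum TokenKind { STRING_TOKEN, NUMBER_TOKEN };

// With NUMBER set, an all-digit word is a number; otherwise every word
// is treated as a string.
inline TokenKind classify(const std::string& word, Options options) {
    if (!isSet(options, NUMBER) || word.empty()) {
        return (STRING_TOKEN);
    }
    for (std::size_t i = 0; i < word.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(word[i]))) {
            return (STRING_TOKEN);
        }
    }
    return (NUMBER_TOKEN);
}

} // namespace options_sketch
```

A caller could then pass something like INITIAL_WS | NUMBER in a single argument, which matches how the ISC_LEXOPT_ values are used with the BIND 9 lexer.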
In this task, we just define the class and its methods, excluding
getNextToken() and ungetToken(). Member variables and private methods
that are not needed yet can be left out.
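As a rough idea of what the part covered by this task could look like, here is a minimal sketch of a source stack with open(), close(), getSourceName() and getSourceLine(). It is an illustration under assumptions, not the intended implementation: the Source and SourceStack types, the explicit name argument to open(), the getChar() helper, and throwing std::logic_error when no source is open are all hypothetical choices.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

// Hypothetical illustration; none of these names come from the ticket
// except the four public method names.
namespace source_sketch {

struct Source {
    std::string name;
    std::istream* input;  // not owned by the stack
    std::size_t line;     // current line number, starting at 1
    Source(const std::string& n, std::istream& in) :
        name(n), input(&in), line(1) {}
};

class SourceStack {
public:
    // Create a new source and push it; it becomes the current source.
    void open(const std::string& name, std::istream& in) {
        sources_.push(Source(name, in));
    }
    // Close the current source and return to the previous one.
    void close() {
        current();  // throws if there is nothing to close
        sources_.pop();
    }
    std::string getSourceName() const { return (current().name); }
    std::size_t getSourceLine() const { return (current().line); }

    // Read one character from the current source, counting newlines.
    // Returns false at the end of that source.
    bool getChar(char& c) {
        current();
        Source& src = sources_.top();
        if (!src.input->get(c)) {
            return (false);
        }
        if (c == '\n') {
            ++src.line;
        }
        return (true);
    }

private:
    const Source& current() const {
        if (sources_.empty()) {
            throw std::logic_error("no open source");
        }
        return (sources_.top());
    }
    std::stack<Source> sources_;
};

} // namespace source_sketch
```

Because each source keeps its own line counter, closing an inner source brings back the name and line position of the outer one without any extra bookkeeping.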
--
Ticket URL: <http://bind10.isc.org/ticket/2371#comment:4>
BIND 10 Development <http://bind10.isc.org>