unified conf syntax proposal

Thu Jun 21 10:45:18 UTC 2001

Fabien Tassin <fta at sofaraway.org> writes:

> That was also what I thought when I created incoming.conf years ago.
> I've never used groups on my own servers because they make things harder
> when you have a lot of peers. The main reasons are that a) you have to
> remember all the nested changes introduced by defaults and groups (easy if
> everything fits on one page, but that is rare for a transit box) and b) if
> you change a parameter (global or in a group), you have to rewrite
> everything that depends on it. Ick.

I think that simply repeating settings clears up a lot of the confusion in
the places where the nesting gets hard to follow.  I've been thinking through
what configuration files might look like, and another thing that I think
might help is being able to include files to define a group; that lets you
take a bunch of related peers, define a bunch of defaults at the top of
the file, and then just have simple peer definition blocks and only
remember the stuff at the top of the file.

One thing that's bothering me a bit about a potential unified
configuration file for all of incoming.conf, innfeed.conf, and newsfeeds
is that I'm worried we'll incur some of the complexity that we ran into
with readers.conf, where the relationship between separate blocks is hard
for people to wrap their minds around.  (I'm still trying to think of some
way of making readers.conf easier to understand.)  We definitely want to
be able to define general feed "types" that the peer definition blocks can
just refer to and that deal with such things as the outgoing feed program,
size sorting strategies, maybe spam filtering, and so forth... or at least
I think we do, but it may be that with a different way of thinking about
it, it can all be done inside the peer group with inherited defaults and
will still work fine.

Hm.

Answer hazy, try again later.  :)

>> That has all the same information as your example, but the syntax
>> elements are much simpler.  No special syntax for lists (although I
>> could be argued into [] instead of a de facto standard of
>> comma-separated lists -- there's a good justification for adding that
>> syntax element), a uniform syntax that always has "key: value", and the
>> same syntax as incoming.conf, readers.conf, or innfeed.conf so that we
>> can use the same parser for all of them.
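For comparison, a comma-separated list carried in a plain string value has to be split after parsing.  This is a minimal sketch; split_list is a hypothetical helper, not an existing INN function.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: split a comma-separated value such as
   "news.example.com, news2.example.com" in place into at most max
   items, trimming surrounding whitespace and skipping empty items.
   Returns the number of items stored. */
static size_t split_list(char *value, char **items, size_t max)
{
    size_t count = 0;
    char *p = value;

    while (count < max) {
        char *end = strchr(p, ',');
        if (end != NULL)
            *end = '\0';

        /* Trim leading and trailing whitespace from this item. */
        while (isspace((unsigned char) *p))
            p++;
        char *tail = p + strlen(p);
        while (tail > p && isspace((unsigned char) tail[-1]))
            *--tail = '\0';

        if (*p != '\0')
            items[count++] = p;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return count;
}
```

The drawback is that every caller has to know which parameters are lists and split them itself, which is the justification for making [] a syntax element the parser understands.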

> My goal was not only to reduce the number of parsers but also the
> number of files. For these 3, I want 1 parser and 1 file.

Agreed.

>> I do actually care a lot about keeping to a "key: value" syntax if at
>> all possible, if it doesn't make much difference to you.  That sort of
>> detail doesn't change what the syntax can express, but it makes a
>> *huge* difference when writing the parser and using the same parser for
>> everything.

> What do you mean? The trailing ":" or the 'scalar' value?
> My view:
> - ":" adds nothing.
> - scalars force the use of strings and then move the syntax checking to
> all the places that will use the parameters. With [] lists, the parser
> can already check that all items are what is expected.
> Oh, I forgot to mention that in my grammar " [ x ] " is equivalent to " x ".

I'm planning to do it the other way around instead: if you ask for a
vector (a [] list) and there was just a single string value, you get a
vector containing that single string and nothing complains.  But I'm not
sure I see a reason to reduce single-element lists to strings.
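A minimal sketch of that promotion rule, assuming a hypothetical tagged value type that holds either a single string or a [] list.  The names get_vector, struct value, and struct vector_view are illustrative, not the parser's real interface.

```c
#include <stddef.h>

/* Hypothetical parsed value: either a single string or a [] list. */
enum value_type { VALUE_STRING, VALUE_LIST };

struct value {
    enum value_type type;
    const char *string;        /* used when type == VALUE_STRING */
    const char *const *list;   /* used when type == VALUE_LIST */
    size_t count;              /* number of elements in list */
};

/* A read-only view of a vector of strings. */
struct vector_view {
    const char *const *strings;
    size_t count;
};

/* Ask for a vector.  A single string is promoted to a one-element
   vector instead of being reported as an error.  The view points into
   the value's own storage, so it is valid as long as the value is. */
static struct vector_view get_vector(const struct value *v)
{
    struct vector_view view;

    if (v->type == VALUE_STRING) {
        view.strings = &v->string;
        view.count = 1;
    } else {
        view.strings = v->list;
        view.count = v->count;
    }
    return view;
}
```

Promotion only goes one way here: a caller that asks for a string never silently receives a list.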

The reason for the colon after the parameter key is that it means that the
parser can be implemented without lookahead; this was actually a very
conscious and intentional choice (and I just realized that it's not
mentioned in config-design; I need to go add that, because it's
important).  Without the colon, the parser can't distinguish between:

    parameter value

and

    group tag { ... }

until it reaches the curly brace, which means that you need up to two
tokens of lookahead.  The grammar I'm implementing doesn't require any
lookahead at all; at any point in the parse, given the current state and
the next token, it's possible to unambiguously determine what syntactic
element that token is or to know that it's an error.
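The zero-lookahead property can be illustrated with a small state machine.  This is a hypothetical sketch: it assumes the input has already been split into tokens and it does not track nesting depth.  Each call decides what a token is from the current state and that token alone.

```c
#include <string.h>

/* Hypothetical parser states; nesting depth is not tracked. */
enum state {
    EXPECT_ELEMENT,   /* start of a parameter, a group, or "}" */
    EXPECT_VALUE,     /* just saw "key:" */
    EXPECT_TAG,       /* just saw a group type such as "peer" */
    EXPECT_BRACE,     /* just saw the group tag */
    PARSE_ERROR
};

/* Decide the next state from the current state and one token, with no
   peeking at the tokens that follow. */
static enum state next_state(enum state s, const char *token)
{
    size_t len = strlen(token);

    switch (s) {
    case EXPECT_ELEMENT:
        if (len == 0 || strcmp(token, "{") == 0)
            return PARSE_ERROR;
        if (strcmp(token, "}") == 0)
            return EXPECT_ELEMENT;          /* close the current group */
        if (token[len - 1] == ':')          /* "key:" is a parameter */
            return len > 1 ? EXPECT_VALUE : PARSE_ERROR;
        return EXPECT_TAG;                  /* bare word is a group type */
    case EXPECT_VALUE:
        return len > 0 ? EXPECT_ELEMENT : PARSE_ERROR;
    case EXPECT_TAG:
        return (len > 0 && strcmp(token, "{") != 0) ? EXPECT_BRACE
                                                     : PARSE_ERROR;
    case EXPECT_BRACE:
        return strcmp(token, "{") == 0 ? EXPECT_ELEMENT : PARSE_ERROR;
    default:
        return PARSE_ERROR;
    }
}
```

With the trailing colon, "hostname:" is known to be a parameter the moment it is read; without it, a bare word in EXPECT_ELEMENT would stay ambiguous until the "{" two tokens later.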

In fact, you can *almost* do that determination based on the first
character of any token, except for parameters and their trailing colons.
But keeping the trailing colon syntax, which looks more natural, was worth
that small additional complexity.
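A sketch of the first-character rule, assuming a hypothetical token set of braces, brackets, quoted strings, and bare words, and a tokenizer that emits each brace or bracket as its own token.  Only bare words need a second check, of their last character, to pick out a "key:" parameter.

```c
#include <string.h>

/* Hypothetical token kinds. */
enum token_kind {
    TOKEN_LBRACE, TOKEN_RBRACE, TOKEN_LBRACKET, TOKEN_RBRACKET,
    TOKEN_QSTRING, TOKEN_PARAM, TOKEN_WORD, TOKEN_INVALID
};

/* Classify a token by its first character; bare words additionally
   look at their last character to detect the trailing colon. */
static enum token_kind classify(const char *token)
{
    size_t len = strlen(token);

    if (len == 0)
        return TOKEN_INVALID;
    switch (token[0]) {
    case '{': return TOKEN_LBRACE;
    case '}': return TOKEN_RBRACE;
    case '[': return TOKEN_LBRACKET;
    case ']': return TOKEN_RBRACKET;
    case '"': return TOKEN_QSTRING;
    default:
        return (len > 1 && token[len - 1] == ':') ? TOKEN_PARAM
                                                  : TOKEN_WORD;
    }
}
```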

I definitely didn't want a yacc parser, since in my experience they're
bloated, harder to maintain, produce worse error messages, and require
bison in order to be thread-safe and re-entrant.  By contrast, it was
quite easy to make the parser I wrote re-entrant.
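A minimal sketch of what keeps a hand-written parser re-entrant, assuming a hypothetical scanner interface: all position and line-number state lives in a caller-owned struct rather than in static or global variables, so two parses can be interleaved without interfering.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical scanner state, owned by the caller. */
struct scanner {
    const char *input;   /* text being parsed */
    size_t pos;          /* current offset into input */
    int line;            /* current line, for error messages */
};

static void scanner_init(struct scanner *s, const char *input)
{
    s->input = input;
    s->pos = 0;
    s->line = 1;
}

/* Skip whitespace, then copy the next whitespace-delimited token into
   buf (truncated to size - 1 bytes).  Returns 0 at end of input, else 1.
   Only the struct and buf are modified; there is no hidden state. */
static int next_token(struct scanner *s, char *buf, size_t size)
{
    size_t n = 0;

    while (isspace((unsigned char) s->input[s->pos])) {
        if (s->input[s->pos] == '\n')
            s->line++;
        s->pos++;
    }
    if (s->input[s->pos] == '\0')
        return 0;
    while (s->input[s->pos] != '\0'
           && !isspace((unsigned char) s->input[s->pos])) {
        if (n + 1 < size)
            buf[n++] = s->input[s->pos];
        s->pos++;
    }
    if (size > 0)
        buf[n] = '\0';
    return 1;
}
```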

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>