wildmat routines and text
rra at stanford.edu
Sun Jul 23 00:41:15 UTC 2000
Please note: This message is crossposted between two mailing lists which
are closed to non-subscribers. Please direct any followups appropriately;
if they're about INN, please limit them to inn-workers, and please send
all NNTP standards discussion to ietf-nntp.
I've finished a new wildmat implementation for INN that adds support for
comma and ! (and optionally @) as discussed in other messages here. I've
not checked it into INN yet because I want to also add UTF-8 support
(which actually doesn't look to be that hard) and make sure it's fully
tested and get some more eyes looking at it, given how central a routine
it is. I will probably put it on the current development branch shortly.
Below is the documentation that I wrote for INN on how wildmat patterns
work, with all the references to @ removed. This may be suitable for the
standard, although it could probably use some pruning before being put into
an RFC, since it's intended to be wordy and clear right now.
At <http://www.eyrie.org/~eagle/nntp/> you'll find wildmat.c, the new
implementation, and wildmat-t.c, the test suite that I wrote while writing
it (which may resolve any additional ambiguities). The test suite is in
the public domain; wildmat.c, being based on Rich $alz's implementation,
is covered by the license found in LICENSE in that directory (basically
BSD with the advertising clause). Also in that directory are wildmat.pod
and wildmat.3, the man page for those routines in two formats, which
includes the text below with some additions for @.
Any comments, corrections, and feedback on any of this are very much
appreciated.
A wildmat expression follows rules similar to those of shell filename
wildcards, but with some additions and changes. A wildmat expression
is composed of one or more wildmat patterns separated by commas. Each
character in a wildmat pattern matches a literal occurrence of that
same character in the text, with the exception of the following
special characters:
? Matches any single character.
* Matches any sequence of zero or more characters.
[...] A character set, which matches any single character that falls
within that set. The presence of a character between the
brackets adds that character to the set; for example, "[amv]"
specifies the set containing the characters "a", "m", and "v".
A range of characters may be specified using "-"; for example,
"[0-5abc]" is equivalent to "[012345abc]". Characters are ordered
as in the UTF-8 character set; if the start character of a range
falls after its end character in that ordering, the result of
attempting a match with that pattern is undefined.
In order to include a literal "]" character in the set, it must
be the first character of the set (possibly following "^"); for
example, "[]a]" matches either "]" or "a". To include a literal
"-" character in the set, it must be either the first or the
last character of the set. Backslashes have no special meaning
inside a character set, nor do any of the other wildmat
metacharacters.
[^...] A negated character set. Follows the same rules as a character
set above, but matches any character not contained in the set.
So, for example, "[^]-]" matches any character except "]" and "-".
\ Turns off any special meaning of the following character; the
following character will match itself in the text. "\" will
escape any character, including another backslash or a comma
that otherwise would separate a pattern from the next pattern in
an expression. Note that "\" is not special inside a character
set (no metacharacters are).
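A small recursive matcher for a single pattern makes the metacharacter rules above concrete. The pattern has no commas and no leading "!". This is a sketch under stated assumptions, not Rich $alz's or INN's wildmat.c. It compares bytes rather than UTF-8 characters. It treats an unterminated "[" as a failed match. A reversed range (undefined above) simply never matches. The names match_pattern and match_set are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: test byte c against the set that starts at *pp
   (just past the '['). On success, *pp is advanced past the closing ']'. */
static bool match_set(const char **pp, unsigned char c)
{
    const unsigned char *p = (const unsigned char *) *pp;
    bool negate = false, found = false;

    if (*p == '^') {            /* "[^...]" is a negated set */
        negate = true;
        p++;
    }
    /* The first member is always literal, so "[]...]" and "[-...]" work.
       Backslash is not special inside a set. */
    for (bool first = true; *p != '\0' && (first || *p != ']'); first = false) {
        if (p[1] == '-' && p[2] != ']' && p[2] != '\0') {
            /* A range such as "0-5"; a reversed range never matches here. */
            if (p[0] <= c && c <= p[2])
                found = true;
            p += 3;
        } else {                /* a literal member, including a trailing '-' */
            if (*p == c)
                found = true;
            p++;
        }
    }
    if (*p != ']')
        return false;           /* unterminated set: treated as no match */
    *pp = (const char *) (p + 1);
    return found != negate;
}

/* Hypothetical: return true if single pattern p matches the whole text t. */
static bool match_pattern(const char *p, const char *t)
{
    while (*p != '\0') {
        switch (*p) {
        case '?':               /* exactly one character */
            if (*t == '\0')
                return false;
            p++;
            t++;
            break;
        case '*':               /* zero or more characters */
            while (*p == '*')
                p++;
            if (*p == '\0')
                return true;    /* a trailing '*' swallows the rest */
            for (;; t++) {      /* try every possible split point */
                if (match_pattern(p, t))
                    return true;
                if (*t == '\0')
                    return false;
            }
        case '[':               /* character set */
            if (*t == '\0')
                return false;
            p++;
            if (!match_set(&p, (unsigned char) *t))
                return false;
            t++;
            break;
        case '\\':              /* escape: the next character is literal */
            if (p[1] != '\0')
                p++;
            /* fall through */
        default:                /* ordinary character */
            if (*p != *t)
                return false;
            p++;
            t++;
            break;
        }
    }
    return *t == '\0';          /* anchored at the end as well */
}
```

The following calls all return true: match_pattern("[]a]", "]"), match_pattern("[0-5abc]", "3") and match_pattern("a\\,b", "a,b"). By contrast, match_pattern("a", "aa") and match_pattern("[^]-]", "-") return false. Handling "*" by plain recursion keeps the sketch short. The cost is heavy backtracking on patterns with many stars.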
In addition, "!" (and possibly "@") have special meaning as the first
character of a pattern; see below.
When matching a wildmat expression against some text, each
comma-separated pattern is matched in order from left to right. In
order to match, the pattern must match the whole text; in regular
expression terminology, it's implicitly anchored at both the beginning
and the end. For example, the pattern "a" matches only the text "a"; it
doesn't match "ab" or "ba" or even "aa". If none of the patterns match,
the whole expression doesn't match. Otherwise, whether the expression
matches is determined entirely by the rightmost matching pattern; the
expression matches the text if and only if the rightmost matching
pattern is not negated.
For example, consider the text "news.misc". The expression "*" matches
this text, of course, as does "comp.*,news.*" (because the second
pattern matches). "news.*,!news.misc" does not match this text because
both patterns match, meaning that the rightmost takes precedence, and
the rightmost matching pattern is negated. "news.*,!news.misc,*.misc"
does match this text, since the rightmost matching pattern is not negated.
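A short evaluator can check the left-to-right, rightmost-match-wins rule against the examples above. This sketch makes several assumptions:

- It uses POSIX fnmatch() for the per-pattern match, so bracket-set details follow fnmatch rather than wildmat. For example, POSIX spells set negation "[!...]".
- It splits the expression on commas not preceded by a backslash.
- It does not treat commas inside "[...]" specially.
- It uses a fixed 256-byte pattern buffer.

The name match_expression is hypothetical.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical evaluator: fnmatch() stands in for a real wildmat pattern
   matcher. Patterns longer than the buffer are truncated and end the scan. */
static bool match_expression(const char *expr, const char *text)
{
    char pattern[256];
    bool matched = false;   /* overwritten by each matching pattern, so the
                               rightmost matching pattern decides */

    while (true) {
        size_t n = 0;
        /* Copy one pattern, stopping at an unescaped comma. */
        while (*expr != '\0' && *expr != ',' && n < sizeof(pattern) - 2) {
            if (*expr == '\\' && expr[1] != '\0')
                pattern[n++] = *expr++;   /* keep the backslash for fnmatch */
            pattern[n++] = *expr++;
        }
        pattern[n] = '\0';

        /* A leading '!' negates the pattern and is not itself matched. */
        bool negated = (pattern[0] == '!');
        const char *body = negated ? pattern + 1 : pattern;
        if (fnmatch(body, text, 0) == 0)
            matched = !negated;

        if (*expr != ',')
            break;
        expr++;             /* skip the separating comma */
    }
    return matched;         /* false if no pattern matched at all */
}
```

Against "news.misc", the expression "news.*,!news.misc" evaluates to false and "news.*,!news.misc,*.misc" evaluates to true, as described above.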
Note that the expression "!news.misc" can't match anything. Either the
pattern doesn't match, in which case no patterns match and the
expression doesn't match, or the pattern does match, in which case
because it's negated the expression doesn't match. "*,!news.misc", on
the other hand, is a useful expression that matches anything except
"news.misc".
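The fact that "!news.misc" can never match follows from the decision rule alone, regardless of how individual patterns are matched. The sketch below models each pattern only by two facts: whether it matched, and whether it was negated. The struct pattern_result and the function expression_matches are hypothetical names used for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: outcome of one pattern against some text. */
struct pattern_result {
    bool matched;   /* did the pattern match the text? */
    bool negated;   /* did the pattern start with '!'? */
};

/* The rightmost matching pattern decides; the expression matches
   only if that pattern is not negated. */
static bool expression_matches(const struct pattern_result *r, size_t count)
{
    for (size_t i = count; i > 0; i--)
        if (r[i - 1].matched)
            return !r[i - 1].negated;
    return false;   /* no pattern matched at all */
}
```

A single negated pattern yields either {matched, negated} or {not matched, negated}. The first returns the inverse of "negated", which is false; the second falls through to false. Putting "*" first, as in "*,!news.misc", supplies a non-negated match for every other text.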
"!" has significance only as the first character of a pattern; anywhere
else in the pattern, it matches a literal "!" in the text like any other
character.
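A pattern parser therefore only needs to inspect the first character. In this hypothetical helper (pattern_body is not an INN function), a leading "!" sets the negation flag. Every other "!" stays in the body for the ordinary matcher, where it is a literal.

```c
#include <stdbool.h>

/* Hypothetical helper: split a pattern into its negation flag and the part
   that is actually matched. Only a leading '!' is special. */
static const char *pattern_body(const char *pattern, bool *negated)
{
    *negated = (pattern[0] == '!');
    return *negated ? pattern + 1 : pattern;
}
```

Since "\" escapes any character, a pattern meant to match a literal leading "!" can be written "\!news". This helper leaves such a pattern unnegated and passes it through unchanged.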
If the wildmat_poison interface is used, then "@" behaves the same as
"!" except that if an expression fails to match because the rightmost
matching pattern began with "@", WILDMAT_POISON is returned instead of
the ordinary no-match result.
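The poison case adds a third outcome to the same rightmost-match rule. This sketch uses hypothetical result names (SKETCH_FAIL, SKETCH_MATCH, SKETCH_POISON) rather than INN's header. As before, it models each pattern only by whether it matched and which prefix character it carried.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes; INN defines its own (including WILDMAT_POISON). */
enum sketch_result { SKETCH_FAIL, SKETCH_MATCH, SKETCH_POISON };

/* Hypothetical: one pattern's outcome and its leading character. */
struct poison_pattern {
    bool matched;   /* did this pattern match the text? */
    char prefix;    /* '\0' for none, '!' to negate, '@' to poison */
};

/* The rightmost matching pattern decides; '@' behaves like '!' but
   reports poison instead of a plain failure. */
static enum sketch_result decide_poison(const struct poison_pattern *r, size_t count)
{
    for (size_t i = count; i > 0; i--) {
        if (!r[i - 1].matched)
            continue;
        if (r[i - 1].prefix == '@')
            return SKETCH_POISON;
        return r[i - 1].prefix == '!' ? SKETCH_FAIL : SKETCH_MATCH;
    }
    return SKETCH_FAIL;   /* nothing matched */
}
```

Consider "*,@alt.binaries.*" evaluated against "alt.binaries.test". Both patterns match and the rightmost starts with "@", so the result is poison. Against "comp.lang.c", only "*" matches and the result is a plain match.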
Russ Allbery (rra at stanford.edu) <http://www.eyrie.org/~eagle/>