nnrpd's Argify() and passwords with spaces

Sun Dec 16 20:06:22 UTC 2001

[ moving from news.software.nntp ]

For those inn-workers who don't read news.software.nntp, there
was a bit of heated discussion a few weeks ago about supporting
passwords with spaces.  

It seems that by the (informal) spec we are required to read e.g.
"AUTHINFO PASS foo bar" as meaning the password is "foo" (or
possibly rejecting it for too many parameters).

However, we have seen people have problems with spaces in
passwords (it's been in TODO for a long time) and it seems
reasonable to "extend" the spec to support this.  It
theoretically could break things for people/software that are
depending on the oddities of the current behavior, but certainly
in the overwhelming majority of the cases it should not cause
problems.

Anyway, the news.software.nntp discussion seems to be restricting
itself to policy/correctness issues.  We sketched out an idea for
how to support this and I had a couple implementation questions
that are perhaps best discussed here instead.

The problem stems from the fact that once we've called Argify on
the input, the whitespace is lost.  The only clean solution we
could come up with is to only Argify the first n arguments (where
n is whatever is needed) and thus the remainder would be
available for individual command-processing functions to use as
they see fit.

In article <ylu1v0cfar.fsf at windlord.stanford.edu>, Russ writes:
>Jeffrey M Vinocur <jeff at litech.org> writes:
>
>> [ discussing how Argify affects the original input buffer ]
>> We're nulling out a copy.
>
>On the other hand, not constantly copying the incoming command line may
>eventually be a good idea.

Easily fixed, but for one problem:  if there was leading
whitespace, argv[0] won't point to the beginning of the malloc'd
chunk, and attempting to free it would be bad.  It's fixable, of
course, but requires some thought.  So unless someone has
evidence that command parsing is a bottleneck...

>> We could still have argify scan the whole string and return
>> the count of the total number of arguments, which allows error
>> checking for too few/many.
>
>*nod*

It turns out to be silly to have Argify scan the string past the
first n arguments just to count, because later we may want to
Argify some more of the string which would involve walking over
it again.  So my thought is to replace the existing:

/*
**  Parse a string into a NULL-terminated array of words; return number
**  of words.  If argvp isn't NULL, it and what it points to will be
**  DISPOSE'd.
*/
int Argify(char *line, char ***argvp);

and add the following two functions (after which Argify can be
defined as a wrapper around nArgify with n=-1 for backwards
compatibility):

/*
**  Parse a string into a NULL-terminated array of at most n words;
**  return number of words.  If there are more than n words, stop
**  processing at the beginning of the (n+1)th word, store everything
**  from the beginning of word n+1 in argv[n] and return n+1.  Thus
**  argv[n] == NULL if there are exactly n words.  If n is negative,
**  parses all words.  If argvp isn't NULL, it and what it points to
**  will be DISPOSE'd.
*/
int nArgify(char *line, char ***argvp, int n);

/*
**  Destructively parse a string into a NULL-terminated array of at most
**  n words; return number of words.  Behavior on negative n and strings
**  of more than n words matches that of nArgify (see above).  Caller
**  must supply an array of sufficient size (such as created by
**  nArgify).
**
**  Note that the sequence
**      ac  =  nArgify(line,     &argv,     n1);
**      ac--;
**      ac += reArgify(argv[ac], &argv[ac], n2);
**  is equivalent to
**      ac  =  nArgify(line,     &argv,     n1 + n2);
*/
int reArgify(char *p, char **argv, int n);

The implementation of nArgify is simply to allocate an array of
the appropriate size (just like Argify used to do) and then call
reArgify on that array.

I coded this up already, and it works out pretty cleanly.

So, two implementation questions:

- It's pretty clear that leading whitespace should be removed,
  but what about trailing whitespace?  That is, should a command
  like "AUTHINFO PASS    foo  bar " be read as a password of 
  "foo  bar" or "foo  bar "?

  (In the absence of any strong opinions, the latter is easier to
  implement.)

- There are two possible behaviors for nArgify/reArgify when
  there are more than n arguments (and we leave the remainder of
  the input as argv[n]):

  * Return n+1.  This has the benefit of making checking for too
    many arguments very easy, however there is that odd ac-- that
    has to be done before calls to reArgify (and if there are
    fewer than n arguments in the first place, later calls to
    reArgify will walk over the last argument again for no
    reason, because of the ac--).

  * Return n.  This conceptually makes more sense, I think, and
    removes the ac--, but it means that code like (ac > max) has
    to be rewritten as (ac > max || (ac == max && av[ac] != NULL)).

  Any thoughts on which is less ugly?  If you were writing code
  that used these functions, which would you rather see?  Would
  the latter be improved by having a function (or macro) to
  encapsulate the check?

-- 
Jeffrey M. Vinocur
jeff at litech.org