MIME Distribution: headers

Thu Jan 1 00:39:07 UTC 2009

Julien ÉLIE <julien at trigofacile.com> writes:

> The Distribution: header is not necessarily in US-ASCII characters.
>
> Though USEFOR states:
>
>   dist-name       =  ALPHA / DIGIT
>                      *( ALPHA / DIGIT / "+" / "-" / "_" )
>
> RFC 3977 states:
>
>   distribution = token
>   token = 1*P-CHAR
>   P-CHAR     = A-CHAR / UTF8-non-ascii
>
> As we already did in Makefile.global:
>
>   ##      If you modify these two strings, you must encode them in UTF-8
>   ##      (using only US-ASCII characters is consequently also fine) and
>   ##      keep their length reasonable; otherwise, your news server will not
>   ##      be complying with the NNTP protocol.
>
>   VERSION  = 2.5.0
>   VERSION_EXTRA = prerelease
>
> we probably should do the same with the distrib.pats file and its man page.

Yup, sounds good.

> However, my concern is with newsfeeds.  It handles distributions.  Does
> it mean we have to do some sort of MIME decoding in order to implement
> RFC 3977?

No, I don't think so, at least unless we decide to do so with all
configuration files.  The easy approach, and the reasonable one for now, I
think, is to require that people who put non-ASCII distributions in
newsfeeds use UTF-8.  (This is probably also worth a comment.)  Then the
existing code, which does byte string comparisons, should just work.
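
To illustrate why plain byte comparison is enough when both sides are
UTF-8, here is a minimal Python sketch.  It is not INN code: the function
name distribution_matches and the comma-splitting of the header value are
assumptions made for the example only.

```python
from typing import Iterable


def distribution_matches(header_value: bytes, configured: Iterable[bytes]) -> bool:
    """Return True if any comma-separated distribution in the header body
    is byte-for-byte equal to one of the configured distributions.

    Hypothetical helper for illustration; no decoding is performed, so the
    comparison only succeeds when both sides use the same encoding.
    """
    wanted = set(configured)
    # bytes.strip() removes surrounding ASCII whitespace only.
    return any(item.strip() in wanted for item in header_value.split(b","))


# Both sides encoded in UTF-8: the bytes are identical, so they match.
utf8_config = ["évry".encode("utf-8")]
print(distribution_matches("fr, évry".encode("utf-8"), utf8_config))

# Header in ISO-8859-1, configuration in UTF-8: the bytes differ.
print(distribution_matches("évry".encode("latin-1"), utf8_config))
```

One caveat with this approach: byte comparison treats different Unicode
normalization forms of the same visible string as different
distributions.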

> Another question:  for the active.times file, I do not know the best way
> to write the newsgroup creator's name in UTF-8.  I think that only
> ctlinnd matters for that (mod-active and controlchan write "usenet" or
> something like that -- I have not checked).  Is putting a warning in the
> man page of ctlinnd enough?  The encoding depends on that of the shell
> used!

Yeah, that one is hard.  I'm not sure there's any really good solution
there other than a warning.  I guess the other option would be to check
the string we're about to write to be sure it's correctly formed UTF-8,
and if it isn't, fail with an error instead of creating the group.

We probably need a general function to check for correctly formed UTF-8
anyway.
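
As a rough sketch of what such a check could look like, here is a Python
version that relies on the language's strict UTF-8 decoder.  The names
is_valid_utf8 and check_creator are hypothetical and do not exist in INN;
a real implementation there would be written in C.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is correctly formed UTF-8.

    Python's strict decoder rejects truncated sequences, stray
    continuation bytes, overlong encodings, and encoded surrogates.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def check_creator(creator: bytes) -> bytes:
    """Hypothetical guard: refuse a creator name that is not valid UTF-8
    instead of writing it out."""
    if not is_valid_utf8(creator):
        raise ValueError("creator name is not correctly formed UTF-8")
    return creator


print(is_valid_utf8("Julien ÉLIE".encode("utf-8")))    # valid UTF-8
print(is_valid_utf8("Julien ÉLIE".encode("latin-1")))  # 0xC9 followed by ASCII
```

Pure US-ASCII input always passes, since US-ASCII is a subset of UTF-8.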

> Dealing with encodings is not easy at all!

Indeed.  It's extremely tricky.

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>

    Please send questions to the list rather than mailing me directly.
     <http://www.eyrie.org/~eagle/faqs/questions.html> explains why.