[NNTP] Internationalisation

Sun Apr 10 17:19:05 UTC 2005

On Apr 8, 2005, at 12:29 PM, Clive D.W. Feather wrote:

> First draft. Comments welcome.

I like it.

>    countries, newsgroup hierarchies, and individuals
>    have all found different solutions which
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           have found a variety of solutions that

>                                             work for them but are not
                                               ^^^^
                                               are satisfactory (?)
                                               are adequate (?)

>    With the increased use of MIME in
>    email, it is becoming more common to find MIME headers identifying
>    the character set of the body, but this is far from universal.

I think the mention of email makes it unclear that "body" refers to the
body of an NNTP article.  Perhaps "...it is becoming more common for NNTP
articles to include MIME headers..."?

>    One point that has been generally accepted is that articles can
>    contain octets with the top bit set, and NNTP is only expected to
>    operate on 8-bit clean transport paths.

Perhaps you also need to mention NUL and bare CR or LF here?
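For what it's worth, here is a minimal sketch of the octets such text might need to rule out.  The function name is hypothetical.  I am assuming that CRLF is the only permitted line terminator and that top-bit-set octets are allowed, per the 8-bit clean requirement:

```python
def find_forbidden_octets(article: bytes) -> list[tuple[int, str]]:
    """Return (offset, description) for each NUL, bare CR, or bare LF.

    Assumption: lines must end in CRLF; octets with the top bit set
    are permitted, since the transport is 8-bit clean.
    """
    problems = []
    i = 0
    n = len(article)
    while i < n:
        b = article[i]
        if b == 0x00:
            problems.append((i, "NUL"))
        elif b == 0x0D:  # CR
            if i + 1 < n and article[i + 1] == 0x0A:
                i += 2  # proper CRLF pair; skip both octets
                continue
            problems.append((i, "bare CR"))
        elif b == 0x0A:  # any LF reached here was not preceded by CR
            problems.append((i, "bare LF"))
        i += 1
    return problems
```

So b"a\x00b\rc\nd\r\n" yields a NUL at offset 1, a bare CR at 3 and a bare LF at 5.  UTF-8 octets such as b"\xc3\xa9" pass untouched.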

>    and not gratuitously break existing implementations and
>    arrangements, even if they are less than optimal.

This feels a little wordy.  Something like "and not needlessly break
existing functional but suboptimal implementations and arrangements"?

>    The NNTP itself is extended from US-ASCII [ANSI1986] to UTF-8
>    [RFC3629] in this specification.  Except in the specific areas
>    discussed below, UTF-8 (which is a superset of ASCII) is mandatory
>    and implementations MUST NOT use any other encoding.
>
>    The major deviation from this requirement
                ^^^^^^^^^
                exception?

>    some header values (and, of course, the article body) are generated
>    by users using software which adopts local practices; for example, it
>    may encode all text is in ISO 8859-1 without including a MIME header
>    to that effect.

I had trouble determining the referent of "it" in this sentence;
perhaps substituting "a client" would clear it up?

(Incidentally, here I see another "which" introducing a restrictive clause.
Perhaps you don't believe in that grammar rule and I should stop 
pointing it out?)

>    More specifically, while implementations
>    SHOULD only allow the creation of new articles where the headers
>    conform to UTF-8, where an article is obtained from an external
>    source an implementation MAY pass it on, and derive data from it
>    (such as the response to the HDR command), even though the article or
>    the data is not valid UTF-8.

This should be broken into two sentences for clarity.  Suggest:

    More specifically, implementations SHOULD only allow the creation
    of new articles where the headers conform to UTF-8.  However, when
    an article is obtained from an external source, an implementation
    MAY pass it on, and derive data from it (such as the response to
    the HDR command), even though the article or the derived data may
    not be valid UTF-8.

>    Implementations MUST transfer such articles and data correctly.

What does "correctly" mean here?

>    The second area of deviation is

I guess if you like "exception" for "deviation" above, it should be 
changed here too.

>    Restricting newsgroup names to UTF-8 is not a complete solution to
>    the issues, of course.  In particular, when new newsgroup names are
>    created or a user is asked to enter a newsgroup name, some form of
>    canonicalisation will need to take place.

A little more text about canonicalization (what form it takes, and where
it is applied) would probably be useful here.
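As one possible illustration only: the draft does not say which canonical form is intended, so choosing Unicode Normalization Form C (NFC) here is purely my assumption.  The function name is hypothetical:

```python
import unicodedata

def canonicalise_group_name(name: str) -> str:
    # Assumption: the canonical form is Unicode NFC, so a precomposed
    # character and its base-plus-combining-mark spelling compare equal.
    return unicodedata.normalize("NFC", name)
```

With this, "de.alt.cafe" followed by U+0301 COMBINING ACUTE ACCENT and "de.alt.caf" followed by U+00E9 would name the same group.  Whether NFC, NFKC, or something else entirely is the right choice is the kind of thing the extra text should pin down.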

--  
Jeffrey M. Vinocur
jeff at litech.org