Unicode and UTF-8

Russ Allbery rra at stanford.edu
Fri Jul 2 03:23:41 UTC 2004


Lisa Ungar <lungar at us.ibm.com> writes:

> For example, could one have Japanese newsgroups as well as English
> newsgroups on the same INN server?

INN makes no interpretation of the contents of newsgroup names at all.
You can name your newsgroups in any character set you wish, as long as the
names contain no embedded nul characters; INN won't care.  The only things
to be careful of are that news readers may not handle non-ASCII newsgroup
names correctly, and that you have to be particularly careful with control
messages not to accidentally encode the name of the newsgroup in
quoted-printable or base64, since INN doesn't know enough to decode it
again.
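
As a rough illustration of the control message concern, here is a minimal
Python sketch that flags a newgroup name that looks like it was wrapped in
an RFC 2047 encoded-word by the posting software.  This is not part of
INN; the function names and the simplified "newgroup <name> [moderated]"
parsing are assumptions made for the example.

```python
import re

# RFC 2047 encoded-word: =?charset?B-or-Q?encoded-text?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def looks_mime_encoded(group_name):
    """Return True if the name appears to contain an RFC 2047 encoded-word."""
    return ENCODED_WORD.search(group_name) is not None


def newgroup_name(control_value):
    """Extract the group name from a 'newgroup <name> [moderated]' value.

    Hypothetical helper: real control message handling does much more.
    """
    parts = control_value.split()
    if len(parts) < 2 or parts[0] != "newgroup":
        raise ValueError("not a newgroup control message")
    return parts[1]


def check_control(control_value):
    """Return the group name, or raise if it looks MIME-encoded."""
    name = newgroup_name(control_value)
    if looks_mime_encoded(name):
        raise ValueError("group name looks MIME-encoded: " + name)
    return name
```

Something like check_control("newgroup fj.test moderated") returns the
name unchanged, while a value whose name starts with "=?UTF-8?B?" is
rejected before it is sent.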

Note that this is not particularly widely tested, so there is some
possibility that you will encounter problems in the more obscure tools,
although I find it unlikely.

> Is it the newsreader client that provides the ability to enter and
> display UTF-8 format from the inn server?

This is where your problems most likely will be; the news reader should be
able to display body text in any character set, but for headers that can't
use RFC 2047 encoding (such as the Newsgroups header), the news reader has
to know to guess the character set and has to guess right.
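
To make the guessing problem concrete, here is a small Python sketch of
what a client might do with the raw bytes of a Newsgroups header.  The
candidate list and the latin-1 fallback are assumptions chosen for a
Japanese/English setup, not something any particular news reader is known
to do.

```python
def guess_decode(raw, candidates=("utf-8", "euc-jp", "shift_jis")):
    """Try each candidate charset in order; return (text, charset_used).

    This is only a guess: the first charset that decodes without error
    wins, which can still be the wrong one.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this never fails,
    # but the result may be garbage.
    return raw.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    name = "fj.\u65e5\u672c\u8a9e"
    for enc in ("utf-8", "euc-jp"):
        print(enc, guess_decode(name.encode(enc)))
```

The order matters: some byte sequences are valid in more than one of these
encodings, so a successful decode does not prove the guess was right.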

> Do you recommend that we upgrade from 2.3.X to 2.4.1 for UTF-8 support?

No, INN's support in this regard has been mostly unchanged for ages; the
wildmat changes only affect fairly obscure issues when matching complex
wildmat patterns (particularly ones using ? or character classes) against
newsgroup names encoded in UTF-8.
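
As a sketch of the general kind of issue, the Python below compares
matching ? against bytes with matching it against decoded characters.  It
uses Python's fnmatch module purely as a stand-in; this is not INN's
wildmat code, and it makes no claim about exactly what any INN version
does.

```python
import fnmatch

# One Japanese character after "fj.", which is three bytes in UTF-8.
name_text = "fj.\u3042"
name_bytes = name_text.encode("utf-8")

# Matching on bytes: ? consumes a single byte, so one ? cannot cover
# a three-byte UTF-8 character.
bytewise_one = fnmatch.fnmatchcase(name_bytes, b"fj.?")
bytewise_three = fnmatch.fnmatchcase(name_bytes, b"fj.???")

# Matching on decoded text: ? consumes one whole character.
charwise_one = fnmatch.fnmatchcase(name_text, "fj.?")

if __name__ == "__main__":
    print("bytes, one ?:   ", bytewise_one)
    print("bytes, three ?: ", bytewise_three)
    print("text, one ?:    ", charwise_one)
```

Patterns without ? or character classes, such as fj.*, behave the same
either way, which is why this only matters in fairly obscure cases.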

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>

    Please send questions to the list rather than mailing me directly.
     <http://www.eyrie.org/~eagle/faqs/questions.html> explains why.

