[INN] #4: Add encoding to checkgroups processing

Russ Allbery rra at stanford.edu
Sun Dec 21 22:30:30 UTC 2008


Julien ÉLIE <julien at trigofacile.com> writes:

>> Add encoding specifications such as:
>>
>>  # Output encoding for newsgroups file.
>>  /encoding/:utf-8
>>
>>  # Incoming encodings in checkgroups.
>>  /encoding/:*:cp1252
>>  /encoding/:cn.*:gb18030
>>  /encoding/:fido.*:utf-8
>>  /encoding/:fr.*:iso-8859-15
>>
>> and then update docheckgroups processing to use iconv to convert
>> newsgroup descriptions to the output encoding before adding them
>> to newsgroups.
>
> Do you think it should be put into control.ctl and control.ctl.local
> with that syntax?
> Or should a third file be used?  (control.ctl.encodings?  That would
> make the "/encoding/" prefix unnecessary.)

I think there's some benefit to keeping it all together in the same
place.  I think the encodings are likely to also be something that people
will want to get from a reputable central source instead of figuring out
on their own, so it's easier if they can update control.ctl instead of two
files.
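
For illustration only, here is a minimal Python sketch of how the proposed
/encoding/ lines could be read and applied to a newsgroup description.
This is not INN code: the function names are hypothetical, fnmatch only
approximates wildmat patterns, "last matching line wins" is assumed by
analogy with other control.ctl entries, and Python's codecs stand in for
iconv.

```python
import fnmatch

# Sample taken from the proposal quoted above.
SAMPLE = """\
# Output encoding for newsgroups file.
/encoding/:utf-8

# Incoming encodings in checkgroups.
/encoding/:*:cp1252
/encoding/:cn.*:gb18030
/encoding/:fido.*:utf-8
/encoding/:fr.*:iso-8859-15
"""


def parse_encodings(text, default_output="utf-8"):
    """Return (output_encoding, [(pattern, encoding), ...]).

    default_output is an assumption; the proposal does not say what
    happens when no output encoding line is present.
    """
    output = default_output
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("/encoding/:"):
            continue  # comments, blank lines, ordinary control.ctl entries
        fields = line.split(":")[1:]
        if len(fields) == 1:
            output = fields[0]
        elif len(fields) == 2:
            rules.append((fields[0], fields[1]))
    return output, rules


def incoming_encoding(group, rules, default="utf-8"):
    """Pick the incoming encoding for a group (assumed: last match wins)."""
    chosen = default
    for pattern, encoding in rules:
        if fnmatch.fnmatchcase(group, pattern):
            chosen = encoding
    return chosen


def convert_description(group, raw, output, rules):
    """Re-encode a raw description (bytes) into the output encoding."""
    text = raw.decode(incoming_encoding(group, rules), errors="replace")
    return text.encode(output, errors="replace")


output, rules = parse_encodings(SAMPLE)
converted = convert_description(
    "fr.test", "Café".encode("iso-8859-15"), output, rules
)
```

Undecodable bytes are replaced rather than rejected here; whether
docheckgroups should instead skip such a description is a separate
decision.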

> If a control article contains an explicit charset, should it be used
> instead of what is specified in this file?
> Should another option be added?
>    /encoding/:fr.*:iso-8859-15:force
> Or is there a better syntax?

Is anyone currently tagging their messages with a charset that's wrong and
doesn't match the content of the message?  My inclination would be to
trust the article charset over whatever we have in an encoding section,
since hopefully people will change over time to move towards UTF-8.  But
if we have to override broken encodings, we should probably provide some
mechanism for doing so.

I can't think off-hand of a better syntax than the :force option.
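
A small Python sketch of that precedence, under the assumption that
:force means "ignore the article's own charset".  The function names are
hypothetical and the article charset is read from the Content-Type header
with the standard email module; none of this is existing INN behavior.

```python
from email import message_from_string


def article_charset(article_text):
    """Return the lowercased charset from Content-Type, or None if absent."""
    return message_from_string(article_text).get_content_charset()


def effective_charset(declared, configured, force=False):
    """Choose the charset used to decode checkgroups descriptions.

    declared:   charset found in the control article, or None.
    configured: encoding from the matching /encoding/ line.
    force:      True if that line carries the proposed :force flag.
    """
    if force or not declared:
        return configured
    return declared


sample = "Content-Type: text/plain; charset=UTF-8\n\nfr.test\tTests\n"
declared = article_charset(sample)
```

With this rule, an article that declares UTF-8 is decoded as UTF-8 even
when the matching line says iso-8859-15, unless that line ends in :force.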

> Shouldn't we keep the "from"?
>
>    /encoding/:control at usenet-fr.news.eu.org:fr.*:iso-8859-15
>
> It might be useful if there are several senders!

Well, control.ctl already doesn't deal with multiple non-cooperating
senders; you have to pick one, and you can probably pick an encoding at
the same time.  For cooperating senders, I think they need to agree on
encoding anyway, or at least include a charset in their messages.

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>

    Please send questions to the list rather than mailing me directly.
     <http://www.eyrie.org/~eagle/faqs/questions.html> explains why.


