sed and encodings
Julien ÉLIE
julien at trigofacile.com
Mon Jan 19 19:39:09 UTC 2009
An interesting pointer:
http://linuxproblem.org/art_21.html
%%
sed behaving strangely in UTF-8 environment
You are using a Linux distribution with UTF-8 encoding
such as SuSE 9.1. You are using sed to operate on files
containing German umlauts or other non-ASCII characters.
sed is behaving quite strangely: an expression like
sed 's/.*/x/'
should normally replace an arbitrary string by a single x.
The dot, however, does not match non-ASCII characters any more!
The problem occurs if you operate on ISO-8859 (Latin)
encoded files. A non-ASCII character is misinterpreted
in UTF-8 as a sequence of characters or, even worse,
as an invalid UTF-8 string. So sed classifies the character
as something not being matched by a dot. Strange and dangerous...
%%
That's weird.
Does anybody know how to prevent sed from behaving like that?
It is becoming a problem!
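
A minimal sketch for checking this on a given machine, and of one
workaround that is often suggested: run sed in the C locale, where it
works byte by byte. The helper name "latin1_probe" and the locale name
"en_US.UTF-8" are assumptions made for illustration (see `locale -a` for
the names actually installed); whether the UTF-8 case shows the problem
may also depend on the sed implementation.

```shell
#!/bin/sh
# Run the article's substitution on one ISO-8859-1 line under a chosen
# locale and print the resulting bytes in hex.
# "latin1_probe" is a made-up helper name, not an existing tool.
latin1_probe() {
    # \351 is octal for 0xE9, "é" in ISO-8859-1; on its own it is not
    # a valid UTF-8 sequence.
    printf 'caf\351\n' | LC_ALL="$1" sed 's/.*/x/' | od -An -tx1 | tr -d ' \n'
    echo
}

# C locale: every byte is a character, so the dot matches 0xE9 too.
latin1_probe C            # prints 780a, i.e. "x" followed by a newline

# UTF-8 locale (assumed name): this is where the behaviour described
# in the article should show up, if that locale is installed.
latin1_probe en_US.UTF-8
```

If the C locale gives the expected result, prefixing only the affected
sed calls with LC_ALL=C would leave the rest of the environment alone.
The price is that sed then no longer treats genuine UTF-8 multibyte
characters as single characters.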
--
Julien ÉLIE
« Loving unconditionally means forgiving and learning to live
with his imperfections. Because in the end
you'll realize that it is what you love the most. »