bug-coreutils

tr not respecting UTF-8 locale ?


From: Michał Kosmulski
Subject: tr not respecting UTF-8 locale ?
Date: Mon, 11 Oct 2004 15:41:44 +0200
User-agent: Mozilla Thunderbird 0.8 (X11/20040913)

Hello,
I am using a UTF-8 locale, and every coreutils program except tr seems to honor it. tr, however, behaves differently and appears to assume that 1 byte == 1 character even in a UTF-8 locale. Consider this:

address@hidden:~$ locale
LANG=en_US.UTF-8
LC_CTYPE=pl_PL.UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=pl_PL.UTF-8
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=pl_PL.UTF-8
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT=pl_PL.UTF-8
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
address@hidden:~$ tr äöü aou
xxäyyözzütt
xxuoyyuuzzuutt
address@hidden:~$ tr ä ab
xäy
xaby

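IMO, this clearly indicates that tr considers each character to be exactly one byte wide. In UTF-8, ä, ö and ü are each encoded as two bytes and all three share the same first byte (0xC3). In the case of "tr äöü aou" that byte therefore occurs three times in the first set, and the last substitution for it is the one that takes effect; that is why ä becomes uo and not ao. Likewise, in "tr ä ab" the two bytes of ä are mapped to a and b separately, so a single character turns into two.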
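
The behaviour described above can be reproduced with a small Python sketch. It is only a model of what tr appears to be doing, not its actual implementation: the function names bytewise_tr and charwise_tr are made up for illustration, and the sketch assumes that a short second set is padded with its last element and that a later mapping for the same byte overwrites an earlier one.

```python
def bytewise_tr(set1: str, set2: str, text: str) -> str:
    """Model of a tr that treats every UTF-8 byte as one character."""
    src = set1.encode("utf-8")
    dst = set2.encode("utf-8")
    # Assumption: a short second set is padded with its last byte.
    if len(dst) < len(src):
        dst += dst[-1:] * (len(src) - len(dst))
    table = {}
    for s, d in zip(src, dst):
        table[s] = d  # a repeated source byte keeps only its last mapping
    out = bytes(table.get(b, b) for b in text.encode("utf-8"))
    return out.decode("utf-8", errors="replace")


def charwise_tr(set1: str, set2: str, text: str) -> str:
    """Model of the locale-aware result: one mapping per character."""
    if len(set2) < len(set1):
        set2 += set2[-1:] * (len(set1) - len(set2))
    # str.maketrans needs equal lengths, so drop any surplus in set2.
    return text.translate(str.maketrans(set1, set2[:len(set1)]))


if __name__ == "__main__":
    print(bytewise_tr("äöü", "aou", "xxäyyözzütt"))  # xxuoyyuuzzuutt
    print(charwise_tr("äöü", "aou", "xxäyyözzütt"))  # xxayyozzutt
    print(bytewise_tr("ä", "ab", "xäy"))             # xaby
    print(charwise_tr("ä", "ab", "xäy"))             # xay
```

The byte-wise model yields exactly the output shown in the transcript, while the character-wise model yields what one would expect in a UTF-8 locale.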
Michal Kosmulski

--
Michal Kosmulski
http://hektor.umcs.lublin.pl/~mikosmul/



