Re: tr doesn't support multibyte characters
From: Paul Eggert
Subject: Re: tr doesn't support multibyte characters
Date: Wed, 14 Sep 2005 15:21:54 -0700
User-agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)

Egmont Koblinger <address@hidden> writes:
> I guess tr should support multibyte character sets, even if not by default,
> then by providing a command line option.

That'd be nice. It's a bit tricky, though. Doing it right would
require that tr support encoding errors (stray byte sequences that
cannot be parsed as parts of multibyte characters). For example, one
should easily be able to remove the encoding errors without making any
other changes, or to transliterate to upper-case while preserving
encoding errors. Help in this area would be appreciated.
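
To make the two requirements above concrete, here is a minimal Python
sketch. It is not coreutils code and not a proposed design for tr. It
assumes UTF-8 input and relies on Python's "surrogateescape" error
handler, which carries each undecodable byte through as a lone
surrogate (U+DC80..U+DCFF). The function names are hypothetical.

```python
def upcase_preserving_errors(data: bytes, encoding: str = "utf-8") -> bytes:
    """Upper-case the decodable characters; pass stray bytes through unchanged."""
    # Each undecodable byte becomes a lone surrogate in U+DC80..U+DCFF.
    text = data.decode(encoding, errors="surrogateescape")
    # Lone surrogates have no case mapping, so upper() leaves them alone;
    # encoding with the same handler turns them back into the original bytes.
    return text.upper().encode(encoding, errors="surrogateescape")


def strip_encoding_errors(data: bytes, encoding: str = "utf-8") -> bytes:
    """Remove stray bytes and leave every valid character untouched."""
    text = data.decode(encoding, errors="surrogateescape")
    kept = "".join(ch for ch in text if not 0xDC80 <= ord(ch) <= 0xDCFF)
    return kept.encode(encoding)


if __name__ == "__main__":
    sample = b"caf\xc3\xa9 \xff ok"  # valid UTF-8 plus one stray 0xFF byte
    print(upcase_preserving_errors(sample))
    print(strip_encoding_errors(sample))
```

Note that str.upper() is not a one-to-one mapping the way tr's sets are
(it turns "ß" into "SS", for instance), so this illustrates only the
handling of encoding errors, not tr's transliteration semantics.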

The POSIX spec for tr
<http://www.opengroup.org/onlinepubs/009695399/utilities/tr.html>
talks about this issue somewhat, but it's incoherent -- I can't make
heads or tails of what the -C option is really supposed to do.

> If I'm wrong and the current behavior is the desired one then please replace
> all occurrences of "character" with "byte" in its manual.

The CVS version of the coreutils manual talks about this, saying
"Currently @command{tr} fully supports only single-byte characters.
Eventually it will support multibyte characters; ..." with some more
details about the problem.