bug-coreutils

bug#8871: Bug with "sort -i" ?


From: Eric Blake
Subject: bug#8871: Bug with "sort -i" ?
Date: Wed, 15 Jun 2011 15:41:06 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110419 Red Hat/3.1.10-1.el6_0 Mnenhy/0.8.3 Thunderbird/3.1.10

[re-adding the list]

On 06/15/2011 03:28 PM, Al Bogner wrote:
>> When all of the bytes are ignored as non-printable, then all three
>> lines are identical, hence -u prints only one line.
> 
> Ok and thanks. I had a different understanding of non-printable.

"Non-printable" means that isprint(3) returns 0 for a given byte (in a
single-byte locale such as C), or that iswprint(3) returns 0 for a given
wide character (a Unicode character decoded from UTF-8 bytes, in a
multi-byte locale such as de_DE.UTF-8).  These functions are
locale-specific: a byte value may be deemed printable in one locale but
not in another.

Furthermore, isprint(0xa0) and iswprint(0xa0) may give different results
within the same locale if the implementation rejects incomplete UTF-8
sequences and treats only complete wchar_t values as characters.  In
that case, any code that calls isprint() on the individual bytes of a
UTF-8 sequence, rather than iswprint() on the wchar_t of each decoded
character, gets the unfortunate result that no multi-byte character is
recognized as printable.
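
To make the byte-versus-character distinction concrete, here is a
minimal sketch in C.  It is not code from sort(1); the helper names
count_printable_bytes() and count_printable_chars() are hypothetical and
exist only for this illustration.  The first classifies each byte with
isprint(); the second decodes with mbrtowc() and classifies each
resulting wchar_t with iswprint().

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count the bytes that isprint() accepts,
   examining each byte on its own. */
size_t count_printable_bytes(const char *s, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        /* Cast first: passing a negative char to isprint() is undefined. */
        if (isprint((unsigned char) s[i]))
            count++;
    }
    return count;
}

/* Hypothetical helper: count the characters that iswprint() accepts,
   decoding the input with mbrtowc() in the current locale. */
size_t count_printable_chars(const char *s, size_t n)
{
    mbstate_t st;
    size_t count = 0;
    size_t i = 0;

    memset(&st, 0, sizeof st);
    while (i < n) {
        wchar_t wc;
        size_t len = mbrtowc(&wc, s + i, n - i, &st);

        if (len == (size_t) -2)
            break;                      /* incomplete sequence at the end */
        if (len == (size_t) -1) {
            memset(&st, 0, sizeof st);  /* invalid byte: reset and skip it */
            i++;
            continue;
        }
        if (len == 0)
            len = 1;                    /* decoded a NUL character */
        if (iswprint((wint_t) wc))
            count++;
        i += len;
    }
    return count;
}
```

Both helpers follow whatever locale is in effect, so a caller would
normally run setlocale(LC_ALL, "") first.  In a UTF-8 locale, the two
bytes "\xc3\xa4" (a-umlaut) would be expected to count as one printable
character in the second helper, while the first may well report zero
printable bytes, which is the situation described above.  The exact
results depend on the C library and the locale.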

Factor into this mess the fact that upstream coreutils still lacks
decent multi-byte handling in many utilities.  Various distros carry
add-on patches for better wchar_t handling, but so far these have not
been consolidated into something that is easily maintainable and adds no
overhead in the single-byte C locale.

-- 
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature

