Re: bug#6327: sort fails on some UTF-8 input
From: Eric Blake
Subject: Re: bug#6327: sort fails on some UTF-8 input
Date: Wed, 02 Jun 2010 14:08:57 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100430 Fedora/3.0.4-3.fc13 Lightning/1.0b2pre Mnenhy/0.8.2 Thunderbird/3.0.4
[redirecting to bug-gnulib]
On 06/02/2010 01:37 PM, Paul Eggert wrote:
> On 06/01/2010 09:51 PM, River Tarnell wrote:
>> I'm using coreutils 8.5 on Solaris 10.
>>
>> GNU 'sort' fails to sort some input, while Solaris 'sort' handles it
>> correctly:
>
> Amusingly enough, on that same test case I found the same problem
> with GNU 'sort' that you did, but I also found that Solaris 'sort'
> reports that it runs out of memory, even in 64-bit mode. For example:
>
> 1010-kiwi $ LC_ALL=en_CA.UTF-8 /usr/bin/sparcv9/sort sort_test.txt
> sort: insufficient memory; use -S option to increase allocation
> 1011-kiwi $ LC_ALL=en_CA.UTF-8 coreutils-8.5/src/sort sort_test.txt
> coreutils-8.5/src/sort: string comparison failed: Illegal byte sequence
> coreutils-8.5/src/sort: Set LC_ALL='C' to work around the problem.
> coreutils-8.5/src/sort: The strings compared were
> `\360\222\203\276\360\222\205\226' and
> `\360\222\200\255\360\222\213\253\360\222\213\253\360\222\200\255'.
>
> I expect that the exact failure mode probably depends on the
> locale (and/or whether you're using x86 or sparc),
> and that GNU 'sort' checks for strcoll failures but
> Solaris 'sort' does not (and thus crashes). If my guess is right,
> this appears to be a bug in the Solaris strcoll implementation.
> I don't see a simple workaround. You might file a bug report
> with Sun.

And in the meantime, now that we've confirmed that it is a Solaris
strcoll() bug, it would be nice to code a gnulib workaround.
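
For anyone following along, here is a minimal sketch of the kind of
strcoll() failure check Paul describes above. It is not the code from
coreutils or gnulib; the helper name checked_strcoll is hypothetical.
It relies only on the POSIX convention that strcoll() has no reserved
error return value, so the caller clears errno first and inspects it
afterwards. The "Illegal byte sequence" message quoted above is
presumably the text for EILSEQ.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper, not taken from coreutils or gnulib.
   Compare A and B with strcoll() under the current LC_COLLATE locale.
   Store the comparison result in *RESULT and return 0 on success,
   or return the errno value that strcoll() set on failure.  */
static int
checked_strcoll (const char *a, const char *b, int *result)
{
  assert (a != NULL && b != NULL && result != NULL);

  /* strcoll() cannot signal an error through its return value, so
     clear errno before the call and check it afterwards.  */
  errno = 0;
  *result = strcoll (a, b);
  return errno;
}
```

A caller would run setlocale (LC_ALL, "") once at startup, then treat
a nonzero return (for example EILSEQ) as "comparison failed" and
either report it or fall back to a byte-wise comparison such as
strcmp(). A program that skips the errno check would instead use
whatever value strcoll() happened to return.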
--
Eric Blake address@hidden +1-801-349-2682
Libvirt virtualization library http://libvirt.org