bug#17189: Sort bug #2
From: Nikos Balkanas
Subject: bug#17189: Sort bug #2
Date: Mon, 7 Apr 2014 21:11:13 +0300
On Mon, Apr 7, 2014 at 3:49 PM, Eric Blake <address@hidden> wrote:
> On 04/05/2014 01:19 PM, Nikos Balkanas wrote:
>
> >>
> >> No, earlier distributions merely defaulted to LC_ALL=C instead of
> >> LC_ALL=en_US.UTF-8. This complaint is the same as your previous one,
> >> and the solution is the same - if you want sorting by bytes, then ensure
> >> that your locale is set to C rather than en_US.UTF-8.
> >>
> > Thank you all. As I explained in my previous mail, an update of the man
> > pages is essential. A change in the UI would also be desirable,
> > if the standards allow it. Sorry about my attitude, but I was getting
> > pretty desperate. Thanks for not flaming.
> >
> > To make it up I will look into updating the man pages ;-)
>
> But the man page ALREADY says this:
>
> *** WARNING *** The locale specified by the environment affects sort
> order. Set LC_ALL=C to get the traditional sort order that uses native
> byte values.
>
> What more are you proposing?
>
I have already written a patch. It uses the available "-a" command-line
option to "force" traditional (ASCII) sorting. I have updated the man pages
accordingly. What is the best way to upload it?
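
To illustrate what "traditional" sorting means here, a minimal Python sketch
(not the patch itself, and not how sort is implemented). The function names
sort_bytes and sort_locale are made up for this example. The first compares
raw byte values, as sort does under LC_ALL=C; the second uses whatever
collation rules the named locale provides, and assumes that locale is
installed on the system.

```python
import locale


def sort_bytes(lines):
    """Sort by raw byte values, as sort does under LC_ALL=C."""
    return sorted(lines, key=lambda s: s.encode("utf-8"))


def sort_locale(lines, name):
    """Sort with the collation rules of the named locale.

    Raises locale.Error if that locale is not installed.
    """
    old = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
        return sorted(lines, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, old)  # restore it


words = ["banana", "Apple", "apple", "Banana"]
print(sort_bytes(words))  # ['Apple', 'Banana', 'apple', 'banana']
```

In byte order every uppercase ASCII letter sorts before every lowercase one.
Passing a name such as "en_US.UTF-8" to sort_locale typically interleaves
the two cases instead, but the exact order depends on the system's locale data.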
>
> >
> > A suggestion. I think that sort should sort text based on the LOCALE of
> > the file, not the system. Couldn't it detect automatically from the text,
> > whether it is dealing with UTF-8 or iso?
>
> Unfortunately, no, this is not possible. You're welcome to try and
> write a patch to prove me wrong, but people have already had years of
> experience of using environment variables as the way to tell a program
> what encoding an input file uses, precisely because there is no other
> obvious way of determining a file's locale.
>
It is possible. It's been some time since I was parsing Unicode, but if I
remember correctly, a Unicode char sets bits in its data to specify
continuation. This calls for adaptive sorting based on input.

I think Bob already mentioned that it is not acceptable to do a second
pass on the input (worst-case scenario) to determine the input locale;
however, adaptive sorting should not need one. Unfortunately, it is a
considerable effort and I would need to know your sorting algo. Since I
don't, and have much work to do this period, I wrote the much easier UI
patch I talked about before.

I find it more elegant and easier than changing the environment. If it is
acceptable, let me know how to upload it.
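
A rough sketch of the continuation-bit idea, in Python for brevity. The
function name looks_like_utf8 is made up for this example. It only checks
the structural pattern of UTF-8 (a lead byte followed by the right number of
10xxxxxx continuation bytes); it does not reject overlong forms or
surrogates, and a positive result says nothing about which locale's
collation rules should apply.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data follows the UTF-8 lead/continuation byte pattern."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:              # 0xxxxxxx: plain ASCII, no continuation
            need = 0
        elif b >> 5 == 0b110:     # 110xxxxx: one continuation byte follows
            need = 1
        elif b >> 4 == 0b1110:    # 1110xxxx: two continuation bytes follow
            need = 2
        elif b >> 3 == 0b11110:   # 11110xxx: three continuation bytes follow
            need = 3
        else:                     # stray continuation or invalid lead byte
            return False
        if i + need >= n + (need == 0):
            return False          # sequence is cut off at end of input
        for k in range(1, need + 1):
            if data[i + k] >> 6 != 0b10:   # must look like 10xxxxxx
                return False
        i += need + 1
    return True


print(looks_like_utf8("Ωmega".encode("utf-8")))    # True
print(looks_like_utf8("café".encode("latin-1")))   # False
print(looks_like_utf8(b"plain ascii"))             # True
```

Note the last case: pure ASCII input passes, yet it is equally valid in an
ISO 8859 encoding, so the check alone cannot tell the two apart.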
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
>
- bug#17189: Sort bug #2, Nikos Balkanas, 2014/04/05
- bug#17189: Sort bug #2, Eric Blake, 2014/04/05
- bug#17189: Sort bug #2, Nikos Balkanas, 2014/04/05
- bug#17189: Sort bug #2, Eric Blake, 2014/04/07
- bug#17189: Sort bug #2, Nikos Balkanas <=
- bug#17189: Sort bug #2, Eric Blake, 2014/04/07
- bug#17189: Sort bug #2, Nikos Balkanas, 2014/04/07
- bug#17189: Sort bug #2, Eric Blake, 2014/04/07
- bug#17189: Sort bug #2, Nikos Balkanas, 2014/04/09
- bug#17189: Sort bug #2, Eric Blake, 2014/04/09
- bug#17189: Sort bug #2, Leslie S Satenstein, 2014/04/07
- bug#17189: Sort bug #2, Nikos Balkanas, 2014/04/08