Re: Sort order bug in GNU sort
From: Luke Hutchison
Subject: Re: Sort order bug in GNU sort
Date: Thu, 29 Oct 2009 20:43:39 -0400
Hi Pádraig,
As stated, "The following is the output of GNU sort (without any
switches)"; that is, I used the defaults and did not specify any
command-line switches. If, as you say, the whole line is the sort key
by default, and the default ordering is lexicographic, how can the
following snippets from the sorted output possibly be correct?
sampleId-1010,0.0625
sampleId-101,0.0625
sampleId-1010,1.0
sampleId-980,1.0
sampleId-98,1.0
sampleId-981,0.0625
sampleId-990,1.0
sampleId-99,1.0
sampleId-991,0.25
Based on ASCII encoding (',' < '0' < '1'), I believe these should be:
sampleId-101,0.0625
sampleId-1010,0.0625
sampleId-1010,1.0
sampleId-98,1.0
sampleId-980,1.0
sampleId-981,0.0625
sampleId-99,1.0
sampleId-990,1.0
sampleId-991,0.25
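The expected order above can be checked mechanically. The following is a minimal Python sketch (the language and variable names are an illustrative choice, not something from the thread) that sorts the nine sample lines by their raw ASCII bytes, i.e. the plain byte-by-byte comparison the argument above assumes. It does not model any locale's collation rules.

```python
# The nine lines exactly as GNU sort emitted them (from the message above).
observed = [
    "sampleId-1010,0.0625",
    "sampleId-101,0.0625",
    "sampleId-1010,1.0",
    "sampleId-980,1.0",
    "sampleId-98,1.0",
    "sampleId-981,0.0625",
    "sampleId-990,1.0",
    "sampleId-99,1.0",
    "sampleId-991,0.25",
]

# In ASCII, ',' (0x2C) comes before '0' (0x30) and '1' (0x31).
assert ord(",") < ord("0") < ord("1")

# Compare the encoded bytes directly: a plain byte-by-byte ordering.
bytewise = sorted(observed, key=lambda line: line.encode("ascii"))
for line in bytewise:
    print(line)
```

This prints the same nine lines in the order listed above.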
Even if ',' > '0' in some unusual locale, or some other odd ordering
applied, the two lines "sampleId-1010,0.0625" and "sampleId-1010,1.0"
should be grouped together, either before or after
"sampleId-101,0.0625", because they share the common prefix
"sampleId-1010". Yet they are separated. Similarly,
"sampleId-990,1.0" and "sampleId-991,0.25" should never be separated
by "sampleId-99,1.0", because that would require '0' < ',' < '1', and
there is no locale that orders those characters that way.
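The grouping argument can be sketched the same way, under the stated assumption that strings are compared character by character using some fixed ranking of characters (the helper names `char_order_key` and `prefix_block_is_contiguous` are hypothetical). For any such ranking, lines that share a prefix end up in one contiguous run. The sketch tries many random character rankings, then checks the run in the observed output. It only models per-character comparison, not a locale's collation rules, which can be more involved than a single character ranking.

```python
import random

# The nine lines exactly as GNU sort emitted them (from the message above).
observed = [
    "sampleId-1010,0.0625",
    "sampleId-101,0.0625",
    "sampleId-1010,1.0",
    "sampleId-980,1.0",
    "sampleId-98,1.0",
    "sampleId-981,0.0625",
    "sampleId-990,1.0",
    "sampleId-99,1.0",
    "sampleId-991,0.25",
]


def char_order_key(alphabet):
    """Sort key comparing strings character by character, ranking each
    character by its position in `alphabet` (an arbitrary total order)."""
    rank = {ch: i for i, ch in enumerate(alphabet)}
    # Python compares lists element by element, and a list that is a prefix
    # of a longer one sorts first, just like plain string comparison.
    return lambda s: [rank[ch] for ch in s]


def prefix_block_is_contiguous(lines, prefix):
    """True if all lines starting with `prefix` form one unbroken run."""
    hits = [i for i, line in enumerate(lines) if line.startswith(prefix)]
    return not hits or hits == list(range(hits[0], hits[-1] + 1))


# Try many random character rankings; the "sampleId-1010" lines stay together.
chars = sorted(set("".join(observed)))
rng = random.Random(0)
always_grouped = True
for _ in range(1000):
    alphabet = chars[:]
    rng.shuffle(alphabet)
    ordered = sorted(observed, key=char_order_key(alphabet))
    always_grouped &= prefix_block_is_contiguous(ordered, "sampleId-1010")

print("grouped under every random ranking:", always_grouped)
print("grouped in GNU sort's output:",
      prefix_block_is_contiguous(observed, "sampleId-1010"))
```

Running it should print `True` for the random rankings and `False` for the observed output.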
The man page led me to think that sorting was field-wise (not
line-wise) by default, since it says "-t, --field-separator=SEP: use
SEP instead of non-blank to blank transition". It would help to state
explicitly in the description of "-k" that "if no key is given, the
whole line is used as the key".
Thanks,
Luke
2009/10/29 Pádraig Brady <address@hidden>
>
> Luke Hutchison wrote:
> > Hi,
> >
> > The following is the output of GNU sort (without any switches) on an
> > unsorted file. Numerous errors (of the same variety) seem present in the
> > ordering. I am using coreutils-7.2-4.fc11.x86_64. Problems are shown in
> > red.
>
> You need to specify the sort command you used.
> Does this sort your data correctly?
>
> sort -t, -k1,1V
>
> > Additionally, there probably needs to be a switch added to sort that uses
> > the entire line as the sort key,
>
> It does that by default
>
> > not blank-to-non-blank transition
>
> Note also the 'b' option.
>
> cheers,
> Pádraig.
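Pádraig's suggested `sort -t, -k1,1V` keys on the first comma-separated field and compares it as a version string. A rough Python approximation is below. `version_key` is a hypothetical simplification of version comparison (runs of digits compared numerically), not GNU sort's actual algorithm; ties on the key are broken by comparing whole lines in code point order, mirroring sort's last-resort whole-line comparison as it behaves in the C locale.

```python
import re

# The nine sample lines from the message above.
lines = [
    "sampleId-1010,0.0625",
    "sampleId-101,0.0625",
    "sampleId-1010,1.0",
    "sampleId-980,1.0",
    "sampleId-98,1.0",
    "sampleId-981,0.0625",
    "sampleId-990,1.0",
    "sampleId-99,1.0",
    "sampleId-991,0.25",
]


def version_key(text):
    """Hypothetical, simplified stand-in for version comparison (-V):
    runs of ASCII digits compare as integers, other runs as strings."""
    return [(0, int(run)) if run[0] in "0123456789" else (1, run)
            for run in re.findall(r"[0-9]+|[^0-9]+", text)]


def sort_first_field_version(lines):
    """Approximate `sort -t, -k1,1V`: order by the first comma-separated
    field under version_key, then fall back to the whole line on ties."""
    return sorted(lines, key=lambda line: (version_key(line.split(",", 1)[0]), line))


for line in sort_first_field_version(lines):
    print(line)
```

For this data the sketch orders the IDs numerically (98, 99, 101, 980, 981, 990, 991, 1010), with the two 1010 lines ordered by the whole-line tie-break.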