bug#23665: spaces in keys: doc, --debug in LC_ALL=C
From: Assaf Gordon
Subject: bug#23665: spaces in keys: doc, --debug in LC_ALL=C
Date: Tue, 31 May 2016 15:11:10 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0

Hello Karl!
On 05/31/2016 02:32 PM, Karl Berry wrote:
> I run
>   LC_ALL=en_US.UTF-8 sort --debug -k 2 /tmp/foo # or -k 2,2 et al.
> And get the nicely explanatory output for the "surprising" result:
> [...]
Just to verify: is the surprising result the one in the C locale?
I'm seeing the following. With "en_US.UTF-8" the order is what I'd expect,
but with "C" it is surprising:
$ cat -A k.txt
M Build/zfile$
M Master/mfile$
MM Build/afile$
$ LC_ALL=en_US.UTF-8 sort -k2 k.txt
MM Build/afile
M Build/zfile
M Master/mfile
$ LC_ALL=C sort -k2 k.txt
M Build/zfile
M Master/mfile
MM Build/afile
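To make the C-locale result reproducible, here is a minimal sketch. The exact spacing is an assumption on my part (two spaces after "M", one after "MM", so that the second column lines up), and the variable names are only for illustration. It also sorts the three keys on their own, to show that key 2 carries the leading blanks with it:

```shell
# Assumed spacing: two spaces after "M", one space after "MM".
k_input=$(printf '%s\n' 'M  Build/zfile' 'M  Master/mfile' 'MM Build/afile')

# -k2 starts the key right after the end of field 1,
# so the blanks before "Build"/"Master" are part of the key.
c_order=$(printf '%s\n' "$k_input" | LC_ALL=C sort -k2)

# The same three keys, written out by hand and sorted as whole lines.
# In the C locale a space (0x20) sorts before any letter, so the keys
# with two leading spaces come before the key with one.
raw_keys=$(printf '%s\n' '  Build/zfile' '  Master/mfile' ' Build/afile' | LC_ALL=C sort)

printf '%s\n' "$c_order"
printf '%s\n' "$raw_keys"
```

Both listings come out in the same relative order, which is why "MM Build/afile" ends up last with LC_ALL=C.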
> But the information is just as valid in C as in UTF-8, so far as I can
> see. Thus it would be nice for it to be present.
If I understand correctly, one could argue the warning is even more important
in C locale than in UTF-8 locales,
as collating rules for UTF-8 make leading spaces less significant.
As in:
$ cat -A s.txt
M A$
M B$
M D$
M C$
The en_US.UTF-8 collation rules make leading spaces less important:
$ LC_ALL=en_US.UTF-8 sort -k2 s.txt
M A
M B
M C
M D
In the C locale, spaces (as simple bytes) do matter:
$ LC_ALL=C sort -k2 s.txt
M D
M B
M C
M A
-b skips leading spaces:
$ LC_ALL=C sort -k2b s.txt
M A
M B
M C
M D
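For anyone who wants to try this, a small self-contained sketch follows. The number of spaces on each line is my assumption (1 before A, 3 before B, 4 before D, 2 before C), chosen only so that it is consistent with the orderings shown above, and the variable names are illustrative:

```shell
# Assumed spacing: A has 1 leading space in field 2, C has 2, B has 3, D has 4.
s_input=$(printf '%s\n' 'M A' 'M   B' 'M    D' 'M  C')

# Without -b the leading blanks are compared byte by byte,
# so the line with the most spaces sorts first in the C locale.
plain=$(printf '%s\n' "$s_input" | LC_ALL=C sort -k2)

# With the b modifier the leading blanks of the key are skipped,
# so only the letters are compared.
skip_blanks=$(printf '%s\n' "$s_input" | LC_ALL=C sort -k2b)

printf 'sort -k2:\n%s\n\nsort -k2b:\n%s\n' "$plain" "$skip_blanks"
```

With this spacing, -k2 gives D, B, C, A and -k2b gives A, B, C, D.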
> More importantly, I urge that the documentation for sort give an example
> of this. The idea that following blanks after the first become part of
> the next field is highly counter-intuitive.
I agree.
I can add the above example to the documentation (and possibly also to the
FAQ or Gotchas pages).
What do you think?
The condition to print this message is here:
http://lingrok.org/xref/coreutils/src/sort.c#2435
I can try to suggest a patch to print it in C locale as well (hopefully
tonight).
> It would also be nice if the definition of "key 1" was stated.
> Awfully easy to misread that as "field 1".
How about "leading blanks are significant in sort key [...]" ?
(in http://lingrok.org/xref/coreutils/src/sort.c#2439 )
regards,
- assaf