bug#23665: spaces in keys: doc, --debug in LC_ALL=C

From:	Pádraig Brady
Subject:	bug#23665: spaces in keys: doc, --debug in LC_ALL=C
Date:	Tue, 31 May 2016 23:46:47 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 31/05/16 20:11, Assaf Gordon wrote:

Hello Karl!

On 05/31/2016 02:32 PM, Karl Berry wrote:

I run
    LC_ALL=en_US.UTF-8 sort --debug -k 2 /tmp/foo  # or -k 2,2 et al.
And get the nicely explanatory output for the "surprising" result:

[...]

Just to verify, the surprising result is in C locale?

I'm seeing the following: for "en_US.UTF-8" it's the order I'd expect, but the
"C" result is surprising:

      $ cat -A k.txt
      M  Build/zfile$
      M  Master/mfile$
      MM Build/afile$

      $ LC_ALL=en_US.UTF-8 sort -k2 k.txt
      MM Build/afile
      M  Build/zfile
      M  Master/mfile

      $ LC_ALL=C sort -k2 k.txt
      M  Build/zfile
      M  Master/mfile
      MM Build/afile
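
The C-locale order above follows from key 2 starting at the blanks that come after field 1. A minimal sketch that recreates k.txt in a temporary file (the file name is arbitrary) and compares -k2 with -k2b under LC_ALL=C:

```shell
# Recreate k.txt from the message in a temporary file (the path is arbitrary).
k=$(mktemp)
printf '%s\n' 'M  Build/zfile' 'M  Master/mfile' 'MM Build/afile' > "$k"

# Without -b, key 2 of the first two lines starts with two blanks,
# so in the C locale those lines compare lower than " Build/afile".
plain=$(LC_ALL=C sort -k2 "$k")

# With -b, leading blanks are skipped and only the path text is compared.
skipped=$(LC_ALL=C sort -k2b "$k")

printf 'sort -k2:\n%s\n\nsort -k2b:\n%s\n' "$plain" "$skipped"
rm -f "$k"
```

With -k2b the "MM Build/afile" line should move to the top, matching the en_US.UTF-8 order shown earlier.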

But the information is just as valid in C as in UTF-8, so far as I can
see.  Thus it would be nice for it to be present.


If I understand correctly, one could argue the warning is even more important
in the C locale than in UTF-8 locales, as collating rules for UTF-8 make
leading spaces less significant.

As in:

      $ cat -A s.txt
      M A$
      M  B$
      M   D$
      M  C$

UTF-8 makes leading spaces less important:

      $ LC_ALL=en_US.UTF-8 sort -k2 s.txt
      M A
      M  B
      M  C
      M   D

In the C locale, spaces (as simple bytes) do matter:

      $ LC_ALL=C sort -k2 s.txt
      M   D
      M  B
      M  C
      M A

-b skips leading spaces:

      $ LC_ALL=C sort -k2b s.txt
      M A
      M  B
      M  C
      M   D

More importantly, I urge that the documentation for sort give an example
of this.  The idea that following blanks after the first become part of
the next field is highly counter-intuitive.
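
To make the field boundary visible, here is a rough sketch that prints what sort compares for -k2 when neither -t nor -b is given. It only approximates sort's splitting rule with sed; show_key2 is a hypothetical helper name, not part of coreutils:

```shell
# Approximate sort's default field splitting for -k2 (no -t, no -b):
# field 1 is any leading blanks plus the first run of non-blanks, and the
# key is everything after it, with the blanks that follow field 1 included.
# show_key2 is a hypothetical helper, not a coreutils command.
show_key2() {
  sed -E 's/^[[:blank:]]*[^[:blank:]]*(.*)$/[\1]/'
}

# Same lines as s.txt; brackets mark the extent of each key.
keys=$(printf '%s\n' 'M A' 'M  B' 'M   D' 'M  C' | show_key2)
printf '%s\n' "$keys"
```

The bracketed keys begin with one, two, three, and two blanks respectively, which is why the C locale puts "M   D" first.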


I agree.
I can add the above example to the documentation (and possibly also to the FAQ
or Gotcha pages?).
What do you think?

The condition to print this message is here:
   http://lingrok.org/xref/coreutils/src/sort.c#2435
I can try to suggest a patch to print it in C locale as well (hopefully 
tonight).


The warning was suppressed in this case because one might be using
such a command to sort right-aligned indexes:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v8.5-40-g63761c0
I was probably overthinking that a bit,
so I'd be happy to see maybe_space_aligned removed from the condition.
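
For illustration, a sketch of the right-aligned case with made-up data (not from the thread). In the C locale the padding blanks make a plain -k2 sort agree with numeric order, and adding -b breaks that:

```shell
# Hypothetical input: a one-character label and a right-aligned index.
idx=$(mktemp)
printf '%s\n' 'c 100' 'a   9' 'b  10' > "$idx"

# Leading blanks are part of the key, so "   9" < "  10" < " 100" bytewise.
aligned=$(LC_ALL=C sort -k2 "$idx")

# With -b the keys become "9", "10", "100" and compare as plain strings.
stripped=$(LC_ALL=C sort -k2b "$idx")

printf 'sort -k2:\n%s\n\nsort -k2b:\n%s\n' "$aligned" "$stripped"
rm -f "$idx"
```

For real numeric columns, -k2n would be the more robust choice; the point here is only that significant leading blanks are sometimes what the user wants.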

cheers,
Pádraig.
