bug-coreutils

Re: sort --ignore-case option changes underscore sort position


From: John Wiersba
Subject: Re: sort --ignore-case option changes underscore sort position
Date: Fri, 22 Aug 2008 12:56:04 -0400

Thanks for the quick and very clear explanation, Bob!  I saw the
--ignore-case option definition, but the implications of it weren't
immediately apparent to me.  It was especially confusing because I was
comparing with the output of a different tool which folds to lowercase when
doing comparisons and couldn't understand why there was a difference.  Also,
the underscore character is particularly affected due to its heavy use in
filenames and program identifiers.

Maybe the documentation could be enhanced, something along the lines of:

The sort order of characters that have no case, such as punctuation, is
affected if they collate differently relative to lowercase and uppercase
letters.  For example, in the C locale the underscore sorts between the
uppercase and the lowercase letters, so the strings `m' and `_' sort in
opposite orders with and without the --ignore-case option.
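
A minimal sketch of that example, assuming GNU sort and forcing the C
locale with LC_ALL=C (the variable names without_f and with_f are only
for illustration):

```shell
# Force the C locale so that characters collate by byte value.
# Without folding, `_' (0x5F) sorts before `m' (0x6D).
without_f=$(printf '%s\n' m _ | LC_ALL=C sort)

# With --ignore-case, `m' is compared as `M' (0x4D), which sorts before `_'.
with_f=$(printf '%s\n' m _ | LC_ALL=C sort --ignore-case)

printf 'without --ignore-case:\n%s\n' "$without_f"
printf 'with --ignore-case:\n%s\n' "$with_f"
```

In other locales the two orders may well agree, so the difference is
probably only visible when the locale is C/POSIX.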

On Fri, Aug 22, 2008 at 1:27 AM, Bob Proulx <address@hidden> wrote:

> ...
>  `-f'
>  `--ignore-case'
>       Fold lowercase characters into the equivalent uppercase characters
>       when comparing so that, for example, `b' and `B' sort as equal.
>       The `LC_CTYPE' locale determines character types.
>
> Therefore your test case:
>
>  { echo a_; echo ax; } | sort --ignore-case
>
> Is really the same as:
>
>  $ { echo a_; echo ax; } | sort
>  a_
>  ax
>
>  $ { echo A_; echo AX; } | sort
>  AX
>  A_
>
>  $ { echo A_; echo AX; } | sort --ignore-case
>  AX
>  A_
>
> When using upper case you can see that it is equivalent to using the
> --ignore-case option.  Perhaps this should have been more accurately
> called --convert-to-upper-case-before-sorting.
>
> The surprising part might be realizing that underscore collates
> between the upper and lower case letters when using the C/POSIX
> standard sort ordering.  That is the standard legacy behavior.  It
> does this along with [ \ ] ^ _ ` which all occur between Z and a in
> the US-ASCII code table.
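
For reference, the positions of those characters in the code table can
be checked from the shell.  This is only a sketch; code_of is a
hypothetical helper name, and it relies on the POSIX printf rule that a
numeric argument beginning with a quote is read as the code of the
character that follows.

```shell
# Print the decimal code of a single character.
# A numeric printf argument that starts with a quote evaluates to
# the code of the next character.
code_of() { printf '%d\n' "'$1"; }

# Walk from Z to a, covering the punctuation that sits between them.
for c in Z '[' '\' ']' '^' _ '`' a; do
  printf '%s  %s\n' "$c" "$(code_of "$c")"
done
```

Z is 90 and a is 97, so the six punctuation characters occupy 91
through 96, with the underscore at 95.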

