bug#19142: sort not working with LANG set to language_country.encoding
From: Bob Proulx
Subject: bug#19142: sort not working with LANG set to language_country.encoding
Date: Fri, 21 Nov 2014 22:49:41 -0700
User-agent: Mutt/1.5.23 (2014-03-12)
tag 19142 notabug
close 19142
thanks
Roland Sieker wrote:
> I have noticed that sort seems to have problems when the LANG environment
> variable is set with language and country.
Sort is definitely affected by LANG because LANG sets LC_COLLATE
(unless LC_COLLATE or LC_ALL overrides it), which controls the
collation sequence. Different locales have different collating
sequences. I don't like that the English locales, such as my own
country's en_US.UTF-8 and others like en_GB.UTF-8, don't sort
"correctly" as far as I am concerned, but I can only accept it. Sort
order is actually determined by libc and affects much more than sort.
It also affects ls, the shell, and basically everything on the system
that sorts.
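As a rough illustration of how the collation locale changes the result,
here is a small sketch. The sample input is made up for this example,
and the second command assumes en_US.UTF-8 is installed; if it is not,
libc falls back to the C locale and both commands print the same thing.

```shell
# Byte-order (C locale) sort: uppercase ASCII letters sort before
# lowercase ones, so this prints A, B, a, b on separate lines.
c_sorted=$(printf 'b\nA\na\nB\n' | env LC_ALL=C sort)
printf '%s\n' "$c_sorted"

# Locale-aware sort. The output depends on whether en_US.UTF-8 is
# actually installed (an assumption), so no expected output is given.
printf 'b\nA\na\nB\n' | env LC_ALL=en_US.UTF-8 sort
```

Setting LC_ALL=C for a single command like this is the usual way to get
plain byte ordering without changing LANG for the whole session.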
> It sorts OK like this, with LANG just the language.encoding:
> ( setenv LANG en.UTF-8 ; echo 'a\nb\na\n⺌\n⺕\n⺌' | sort )
> a
> a
> b
Are you sure "en.UTF-8" is a valid locale? It doesn't look like it to
me. I think that is an invalid locale and therefore libc is falling
back to the C/POSIX locale.
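One way to check whether a locale name is known to the system is to
look for it in the output of "locale -a". The helper below is a sketch
only; locale_available is a hypothetical name, not an existing command.

```shell
# Hypothetical helper: succeed if the given locale name appears in the
# output of `locale -a`. Both sides are lowercased and stripped of
# hyphens before comparing, because glibc lists "en_US.utf8" for the
# locale usually written as "en_US.UTF-8".
locale_available() {
    want=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | tr -d '-')
    locale -a 2>/dev/null | tr 'A-Z' 'a-z' | tr -d '-' | grep -Fqx -- "$want"
}

# Report on the two names from this thread.
for name in en.UTF-8 en_GB.UTF-8; do
    if locale_available "$name"; then
        echo "$name: available"
    else
        echo "$name: not listed by locale -a"
    fi
done
```

A name that is not listed cannot be selected, so programs run with it
behave as if the C locale had been requested.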
> But not with LANG as language_country.encoding:
> ( setenv LANG en_GB.UTF-8 ; echo 'a\nb\na\n⺌\n⺕\n⺌' | sort )
Here "en_GB.UTF-8" is a valid locale and en_GB.UTF-8 uses dictionary
sort ordering. Dictionary order folds case and ignores punctuation.
Try using the newish sort --debug option. It will help debug problems
such as this.
$ printf "a\nb\na\n⺌\n⺕\n⺌\n" | env LC_ALL=en_US.UTF-8 sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
...
$ printf "a\nb\na\n⺌\n⺕\n⺌\n" | env LC_ALL=en.UTF-8 sort --debug
sort: using simple byte comparison
...
See also the FAQ entry:
https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
Bob