bug-coreutils

Re: sort


From: Bob Proulx
Subject: Re: sort
Date: Mon, 29 Aug 2005 09:00:53 -0600
User-agent: Mutt/1.5.9i

Nathan Moore wrote:
> I do not believe that the default behavior for GNU sort is what the
> man page and the info documents state.  Also, I have found no flags
> to force this behavior.

Thanks for sending in a report.  But in what way is it not the same
as documented?  I read your message carefully but I don't see where
you said this.

> If I have a file "file.txt" containing:
> ...
> then shouldn't
> sort file.txt
> sort the lines by the ASCII values of the first characters on each
> line, yielding output where all duplicated symbols are on adjacent
> lines?

No, it should sort based upon your current locale setting.  If your
locale is C (or POSIX) then it will sort by ASCII byte values.  But if
your locale is something like en_US then it will sort by that locale's
collation rules, which on many systems resemble dictionary ordering.

The sort program is actually using strcoll(3) for this operation.  You
can read the low level documentation on it for more details.

  man strcoll

> This is not what happens.  Am I wrong in gathering that this is the
> expected behavior from the documentation or is this a bug?

The man page for sort says the following.  (The coreutils man pages
are only a terse quick reference; the full documentation is in the
info pages.)

  man sort

       ***  WARNING  ***  The locale specified by the environment affects sort
       order.  Set LC_ALL=C to get the traditional sort order that uses native
       byte values.
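
As a minimal sketch of what that warning means in practice, assuming a
POSIX shell and some sample input made up for illustration (it is not
the file from the original report), the C locale can be forced for a
single command like this:

```shell
# Hypothetical sample data: mixed-case letters, one per line.
sample=$(printf 'b\nA\na\nB\n')

# Set LC_ALL=C for this one sort invocation only.  sort then compares
# raw byte values, so uppercase ASCII letters (65-90) come before
# lowercase ones (97-122).
c_sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort)

# Prints A, B, a, b, one per line.
printf '%s\n' "$c_sorted"
```

Running plain "sort" on the same input in a locale such as en_US may
instead interleave the upper and lower case letters, which is the
effect people usually trip over.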

The info pages for sort say this:

  info coreutils sort

       (1) If you use a non-POSIX locale (e.g., by setting `LC_ALL' to
    `en_US'), then `sort' may produce output that is sorted differently
    than you're accustomed to.  In that case, set the `LC_ALL' environment
    variable to `C'.  Note that setting only `LC_COLLATE' has two problems.
    First, it is ineffective if `LC_ALL' is also set.  Second, it has
    undefined behavior if `LC_CTYPE' (or `LANG', if `LC_CTYPE' is unset) is
    set to an incompatible value.  For example, you get undefined behavior
    if `LC_CTYPE' is `ja_JP.PCK' but `LC_COLLATE' is `en_US.UTF-8'.
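
The first problem mentioned there can be seen with a small sketch,
again assuming a POSIX shell and made-up sample input.  The locale
name en_US.UTF-8 is only an example and does not need to be installed
for this to behave as described, because LC_ALL takes precedence:

```shell
# Hypothetical sample data: mixed-case letters, one per line.
sample=$(printf 'b\nA\na\nB\n')

# LC_COLLATE on its own would ask for en_US.UTF-8 collation, but
# LC_ALL overrides every individual LC_* category, so this sort
# still uses byte order.
overridden=$(printf '%s\n' "$sample" | LC_ALL=C LC_COLLATE=en_US.UTF-8 sort)

# Prints A, B, a, b, one per line, the same as LC_ALL=C alone.
printf '%s\n' "$overridden"
```

So if LC_ALL is already set in your environment, changing LC_COLLATE
will appear to do nothing.  Setting LC_ALL itself is the simplest way
to be sure.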

Here is the frequently given answer.

  http://www.gnu.org/software/coreutils/faq/

Look for "Sort does not sort in normal order!"

Bob



