bug-coreutils

bug#26422: historical feature or grand daddy bug?


From: Paul Eggert
Subject: bug#26422: historical feature or grand daddy bug?
Date: Sun, 9 Apr 2017 12:04:34 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

Historically, 'sort' ignored the \n at the end of each line, so that empty lines (i.e., lines consisting only of a single \n) collated before all other lines. An earlier version of the POSIX spec was (mis)written to require treating the \n as part of the data, and during development in 1999 GNU sort was briefly changed to conform to that. This was an error in the POSIX spec that was eventually fixed, and GNU sort was changed back to the traditional behavior before any release was made with the funky behavior.

So, it's not a bug that \t\n collates after \n, since "\t" is lexicographically after "".
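A minimal sketch of the difference between the two behaviors, written in Python rather than taken from the GNU sort source. It assumes plain byte-order comparison (as in the C locale); the function names are hypothetical and exist only for illustration.

```python
def strip_newline(line):
    # Drop exactly one trailing newline, if present.
    return line[:-1] if line.endswith("\n") else line

def sort_traditional(lines):
    # Traditional behavior: the trailing \n is not part of the sort key,
    # so an empty line has the key b"" and collates first.
    return sorted(lines, key=lambda s: strip_newline(s).encode())

def sort_newline_as_data(lines):
    # The briefly adopted behavior: the \n is compared as data.
    # In byte order "\t" (0x09) is less than "\n" (0x0A).
    return sorted(lines, key=lambda s: s.encode())

lines = ["b\n", "\t\n", "\n", "a\n"]
print(sort_traditional(lines))      # empty line first, then the tab line
print(sort_newline_as_data(lines))  # tab line first, then the empty line
```

Under the traditional rule the keys being compared are "" and "\t", so the empty line comes first; only when the \n counts as data does the tab line move ahead of it.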

As I understand it, the empty string should collate before all other strings in all POSIX locales, so empty lines should always sort first in 'sort' output. I'm by no means a collation expert, though, and if I'm wrong I'd like to see a counterexample.

Come to think of it, 'sort' might be able to improve performance in the common case of sorting text files containing many empty lines, by merely counting the lines rather than storing them internally. I suppose this is a different topic, though.
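The counting idea could look roughly like the following Python sketch. This is not how GNU sort is implemented; it assumes byte-order comparison and relies on the assumption stated above that the empty string collates before every other string. The function name is hypothetical.

```python
def sort_counting_empty(lines):
    # Count empty lines instead of storing them; keep everything else.
    empty_count = 0
    rest = []
    for line in lines:
        if line == "\n":
            empty_count += 1
        else:
            rest.append(line)

    # Sort only the non-empty lines, ignoring the trailing newline.
    rest.sort(key=lambda s: (s[:-1] if s.endswith("\n") else s).encode())

    # Assumption: empty lines collate first, so emit them ahead of the rest.
    return ["\n"] * empty_count + rest

print(sort_counting_empty(["b\n", "\n", "\t\n", "\n", "a\n"]))
```

The saving comes from never placing the empty lines in the array that gets sorted; whether that pays off in practice would depend on how many empty lines the input contains.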




