bug#9780: sort -u throws out non-duplicates
From: Eric Blake
Subject: bug#9780: sort -u throws out non-duplicates
Date: Mon, 17 Oct 2011 20:22:52 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.23) Gecko/20110928 Fedora/3.1.15-1.fc14 Lightning/1.0b3pre Mnenhy/0.8.4 Thunderbird/3.1.15
tag 9780 moreinfo
thanks
On 10/17/2011 06:59 PM, Bernhard Rosenkraenzer wrote:
> address@hidden tmp]$ wget http://bero.eu/java-source-list
> [...]
> address@hidden tmp]$ tr ' ' '\n' <java-source-list |sort |grep X509Certificate
> libcore/luni/src/main/java/java/security/cert/X509Certificate.java
> libcore/luni/src/main/java/javax/security/cert/X509Certificate.java
> This is correct...
> address@hidden tmp]$ tr ' ' '\n' <java-source-list |sort -u |grep X509Certificate
> libcore/luni/src/main/java/javax/security/cert/X509Certificate.java
> Note the missing .../java/java/security/cert/X509Certificate.java
Thanks for the report. Unfortunately, you did not provide enough
information to reproduce this - for example, what platform are you
running on? Can you narrow it down to a single file of say 5 or so
lines? Can you reproduce the problem with shorter input lines?
My guess, although I need more info to confirm it, is that this is not a
bug, but rather that java-source-list contains some lines that differ in
case and/or punctuation but happen to collate identically. If so, then
sort -u is picking the lower-case version as the unique line, at which
point your grep for the case-sensitive X509Certificate is obviously failing.
The fact that you already showed that LC_ALL=C changes the behavior
lends credence to my supposition, since the C locale compares raw bytes,
while most other locales collate case-insensitively. See also the FAQ:
https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
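A quick way to check that supposition is to count the unique lines under each locale. This is only a sketch: count_unique is a hypothetical helper name, and it assumes the locale you pass is installed on your system.

```shell
# Hypothetical helper: count the lines that survive `sort -u`
# when sort runs under the given locale.
count_unique() {
    # $1 = locale name (e.g. C or en_US.UTF-8), $2 = input file
    # tr strips the padding that some wc implementations add.
    LC_ALL=$1 sort -u -- "$2" | wc -l | tr -d ' '
}
```

For example, compare `count_unique C java-source-list` with `count_unique en_US.UTF-8 java-source-list`. If the second number is smaller, some lines that differ bytewise compare as equal under that locale's collation rules.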
> The problem occurs (at least) with sort from coreutils 8.12, 8.13 and 8.14.
Use 'sort --debug' to help decipher sort's behavior. Here's my
demonstration that I cannot reproduce it using coreutils.git with just
two input lines:
$ printf 'libcore/luni/src/main/java/java/security/cert/X509Certificate.java\nlibcore/luni/src/main/java/javax/security/cert/X509Certificate.java\n' \
    | sort -u --debug
sort: using `en_US.UTF-8' sorting rules
libcore/luni/src/main/java/java/security/cert/X509Certificate.java
__________________________________________________________________
libcore/luni/src/main/java/javax/security/cert/X509Certificate.java
___________________________________________________________________
So there's definitely something else in java-source-list that we aren't
seeing that is (probably correctly) affecting your output.
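To narrow down which lines are involved, something along these lines may help. It is a rough sketch, not a tested recipe for your file: dropped_by_locale is a hypothetical helper name, and it assumes mktemp and comm are available and that the locale you pass is installed.

```shell
# Hypothetical helper: print lines that survive a bytewise `sort -u`
# but are discarded by `sort -u` under another locale.
dropped_by_locale() {
    # $1 = locale to test (e.g. en_US.UTF-8), $2 = input file
    tmp=$(mktemp -d) || return 1
    LC_ALL=C sort -u -- "$2" > "$tmp/bytes"
    # Re-sort the locale-unique output bytewise so that comm
    # receives both inputs in the same (C) order.
    LC_ALL=$1 sort -u -- "$2" | LC_ALL=C sort > "$tmp/locale"
    # Column 1 only: lines present in the bytewise result but
    # missing from the locale result.
    LC_ALL=C comm -23 "$tmp/bytes" "$tmp/locale"
    rm -rf "$tmp"
}
```

Running `tr ' ' '\n' <java-source-list >lines` and then `dropped_by_locale en_US.UTF-8 lines` would list the lines that sort -u treats as duplicates in that locale. Each such line, together with whichever line it collated equal to, would make a much smaller test case.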
--
Eric Blake address@hidden +1-801-349-2682
Libvirt virtualization library http://libvirt.org