bug-coreutils

bug#9780: sort -u throws out non-duplicates


From: Eric Blake
Subject: bug#9780: sort -u throws out non-duplicates
Date: Mon, 17 Oct 2011 20:22:52 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.23) Gecko/20110928 Fedora/3.1.15-1.fc14 Lightning/1.0b3pre Mnenhy/0.8.4 Thunderbird/3.1.15

tag 9780 moreinfo
thanks

On 10/17/2011 06:59 PM, Bernhard Rosenkraenzer wrote:
> address@hidden tmp]$ wget http://bero.eu/java-source-list
> [...]
> address@hidden tmp]$ tr ' ' '\n' <java-source-list |sort |grep X509Certificate
> libcore/luni/src/main/java/java/security/cert/X509Certificate.java
> libcore/luni/src/main/java/javax/security/cert/X509Certificate.java
>
> This is correct...
>
> address@hidden tmp]$ tr ' ' '\n' <java-source-list |sort -u |grep X509Certificate
> libcore/luni/src/main/java/javax/security/cert/X509Certificate.java
>
> Note the missing .../java/java/security/cert/X509Certificate.java

Thanks for the report. Unfortunately, you did not provide enough information to reproduce this - for example, what platform are you running on? Can you narrow it down to a single file of say 5 or so lines? Can you reproduce the problem with shorter input lines?

My guess, although I need more info to confirm it, is that this is not a bug, but rather that java-source-list contains some lines that differ in case and/or punctuation but happen to collate identically. If so, then sort -u is picking the lower-case version as the unique line, at which point your grep for the case-sensitive X509Certificate is obviously failing.

The fact that you already showed that LC_ALL=C changes the behavior lends credence to my supposition, since the C locale compares raw bytes, whereas most other locales collate case-insensitively. See also the FAQ:

https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
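
To make the supposition above concrete, here is a minimal sketch of how -u discards lines that compare equal under the active comparison rules. It uses made-up sample data (not taken from java-source-list) and uses -f (fold case) under LC_ALL=C as a locale-independent stand-in for a collation that ignores case; whether the locale in the report behaves this way is still only an assumption.

```shell
# Two lines that differ only in case (made-up sample data).
sample=$(printf 'Foo.java\nfoo.java\n')

# Byte-wise comparison: the lines are distinct, so both survive -u.
bytewise=$(printf '%s\n' "$sample" | LC_ALL=C sort -u | wc -l | tr -d ' ')

# Case-folding comparison (-f): the lines compare equal, so -u keeps only one.
folded=$(printf '%s\n' "$sample" | LC_ALL=C sort -u -f | wc -l | tr -d ' ')

echo "bytewise=$bytewise folded=$folded"
# prints: bytewise=2 folded=1
```

A case-sensitive grep on the surviving line can then miss, even though no distinct-by-collation line was lost.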

> The problem occurs (at least) with sort from coreutils 8.12, 8.13 and 8.14.

Use 'sort --debug' to help decipher sort's behavior. Here's my demonstration that I cannot reproduce it using coreutils.git with just two input lines:

$ printf 'libcore/luni/src/main/java/java/security/cert/X509Certificate.java\nlibcore/luni/src/main/java/javax/security/cert/X509Certificate.java\n' | sort -u --debug
sort: using `en_US.UTF-8' sorting rules
libcore/luni/src/main/java/java/security/cert/X509Certificate.java
__________________________________________________________________
libcore/luni/src/main/java/javax/security/cert/X509Certificate.java
___________________________________________________________________

So there's definitely something else in java-source-list that we aren't seeing that is (probably correctly) affecting your output.
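
One way to hunt for whatever else is in the file might be to list the lines that are unique byte-wise but that the locale-aware sort -u drops. This is only a sketch: lost_lines is a hypothetical helper (not part of coreutils), and the locale name you pass it is whatever your environment actually uses.

```shell
# Hypothetical helper, not part of coreutils.
# Usage: lost_lines LOCALE FILE
# Prints lines that survive 'sort -u' in the C locale but are discarded
# by 'sort -u' under LOCALE.
lost_lines() {
    ll_locale=$1
    ll_file=$2
    ll_tmp=$(mktemp -d) || return 1

    # Unique lines by byte comparison.
    LC_ALL=C sort -u -- "$ll_file" > "$ll_tmp/bytes"

    # Unique lines under the given locale, re-sorted byte-wise so that
    # comm sees both inputs in the same order.
    LC_ALL=$ll_locale sort -u -- "$ll_file" | LC_ALL=C sort > "$ll_tmp/locale"

    # Column 1 only: lines present in the byte-wise set but not the locale set.
    LC_ALL=C comm -23 "$ll_tmp/bytes" "$ll_tmp/locale"

    rm -rf -- "$ll_tmp"
}
```

For example, `lost_lines en_US.UTF-8 java-source-list` should print nothing if the two sorts agree; any line it does print is a candidate for the small reproducer requested above, together with its neighbours in `sort --debug` output.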

--
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org




