bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#7182: sort -R slow


From: Jim Meyering
Subject: bug#7182: sort -R slow
Date: Sun, 07 Aug 2011 22:42:52 +0200

Davide Brini wrote:
> On Sat, 9 Oct 2010 14:52:41 +0200 Ole Tange <address@hidden> wrote:
>
>> I recently needed to randomize some lines. So I tried using 'sort -R'.
>> I was astonished how slow that was. So I tested how slow a competing
>> strategies are. GNU sort is two magnitudes slower than unsort and more
>> than one magnitude slower than perl:
>>
>> $ time unsort file
>> real    0m1.388s
>>
>> $ unsort --version
>> unsort 1.1.2
>>
>> $ time perl -e 'print sort { rand() <=> rand() } <>' file
>> real    0m6.621s
>>
>> $ time sort -R file
>> real    4m8.403s
>>
>> $ sort --version
>> sort (GNU coreutils) 8.5
>>
>> What is even scarier: sort without -R is faster than sort -R:
>>
>> $ time sort file
>> real    0m53.553s
>>
>> I would expect sort -R to be faster than sort and faster than Perl if
>> not as fast as unsort.
>
> On my system, locale settings seem to impact the runtime significantly:
>
> $ wc -l bigfile
> 1000000 bigfile
>
> $ time LC_ALL=en_US.utf8 sort -R bigfile > /dev/null
>
> real  1m29.302s
> user  1m21.009s
> sys   0m0.155s
>
> $ time LC_ALL=C sort -R bigfile > /dev/null
>
> real  0m38.881s
> user  0m35.276s
> sys   0m0.118s
>
>
> However, shuf is much faster, and seems mostly unaffected by the locale
> used:
>
> $ time shuf bigfile > /dev/null
>
> real  0m1.044s
> user  0m0.833s
> sys   0m0.042s

Thanks for the report.
I think the performance of sort -R will often be worse
than that of shuf (by design, since it accesses each byte of each line
once more, to compute the hash), except when the input size is larger
than available memory.

The info documentation for sort -R does refer to "shuf".

Any suggestions for improvements are welcome.
I'm closing this.

You're welcome to reopen or file a new report.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]