Re: efficient version of 'sort | uniq -c | sort -n'?
From: Philip Rowlands
Date: Mon, 21 May 2007 21:48:13 +0100 (BST)
On Mon, 21 May 2007, Matthew Woehlke wrote:
> I thought about that, but /maximum/ efficiency is only achievable
> doing everything in one go. Anyway I think 'countitems' would still be
> a big improvement; I would do that as 'sort --unique-with-count'
> (preferably aliased 'sort -U') since IMO this is a missing feature of
> 'sort -u'.
You don't really want to do the first sort at all; it's just a
convenient way of creating the buckets. The relative order of the
buckets is unimportant, but computing that order is what sort spends
a long time on: O(n log n) comparisons over all n input lines.
A fundamentally more efficient approach is to count the lines in a
hash, which takes a single pass over the input, and then sort only the
distinct lines:
  perl -lne '$bucket{$_}++; END { foreach $key (keys %bucket) { print "$bucket{$key} $key" } }' | \
    sort -n
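For anyone without perl to hand, the same single-pass counting can be
sketched with awk's associative arrays. This is a swapped-in tool, not
part of the suggestion above, and `count_lines` is a hypothetical helper
name used only for illustration.

```shell
# count_lines: hypothetical helper name, not part of any existing tool.
# Counts each distinct stdin line in an awk associative array (one pass,
# no up-front sort), then sorts only the distinct lines by count, giving
# roughly what 'sort | uniq -c | sort -n' produces.
count_lines() {
  awk '{ count[$0]++ } END { for (line in count) print count[line], line }' |
    sort -n
}

# Usage: prints "1 c", "2 a" and "3 b" on separate lines.
printf 'b\na\nb\nc\nb\na\n' | count_lines
```

Unlike `uniq -c`, the counts are not right-aligned with leading spaces,
which does not matter to the trailing `sort -n`.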
The trailing "sort" could be done inside perl, but it doesn't help the
(algorithmic) efficiency, since either way only the distinct lines get
sorted, and we're not playing perl golf...
Cheers,
Phil