bug-coreutils

Re: efficient version of 'sort | uniq -c | sort -n'?

From:	Philip Rowlands
Subject:	Re: efficient version of 'sort | uniq -c | sort -n'?
Date:	Mon, 21 May 2007 21:48:13 +0100 (BST)

On Mon, 21 May 2007, Matthew Woehlke wrote:

I thought about that, but /maximum/ efficiency is only achievable doing everything in one go. Anyway I think 'countitems' would still be a big improvement; I would do that as 'sort --unique-with-count' (preferably aliased 'sort -U') since IMO this is a missing feature of 'sort -u'.

You don't really want to do the first sort at all - it's just a convenient way of creating the buckets. The relative order of each bucket is unimportant, but that's what sort spends a long time calculating.


A fundamentally more efficient approach would be something like:

perl -lne '$bucket{$_}++; END { foreach $key (keys %bucket) { print "$bucket{$key} $key" } }' | \
  sort -n

The trailing "sort" could be done inside perl, but it doesn't help the (algorithmic) efficiency, and we're not playing perl golf...



Cheers,
Phil
