Re: Most used words in current buffer
From: Bob Proulx
Subject: Re: Most used words in current buffer
Date: Wed, 18 Jul 2018 18:45:36 -0600
User-agent: Mutt/1.10.0 (2018-05-17)
Ben Bacarisse wrote:
> Udyant Wig writes:
> > they were left behind by this old Awk solution (also using hashing) I
Not wanting to be too annoying, but I see no hashing in the awk
solution. It uses an awk associative array to store the words.
Perl and Python call those "hashes", but they are just associative
arrays.
> > found in the classic /The Unix Programming Environment/ by Kernighan and
> > Pike:
> >...
> > awk ' { for (i = 1; i <= NF; i++) num[$i]++ }
> > END { for (word in num) print word, num[word] }
> > ' $* | sort +1 -nr | head -10 | awk '{ print $1 }'
> >
> > I appended the last awk pipeline to only give the words without the
> > counts.
>
> The Unix command cut does this task. Nothing wrong with using another
> awk, but I often feel sorry for poor old cut. It's been around for
> decades, and yet is so very often overlooked! Mind you, it uses TABs to
> delimit fields by default, so maybe it only has itself to blame.
I will continue to be contrary here and say that awk does a much
better job of cutting whitespace-separated fields than cut does. By
default awk splits on runs of blanks, while cut treats every single
delimiter character as a field boundary. Both are standard and should
be available everywhere. And here, because awk is already in use, I
expect it to be somewhat more efficient to use awk again in the
pipeline than to use a different program.
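To illustrate the difference, here is a small sketch. The sample input
is made up for the example; it is not from the original thread. It has
two columns separated by a varying number of spaces, as aligned output
often does.

```shell
# Hypothetical sample input: word and count, padded with extra spaces.
sample='the   42
of 17'

# cut with a single-space delimiter: every space starts a new field,
# so field 2 of the first line is empty.
cut_out=$(printf '%s\n' "$sample" | cut -d' ' -f2)

# awk splits on runs of blanks by default, so $2 is the count on
# both lines.
awk_out=$(printf '%s\n' "$sample" | awk '{ print $2 }')

printf 'cut: [%s]\n' "$cut_out"
printf 'awk: [%s]\n' "$awk_out"
```

With cut the first line yields an empty field, while awk returns the
count for both lines.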
I also wish to improve the command line somewhat. Using $* by itself
does not preserve program arguments that contain whitespace; the shell
splits them into separate words. One should use "$@" for that purpose.
Also, the old forms of the sort and head options (sort +1 and head -10)
are obsolete and would be better left behind in favor of the portable
option syntax. Let me suggest:
' "$@" | sort -k2,2nr | head -n10 | awk '{ print $1 }'
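Put together with the awk program quoted above, the whole pipeline
might look like the following sketch. The function name top_words and
the temporary file in the usage part are made up for illustration, and
mktemp -d is assumed to be available (it is common but not required by
POSIX).

```shell
# top_words is a hypothetical name. It prints the ten most frequent
# whitespace-separated words found in the files given as arguments.
top_words() {
    awk '    { for (i = 1; i <= NF; i++) num[$i]++ }
         END { for (word in num) print word, num[word] }
    ' "$@" |
    sort -k2,2nr |
    head -n 10 |
    awk '{ print $1 }'
}

# Usage: a file name containing a space stays one argument because
# the function passes its arguments on with "$@" rather than $*.
dir=$(mktemp -d)
printf 'a b a\nc a b\n' > "$dir/my notes.txt"
result=$(top_words "$dir/my notes.txt")
printf '%s\n' "$result"
rm -rf "$dir"
```

Words with equal counts may come out in any order, since awk's
"for (word in num)" iteration order is unspecified and sort only
compares the count field.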
Bob