[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Counting words, fast!
From: |
Leonid Isaev (ifax) |
Subject: |
Re: Counting words, fast! |
Date: |
Tue, 16 Mar 2021 22:12:50 +0000 |
On Tue, Mar 16, 2021 at 04:31:59PM -0500, Jesse Hathaway wrote:
> Ben Hoyt wrote a fun blog post on counting words,
> https://benhoyt.com/writings/count-words, I wrote a Bash version with
> as many optimizations as I could find[1], I would love any suggestions
> on any substantial optimizations, the code is attached. The full
> repo and test file is here: https://github.com/benhoyt/countwords
Nice, but I don't understand why did you choose to implement insertion sort
in bash? Replacing it with GNU sort(1) significantly cuts down execution time:
-----8<-----
[lisaev@hyena tmp]$ cat a.sh
#!/bin/bash
set -o errexit
set -o nounset
declare -Ai freqs
declare -a line
declare w
while read -ra line; do
for w in "${line[@]}"; do
freqs["${w,,}"]+=1
done
done
for w in "${!freqs[@]}"; do
printf "%s %s\n" "${freqs[$w]}" "$w"
done
[lisaev@hyena tmp]$ bash a.sh < kjvbible.txt | sort -rnk1 | head
64015 the
51313 and
34634 of
13567 to
12784 that
12503 in
10261 he
9838 shall
8987 unto
8810 for
[lisaev@hyena tmp]$ time { bash a.sh < kjvbible.txt | sort -rnk1 &> /dev/null;
}
real 0m9.721s
user 0m8.874s
sys 0m0.813s
----->8-----
(notice, this test was done in a VM, on a file in tmpfs, so there is no caching
associated with filesystem access).
If using sort(1) destroys "pure-bash-ness" of the task, you should program a
quick-/merge-sort/...-sort subroutine with $O (n \ln n)$ performance.
Sincerely,
L.
--
Leonid Isaev
- Counting words, fast!, Jesse Hathaway, 2021/03/16
- Re: Counting words, fast!,
Leonid Isaev (ifax) <=
- Re: Counting words, fast!, Greg Wooledge, 2021/03/16
- Re: Counting words, fast!, Jesse Hathaway, 2021/03/16
- Re: Counting words, fast!, Dennis Williamson, 2021/03/16
- Re: Counting words, fast!, Jesse Hathaway, 2021/03/17
- Re: Counting words, fast!, Dennis Williamson, 2021/03/17
- Re: Counting words, fast!, Jesse Hathaway, 2021/03/17
- Re: Counting words, fast!, Greg Wooledge, 2021/03/17
- Re: Counting words, fast!, Jesse Hathaway, 2021/03/17
Re: Counting words, fast!, Koichi Murase, 2021/03/19