Re: Counting words, fast!

help-bash

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Counting words, fast!

From:	Leonid Isaev (ifax)
Subject:	Re: Counting words, fast!
Date:	Tue, 16 Mar 2021 22:12:50 +0000

On Tue, Mar 16, 2021 at 04:31:59PM -0500, Jesse Hathaway wrote:
> Ben Hoyt wrote a fun blog post on counting words,
> https://benhoyt.com/writings/count-words, I wrote a Bash version with
> as many optimizations as I could find[1], I would love any suggestions
> on any substantial optimizations, the code is attached. The full
> repo and test file is here: https://github.com/benhoyt/countwords

Nice, but I don't understand why did you choose to implement insertion sort
in bash? Replacing it with GNU sort(1) significantly cuts down execution time:
-----8<-----
[lisaev@hyena tmp]$ cat a.sh 
#!/bin/bash

set -o errexit
set -o nounset

declare -Ai freqs
declare -a line
declare w

while read -ra line; do
        for w in "${line[@]}"; do
                freqs["${w,,}"]+=1
        done
done

for w in "${!freqs[@]}"; do
        printf "%s %s\n" "${freqs[$w]}" "$w"
done

[lisaev@hyena tmp]$ bash a.sh < kjvbible.txt | sort -rnk1 | head
64015 the
51313 and
34634 of
13567 to
12784 that
12503 in
10261 he
9838 shall
8987 unto
8810 for

[lisaev@hyena tmp]$ time { bash a.sh < kjvbible.txt | sort -rnk1 &> /dev/null; 
} 
real    0m9.721s
user    0m8.874s
sys     0m0.813s
----->8-----
(notice, this test was done in a VM, on a file in tmpfs, so there is no caching
associated with filesystem access).

If using sort(1) destroys "pure-bash-ness" of the task, you should program a
quick-/merge-sort/...-sort subroutine with $O (n \ln n)$ performance.

Sincerely,
L.

--
Leonid Isaev

[Prev in Thread]

Current Thread

[Next in Thread]

Counting words, fast!, Jesse Hathaway, 2021/03/16
- Re: Counting words, fast!, Leonid Isaev (ifax) <=
  - Re: Counting words, fast!, Greg Wooledge, 2021/03/16
    - Re: Counting words, fast!, Leonid Isaev (ifax), 2021/03/16
  - Re: Counting words, fast!, Jesse Hathaway, 2021/03/16
    - Re: Counting words, fast!, Dennis Williamson, 2021/03/16
    - Re: Counting words, fast!, Jesse Hathaway, 2021/03/17
    - Re: Counting words, fast!, Dennis Williamson, 2021/03/17
    - Re: Counting words, fast!, Jesse Hathaway, 2021/03/17
    - Re: Counting words, fast!, Greg Wooledge, 2021/03/17
    - Re: Counting words, fast!, Jesse Hathaway, 2021/03/17
- Re: Counting words, fast!, Koichi Murase, 2021/03/19

Prev by Date: Counting words, fast!
Next by Date: Re: Counting words, fast!
Previous by thread: Counting words, fast!
Next by thread: Re: Counting words, fast!
Index(es):
- Date
- Thread