Re: Counting words, fast!
From: Jesse Hathaway
Subject: Re: Counting words, fast!
Date: Wed, 17 Mar 2021 10:34:47 -0500

On Tue, Mar 16, 2021 at 10:30 PM Dennis Williamson
<dennistwilliamson@gmail.com> wrote:
> I've been playing with your optimized code changing the read to grab data in
> chunks like some of the other optimized code does - thus extending your move
> from by-word to by-line reading to reading a specified larger number of
> characters.
>
> IFS= read -r -N 4096 var
>
> And appending the result of a regular read to end at a newline. This seemed
> to cut about 20% off the time. But I get different counts than your code.
> I've tried using read without specifying a variable and using the resulting
> $REPLY to preserve whitespace but the counts still didn't match.
>
> In any case this points to larger chunks being more efficient.

Oh! That is a clever idea. I wanted to try reading in larger chunks, but
I wasn't sure how to ensure I had read whole words until you suggested
this. Using 64K chunks I was able to shave off about 7s in my
testing:

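    declare -iA words_to_freq
    eof='false'
    # Disable globbing so '*' or '?' in the input are not expanded below.
    set -o noglob
    while [[ "${eof}" == 'false' ]]; do
        # Read a 64K block of characters, ignoring line boundaries.
        if ! LANG='C' IFS='' read -N 65536 -r block; then
            eof='true'
        fi
        # Read up to the next newline so the block ends on a whole word.
        if ! IFS='' read -r line; then
            eof='true'
        fi
        # Unquoted on purpose: word splitting yields the lowercased words.
        for word in ${block@L}${line@L}; do
            words_to_freq["${word}"]+=1
        done
    done
    set +o noglob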
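
For anyone who wants to try this end to end, here is a minimal sketch that
wraps the loop above in a function and prints the counts. The function name
count_words, the optional chunk-size argument, and the sort-based output are
assumptions added for illustration, not part of the script as posted. It also
swaps ${var@L} for ${var,,}, which should lowercase the same way while working
on bash releases older than 5.1.

```shell
#!/usr/bin/env bash
# Sketch only: count word frequencies on stdin using large block reads.
# count_words and its chunk-size argument are hypothetical names.
count_words() {
    local -iA freq=()
    local block line word eof='false'
    local chunk_size="${1:-65536}"   # characters per block read

    set -o noglob                    # keep '*' and '?' in the input literal
    while [[ "${eof}" == 'false' ]]; do
        # Read up to chunk_size characters, ignoring line boundaries.
        if ! LANG='C' IFS='' read -N "${chunk_size}" -r block; then
            eof='true'
        fi
        # Read on to the next newline so the chunk never ends mid-word.
        if ! IFS='' read -r line; then
            eof='true'
        fi
        # Unquoted expansion splits on whitespace; ,, lowercases the text.
        for word in ${block,,}${line,,}; do
            freq["${word}"]+=1
        done
    done
    set +o noglob

    # Print "word count" pairs, highest count first, ties sorted by word.
    for word in "${!freq[@]}"; do
        printf '%s %d\n' "${word}" "${freq[${word}]}"
    done | sort -k2,2nr -k1,1
}
```

Called as `count_words < input.txt`, it reads 64K blocks; passing a small
number such as `count_words 5` forces blocks to end mid-word, which is a handy
way to check that the follow-up line read stitches the words back together.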