coreutils

Re: [PATCH] wc: speed-up by simplifying avx code


From: Pádraig Brady
Subject: Re: [PATCH] wc: speed-up by simplifying avx code
Date: Sun, 31 Mar 2024 13:12:31 +0100
User-agent: Mozilla Thunderbird

On 31/03/2024 00:18, Evgeny Nizhibitsky wrote:
> Here is the proposed patch for both simplifying and consistently speeding up
> the avx version of wc -l by 10% in up to 1 billion rows scenarios on 7800X3D
> (probably should be tested on different data samples and CPUs).

The patch was mangled, but I applied it manually.
For any further patches, it's best to attach them rather than paste them inline.
I'm attaching the applied version here in case others want to try it.

This is good, as it simplifies the code
and should be just as portable across machines and compilers.
I'll adjust the configure.ac check to be more aligned with the new code.
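
For anyone following along without the attachment, here is a rough sketch of
the general compare/movemask/popcount approach that the attachment name hints
at. It is not the attached patch: the function names are hypothetical, the
runtime dispatch through __builtin_cpu_supports and the target attribute
assume GCC or Clang on x86-64, and other platforms simply take the scalar path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable fallback: count '\n' bytes one at a time. */
uint64_t count_newlines_scalar(const char *buf, size_t len)
{
    uint64_t lines = 0;
    for (size_t i = 0; i < len; i++)
        lines += (buf[i] == '\n');
    return lines;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
# include <immintrin.h>
# define HAVE_AVX2_SKETCH 1

/* Hypothetical AVX2 path: compare 32 bytes against '\n' at once,
   collapse the result to a 32-bit mask, and popcount the mask. */
__attribute__((target("avx2")))
uint64_t count_newlines_avx2(const char *buf, size_t len)
{
    const __m256i nl = _mm256_set1_epi8('\n');
    uint64_t lines = 0;
    size_t i = 0;

    for (; i + 32 <= len; i += 32) {
        __m256i chunk = _mm256_loadu_si256((const __m256i *)(buf + i));
        __m256i eq = _mm256_cmpeq_epi8(chunk, nl);
        unsigned int mask = (unsigned int)_mm256_movemask_epi8(eq);
        lines += (uint64_t)__builtin_popcount(mask);
    }

    /* Handle the tail that does not fill a whole 32-byte vector. */
    for (; i < len; i++)
        lines += (buf[i] == '\n');
    return lines;
}
#endif

/* Pick the AVX2 path only when the running CPU supports it. */
uint64_t count_newlines(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
#ifdef HAVE_AVX2_SKETCH
    if (__builtin_cpu_supports("avx2"))
        return count_newlines_avx2(buf, len);
#endif
    return count_newlines_scalar(buf, len);
}
```

Keeping one mask and one popcount per 32-byte chunk avoids maintaining
per-lane accumulators, which is the kind of simplification being discussed.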

As for performance, I saw no change when testing on my laptop:

  # on an i7-5600U with 1 billion short lines
  $ yes | head -n1000000000 > /dev/shm/yes

  $ time src/wc-old -l /dev/shm/yes
  1000000000 /dev/shm/yes
  real    0m0.351s
  user    0m0.060s
  sys     0m0.288s

  $ time src/wc-new -l /dev/shm/yes
  1000000000 /dev/shm/yes
  real    0m0.356s
  user    0m0.098s
  sys     0m0.255s

Since you changed the I/O size from 16 KiB to 256 KiB,
it's now more in line with the recent I/O size adjustment in:
https://github.com/coreutils/coreutils/commit/fcfba90d0
In fact, much of the speedup may come from that change alone.
Can you test on your system with the buffer reduced back to 16 KiB
to see how much that affects the performance?
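
One way to separate the two effects is to time a counter whose read size is
a parameter while everything else stays fixed. The sketch below is a minimal
stand-in under that assumption, not the wc code: count_lines_with_bufsize is
a hypothetical helper, it uses plain memchr rather than AVX2, and it does not
retry on EINTR.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes in PATH, issuing read() calls
   of at most BUFSIZE bytes.  Returns -1 on any error. */
int64_t count_lines_with_bufsize(const char *path, size_t bufsize)
{
    assert(bufsize > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char *buf = malloc(bufsize);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    int64_t lines = 0;
    ssize_t n;
    while ((n = read(fd, buf, bufsize)) > 0) {
        const char *p = buf;
        const char *end = buf + n;
        /* Jump from newline to newline within the bytes just read. */
        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            lines++;
            p++;
        }
    }

    free(buf);
    close(fd);
    return n < 0 ? -1 : lines;
}
```

Calling it once with 16 * 1024 and once with 256 * 1024 on the same file,
and timing each call, gives a rough idea of how much the read size alone
contributes, independent of the counting loop.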

thanks,
Pádraig

Attachment: wc-popcount.patch
Description: Text Data

