coreutils

Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup


From: Pádraig Brady
Subject: Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup
Date: Mon, 25 Nov 2024 17:37:42 +0000
User-agent: Mozilla Thunderbird Beta

On 25/11/2024 16:04, Sam Russell wrote:
> I've added a sample benchmarking program to measure the difference without
> hitting disk. It looks like a 40% speedup (run time drops from 3.018 s to 1.824 s):
>
> $ time ./cksum_bench_pclmul 1048576 10000
> Hash: EFA0B24F, length: 1048576
>
> real    0m3.018s
> user    0m3.018s
> sys     0m0.000s
>
> $ time ./cksum_bench_avx2 1048576 10000
> Hash: EFA0B24F, length: 1048576
>
> real    0m1.824s
> user    0m1.804s
> sys     0m0.020s
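The benchmarking program itself isn't included in this message. A minimal sketch of the same idea (timing repeated CRC passes over an in-memory buffer so disk I/O never enters the measurement) might look like the following. It assumes a plain table-driven MSB-first CRC-32 with cksum's polynomial 0x04C11DB7 as a stand-in for the pclmul and AVX2 kernels. `bench_crc`, `crc_update`, `crc_table_init` and the fill pattern are hypothetical, and the hash this produces is not expected to match the EFA0B24F shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for the pclmul/AVX2 kernels: table-driven,
   MSB-first CRC-32 with cksum's polynomial 0x04C11DB7 (no length
   suffix and no final complement, so not the full cksum output).  */
static uint32_t crc_table[256];

static void
crc_table_init (void)
{
  for (uint32_t i = 0; i < 256; i++)
    {
      uint32_t c = i << 24;
      for (int k = 0; k < 8; k++)
        c = (c & 0x80000000u) ? (c << 1) ^ 0x04C11DB7u : c << 1;
      crc_table[i] = c;
    }
}

static uint32_t
crc_update (uint32_t crc, const unsigned char *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    crc = (crc << 8) ^ crc_table[(crc >> 24) ^ buf[i]];
  return crc;
}

typedef uint32_t (*crc_fn) (uint32_t, const unsigned char *, size_t);

/* Run FN over an in-memory buffer of LEN bytes ITERS times and return
   the CPU seconds used, or -1 if the buffer can't be allocated.  Each
   pass starts from the previous pass's CRC, so the compiler can't
   drop or hoist passes; the final value is stored in *HASH.  */
static double
bench_crc (crc_fn fn, size_t len, size_t iters, uint32_t *hash)
{
  assert (len > 0 && iters > 0);
  unsigned char *buf = malloc (len);
  if (!buf)
    return -1.0;
  for (size_t i = 0; i < len; i++)
    buf[i] = (unsigned char) (i * 131 + 7);   /* deterministic fill */

  uint32_t crc = 0;
  clock_t start = clock ();
  for (size_t n = 0; n < iters; n++)
    crc = fn (crc, buf, len);
  clock_t end = clock ();

  free (buf);
  *hash = crc;
  return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Calling `crc_table_init ()` and then `bench_crc (crc_update, 1048576, 10000, &hash)` mirrors the 1 MiB x 10000 run above. A real comparison would pass the pclmul and AVX2 kernels as `fn`. `clock ()` measures CPU time, so the result corresponds to the `user` line rather than `real`.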

Impressive. What CPU was that exactly?

> The code effectively replicates the existing pclmul code, with new
> constants generated for the larger folds. The main gotcha was that the
> previous CRC gets inserted at a weird offset due to endianness and byte
> swapping.
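The thread doesn't show how the constants are generated. Under the usual carry-less-multiply folding scheme, folding constants are remainders of the form x^k mod P(x), with k tied to the fold distance. The exact exponents and lane layout used by the cksum kernels are an assumption here. The sketch below computes such remainders for cksum's polynomial. It also illustrates the offset gotcha: because cksum's CRC is MSB-first, continuing from a previous CRC is equivalent to XORing that CRC, most significant byte first, into the first four bytes of the next block. If each 16-byte lane is byte-reversed when loaded, those four bytes land in the top 32 bits of the lane, which is presumably the "weird offset" meant. `mul_x`, `xpow_mod`, `crc_bitwise` and `prev_crc_as_prefix_xor` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* cksum's generator polynomial with the implicit x^32 term dropped.  */
#define CKSUM_POLY 0x04C11DB7u

/* Multiply R by x modulo P(x) over GF(2).  */
static uint32_t
mul_x (uint32_t r)
{
  return (r & 0x80000000u) ? (r << 1) ^ CKSUM_POLY : r << 1;
}

/* x^K mod P(x).  A chunk A = A_hi*x^64 + A_lo moved D bits further
   along the message becomes A_hi*x^(D+64) + A_lo*x^D, and each power
   can be replaced by its 32-bit remainder before the carry-less
   multiply without changing the final CRC.  Which K values a real
   kernel stores (including any offset for where products land in
   the register) is an assumption here.  */
static uint32_t
xpow_mod (unsigned k)
{
  uint32_t r = 1;               /* x^0 */
  while (k--)
    r = mul_x (r);
  return r;
}

/* Bit-at-a-time, MSB-first CRC-32 update (no length suffix and no
   final complement, unlike the full cksum output).  */
static uint32_t
crc_bitwise (uint32_t crc, const unsigned char *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    {
      crc ^= (uint32_t) buf[i] << 24;
      for (int b = 0; b < 8; b++)
        crc = mul_x (crc);
    }
  return crc;
}

/* Check that continuing from PREV over BUF equals starting from 0
   after XORing PREV, most significant byte first, into BUF[0..3].
   Requires 4 <= LEN <= 64.  */
static bool
prev_crc_as_prefix_xor (uint32_t prev, const unsigned char *buf, size_t len)
{
  unsigned char tmp[64];
  assert (len >= 4 && len <= sizeof tmp);
  memcpy (tmp, buf, len);
  for (int i = 0; i < 4; i++)
    tmp[i] ^= (unsigned char) (prev >> (24 - 8 * i));
  return crc_bitwise (prev, buf, len) == crc_bitwise (0, tmp, len);
}
```

The equivalence holds because, for this update rule, the state after n bytes is c*x^(8n) + M(x)*x^32 mod P. Placing c in the first four message bytes adds c*x^(8n-32) to M(x), which contributes exactly the same c*x^(8n) term.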

There is a copy/paste issue in the debug messages:

diff --git a/src/cksum.c b/src/cksum.c
index 65424fe88..3eab1fbd4 100644
--- a/src/cksum.c
+++ b/src/cksum.c
@@ -186,8 +186,8 @@ avx512_supported (void)
   if (cksum_debug)
     error (0, 0, "%s",
            (avx512_enabled
-            ? _("using avx2 hardware support")
-            : _("avx2 support not detected")));
+            ? _("using avx512 hardware support")
+            : _("avx512 support not detected")));

   return avx512_enabled;
 }


Also, `make syntax-check` reports that some lines are longer than 80 characters.

This improvement should be added to NEWS.

> I don't have a Skylake processor, so I spun up an AWS instance to test
> the AVX512 version. It turns out there's a bug where virtualisation
> environments don't handle AVX512 pclmul correctly despite the CPU
> supporting it. It might be worth disabling this for now, as it does get
> past the __builtin_cpu_supports() gate but then throws an illegal
> instruction halfway through the function. It would still be nice if we
> could at least validate it for now.
>
> AVX2 has been around for over 10 years, though, so this seems to be a
> safer addition.
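As a complement to the /proc/cpuinfo flags, a small x86 probe could read the raw CPUID bits an AVX512 carry-less-multiply kernel would depend on, plus XCR0 via XGETBV to confirm the OS actually enables the ZMM register state. This is a sketch, not what cksum does. A mismatch between what CPUID advertises and what XCR0 enables, or AVX512F present without VPCLMULQDQ, would then be visible directly. The struct and function names are hypothetical. The bit positions are the documented CPUID/XCR0 ones. It assumes GCC 7+ or clang for `__get_cpuid_count`. Which features the cksum AVX512 code really requires is also an assumption.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined __x86_64__ || defined __i386__
# include <cpuid.h>
#endif

/* Hypothetical probe result; field names are illustrative.  */
struct simd_probe
{
  bool pclmul;       /* CPUID.1:ECX[1]      PCLMULQDQ */
  bool osxsave;      /* CPUID.1:ECX[27]     OS uses XSAVE/XGETBV */
  bool avx2;         /* CPUID.(7,0):EBX[5] */
  bool avx512f;      /* CPUID.(7,0):EBX[16] */
  bool avx512bw;     /* CPUID.(7,0):EBX[30] */
  bool vpclmulqdq;   /* CPUID.(7,0):ECX[10] */
  bool os_ymm;       /* XCR0 bits 1,2: SSE and AVX state enabled */
  bool os_zmm;       /* XCR0 bits 1,2,5,6,7: plus opmask and ZMM state */
};

static struct simd_probe
probe_simd (void)
{
  struct simd_probe p = { 0 };
#if defined __x86_64__ || defined __i386__
  unsigned int eax, ebx, ecx, edx;
  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    {
      p.pclmul = ecx & (1u << 1);
      p.osxsave = ecx & (1u << 27);
    }
  /* Returns 0 if leaf 7 is beyond the CPU's maximum leaf.  */
  if (__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    {
      p.avx2 = ebx & (1u << 5);
      p.avx512f = ebx & (1u << 16);
      p.avx512bw = ebx & (1u << 30);
      p.vpclmulqdq = ecx & (1u << 10);
    }
  if (p.osxsave)
    {
      /* XGETBV with ECX = 0 reads XCR0, i.e. which register state
         the OS (or hypervisor) has actually enabled.  */
      uint32_t lo, hi;
      __asm__ volatile ("xgetbv" : "=a" (lo), "=d" (hi) : "c" (0));
      (void) hi;
      p.os_ymm = (lo & 0x6u) == 0x6u;
      p.os_zmm = (lo & 0xE6u) == 0xE6u;
    }
#endif
  return p;
}

/* True only if both CPUID and XCR0 say a 512-bit VPCLMULQDQ kernel
   can run.  This is an assumed minimum set; a real kernel may need
   more (for example AVX512BW).  */
static bool
avx512_clmul_usable (struct simd_probe p)
{
  return p.avx512f && p.vpclmulqdq && p.os_zmm;
}
```

On the VM, printing the `probe_simd ()` fields next to the `flags` line from /proc/cpuinfo would show whether CPUID and XCR0 agree. AVX-512 instructions raise an illegal-instruction fault when the ZMM state bits in XCR0 are not enabled, so `os_zmm` is worth checking alongside the feature bits.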

Yes, we'd have to leave the avx512 code disabled by default
if we couldn't find a way around this issue.
It's a surprising issue TBH.
What compiler version are you using?
Can you show the output of `grep flags /proc/cpuinfo | head -n1` on the VM?
There was a gcc bug in this area, but that was a while ago.
Unlikely, but maybe it resurfaced with avx512?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85100

thanks!
Pádraig


