Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup
From: Pádraig Brady
Subject: Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup
Date: Mon, 25 Nov 2024 17:37:42 +0000
User-agent: Mozilla Thunderbird Beta
On 25/11/2024 16:04, Sam Russell wrote:
> I've added a sample benchmarking program that measures the difference
> without hitting disk. It shows roughly a 40% reduction in run time
> (3.018s vs 1.824s, about 1.65x faster):
>
> $ time ./cksum_bench_pclmul 1048576 10000
> Hash: EFA0B24F, length: 1048576
> real 0m3.018s
> user 0m3.018s
> sys 0m0.000s
>
> $ time ./cksum_bench_avx2 1048576 10000
> Hash: EFA0B24F, length: 1048576
> real 0m1.824s
> user 0m1.804s
> sys 0m0.020s
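The benchmark program itself isn't included in the message. As a rough, hypothetical sketch of the same idea (timing a CRC over an in-memory buffer so disk I/O is excluded), the C below uses a plain table-driven version of the POSIX cksum CRC (polynomial 0x04C11DB7, no init or final XOR) as the routine under test. The names `crc32_posix_update` and `bench_crc` are illustrative, and the buffer contents are arbitrary, so its CRC values are not expected to match the EFA0B24F above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Table for the POSIX cksum CRC-32: polynomial 0x04C11DB7, MSB-first.  */
static uint32_t crc_table[256];
static int crc_table_ready;

static void
crc_table_init (void)
{
  for (uint32_t i = 0; i < 256; i++)
    {
      uint32_t r = i << 24;
      for (int k = 0; k < 8; k++)
        r = (r & 0x80000000u) ? (r << 1) ^ 0x04C11DB7u : r << 1;
      crc_table[i] = r;
    }
  crc_table_ready = 1;
}

/* Byte-at-a-time baseline with no init or final XOR; a pclmul or AVX2
   routine with the same signature would be swapped in here.  */
static uint32_t
crc32_posix_update (uint32_t crc, const unsigned char *buf, size_t len)
{
  if (!crc_table_ready)
    crc_table_init ();
  for (size_t i = 0; i < len; i++)
    crc = (crc << 8) ^ crc_table[(crc >> 24) ^ buf[i]];
  return crc;
}

/* Run ITERATIONS passes over an in-memory buffer of SIZE bytes, so no
   disk I/O is timed.  Returns processor seconds, or -1 if allocation
   fails, and stores the final CRC register in *CRC_OUT.  */
static double
bench_crc (size_t size, long iterations, uint32_t *crc_out)
{
  unsigned char *buf = malloc (size ? size : 1);
  if (!buf)
    return -1;

  /* Deterministic pseudo-random contents (a simple LCG).  */
  uint32_t x = 12345;
  for (size_t i = 0; i < size; i++)
    {
      x = x * 1103515245u + 12345u;
      buf[i] = (unsigned char) (x >> 24);
    }

  clock_t start = clock ();
  uint32_t crc = 0;
  /* Chaining the register through each pass keeps the compiler from
     hoisting the otherwise loop-invariant call out of the loop.  */
  for (long n = 0; n < iterations; n++)
    crc = crc32_posix_update (crc, buf, size);
  clock_t end = clock ();

  free (buf);
  *crc_out = crc;
  return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Calling `bench_crc (1048576, 10000, &crc)` mirrors the argument shape of the invocations above; timing a vectorized routine in place of `crc32_posix_update` and comparing the returned seconds gives the kind of ratio reported there. Processor time via `clock ()` is used because the workload is CPU-bound.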
Impressive. What CPU was that, exactly?
> The code effectively replicates the existing pclmul code, with new
> constants generated for the larger folds. The main gotcha was that the
> previous CRC gets inserted at an unusual offset due to endianness and
> byte swapping.
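The message doesn't show the constants or the insertion code, so the following is only a hedged illustration of both ideas, assuming the standard MSB-first formulation of the POSIX CRC-32 (polynomial 0x04C11DB7, no bit reflection). Folding constants for carry-less-multiply CRC code are residues of the form x^n mod P(x); `xpow_mod` computes them, but which exponents a given fold width needs is implementation-specific and not stated here. `crc_continue_via_xor` shows why the previous CRC lands at an odd-looking position: continuing a CRC is equivalent to XORing the previous value, most significant byte first, into the first four data bytes and starting from zero. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* POSIX CRC-32 polynomial without its implicit x^32 term.  */
#define CRC32_POSIX_POLY 0x04C11DB7u

/* Multiply a 32-bit remainder by x and reduce modulo P(x).  */
static uint32_t
mul_x_mod (uint32_t r)
{
  return (r & 0x80000000u) ? (r << 1) ^ CRC32_POSIX_POLY : r << 1;
}

/* x^N mod P(x) over GF(2).  Folding constants are values of this form;
   which N a particular fold distance needs is implementation-specific.  */
static uint32_t
xpow_mod (unsigned n)
{
  uint32_t r = 1;               /* x^0 */
  while (n--)
    r = mul_x_mod (r);
  return r;
}

/* Bitwise MSB-first CRC update with no init or final XOR.  */
static uint32_t
crc_update (uint32_t crc, const unsigned char *p, size_t len)
{
  for (size_t i = 0; i < len; i++)
    {
      crc ^= (uint32_t) p[i] << 24;
      for (int k = 0; k < 8; k++)
        crc = mul_x_mod (crc);
    }
  return crc;
}

/* Equivalent way to continue from a previous CRC: XOR PREV, most
   significant byte first, into the first four bytes of BLOCK, then
   start from zero.  Modifies BLOCK; LEN must be at least 4.  */
static uint32_t
crc_continue_via_xor (uint32_t prev, unsigned char *block, size_t len)
{
  block[0] ^= (unsigned char) (prev >> 24);
  block[1] ^= (unsigned char) (prev >> 16);
  block[2] ^= (unsigned char) (prev >> 8);
  block[3] ^= (unsigned char) prev;
  return crc_update (0, block, len);
}
```

If data is loaded little-endian and then byte-reversed with `vpshufb`, those first four bytes end up in bits 127:96 of the 128-bit lane. Because `vpshufb` shuffles within each 128-bit half of a ymm register, in a 256-bit register they would still sit at bits 127:96 rather than at the top. That is one plausible reading of the odd offset mentioned above, not a description of the patch.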
There is a copy/paste issue in the avx512_supported() debug messages:
diff --git a/src/cksum.c b/src/cksum.c
index 65424fe88..3eab1fbd4 100644
--- a/src/cksum.c
+++ b/src/cksum.c
@@ -186,8 +186,8 @@ avx512_supported (void)
   if (cksum_debug)
     error (0, 0, "%s",
            (avx512_enabled
-            ? _("using avx2 hardware support")
-            : _("avx2 support not detected")));
+            ? _("using avx512 hardware support")
+            : _("avx512 support not detected")));
   return avx512_enabled;
 }

Also, `make syntax-check` reports that some lines exceed 80 characters.
This improvement should be added to NEWS.
> I don't have a Skylake processor, so I spun up an AWS instance to test
> the AVX512 version. It turns out there's a bug where virtualisation
> environments don't handle the AVX512 pclmul instructions correctly,
> despite the CPU supporting them. It might be worth disabling this for
> now: it does get past the __builtin_cpu_supports() gate, but then
> dies with an illegal instruction (SIGILL) partway through the
> function. It would be nice if we could at least validate it for now,
> though.
>
> AVX2 has been around for over 10 years, so this seems to be a safer
> addition.
Yes, we'd have to leave the avx512 code disabled by default
if we couldn't find a way around this issue.
It's a surprising issue TBH.
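As a sketch of what "disabled by default" could look like, the C below picks an implementation at run time with GCC's `__builtin_cpu_supports`, but only considers the AVX512 path when a hypothetical `ENABLE_AVX512_CKSUM` macro is set at build time. The enum, the macro, and the exact feature combinations checked are assumptions for illustration, not the patch's code. As reported above, passing such a gate does not by itself guarantee the instructions will execute, which is why the default matters.

```c
/* Hypothetical build-time switch; the AVX512 path stays off unless a
   build sets -DENABLE_AVX512_CKSUM=1.  */
#ifndef ENABLE_AVX512_CKSUM
# define ENABLE_AVX512_CKSUM 0
#endif

enum crc_impl { CRC_GENERIC, CRC_PCLMUL, CRC_AVX2, CRC_AVX512 };

/* Pick the fastest implementation the CPU reports, best first.
   Non-x86 targets (or non-GCC-compatible compilers) get the generic
   path, since __builtin_cpu_supports is only used on x86 here.  */
static enum crc_impl
choose_crc_impl (void)
{
#if defined __GNUC__ && (defined __x86_64__ || defined __i386__)
  __builtin_cpu_init ();
  if (ENABLE_AVX512_CKSUM
      && __builtin_cpu_supports ("avx512f")
      && __builtin_cpu_supports ("avx512bw")
      && __builtin_cpu_supports ("vpclmulqdq"))
    return CRC_AVX512;
  if (__builtin_cpu_supports ("avx2")
      && __builtin_cpu_supports ("vpclmulqdq"))
    return CRC_AVX2;
  if (__builtin_cpu_supports ("pclmul") && __builtin_cpu_supports ("avx"))
    return CRC_PCLMUL;
#endif
  return CRC_GENERIC;
}
```

Building with `-DENABLE_AVX512_CKSUM=1` would opt in for testing on known-good hardware, while default builds fall back to the AVX2, pclmul, or generic paths.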
What compiler version are you using?
Can you show the output of `grep flags /proc/cpuinfo | head -n1` on the VM?
There was a gcc bug in this area, but that was a while ago.
Unlikely, but maybe it resurfaced with avx512?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85100
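To cross-check the kernel's view against GCC's runtime detection on the VM, a small hypothetical helper like the following could be used. `read_cpuinfo_flags` grabs the same first `flags` line the `grep` pipeline prints, `has_cpu_flag` does whole-word matching (so `avx512f` is not confused with `avx512fp16`), and `compare_cpu_flags` prints both answers side by side. The function names and the feature list are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if FLAG (non-empty) appears as a whole whitespace-separated
   token in LINE.  */
static bool
has_cpu_flag (const char *line, const char *flag)
{
  size_t flen = strlen (flag);
  for (const char *p = line; (p = strstr (p, flag)) != NULL; p++)
    {
      bool starts = (p == line || p[-1] == ' ' || p[-1] == '\t');
      char end = p[flen];
      if (starts && (end == '\0' || end == ' ' || end == '\t' || end == '\n'))
        return true;
    }
  return false;
}

/* Copy the first line starting with "flags" from /proc/cpuinfo into BUF.
   Flags lines often exceed 1 KiB, so BUF should be generously sized.  */
static bool
read_cpuinfo_flags (char *buf, int size)
{
  FILE *f = fopen ("/proc/cpuinfo", "r");
  if (!f)
    return false;
  bool found = false;
  while (!found && fgets (buf, size, f))
    found = strncmp (buf, "flags", 5) == 0;
  fclose (f);
  return found;
}

/* __builtin_cpu_supports needs a string literal, hence the macro.  */
#define SHOW(feat)                                                  \
  printf ("%-11s kernel: %-3s  __builtin_cpu_supports: %s\n", feat, \
          has_cpu_flag (line, feat) ? "yes" : "no",                 \
          __builtin_cpu_supports (feat) ? "yes" : "no")

/* Print the kernel's answer next to GCC's runtime detection.  */
static void
compare_cpu_flags (const char *line)
{
#if defined __GNUC__ && (defined __x86_64__ || defined __i386__)
  __builtin_cpu_init ();
  SHOW ("avx2");
  SHOW ("avx512f");
  SHOW ("avx512bw");
  SHOW ("vpclmulqdq");
#else
  (void) line;
  puts ("__builtin_cpu_supports is not used on this target");
#endif
}
```

Calling `read_cpuinfo_flags (line, sizeof line)` with, say, a 16 KiB `line` buffer and then `compare_cpu_flags (line)` prints one row per feature. If both columns agree, the fault is less likely to come from feature detection itself.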
thanks!
Pádraig