Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup


From: Jeffrey Walton
Subject: Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup
Date: Mon, 25 Nov 2024 17:41:29 -0500

On Mon, Nov 25, 2024 at 5:31 PM Sam Russell <sam.h.russell@gmail.com> wrote:
>
> Results thanks to Jeff
>
> srussell@icelake:~$ time ./cksum_bench_pclmul 1048575 10000
> Hash: 5B9DA0F4, length: 1048575
>
> real    0m3.561s
> user    0m3.535s
> sys     0m0.026s
> srussell@icelake:~$ time ./cksum_bench_avx2 1048575 10000
> Hash: 5B9DA0F4, length: 1048575
>
> real    0m2.083s
> user    0m2.047s
> sys     0m0.036s
> srussell@icelake:~$ time ./cksum_bench_avx512 1048575 10000
> Hash: 5B9DA0F4, length: 1048575
>
> real    0m1.353s
> user    0m1.320s
> sys     0m0.033s
>
> Zero code change in the algorithm so we're effectively testing whether I've 
> calculated the constants correctly and whether I'm loading the previous CRC 
> into the correct part of the AVX register.
>
> Attached patch has Pádraig's feedback plus the new runtime check that will
> enable the AVX2 version if avx512f is specified but the avx512_supported()
> check has failed (because vpclmulqdq isn't set). I would appreciate it if
> anyone has a definitive answer on the correct way to test for
> avx2+vpclmulqdq vs avx512+vpclmulqdq, and whether any chip exists that
> supports a subset of avx512 but doesn't support vpclmulqdq on avx2...

I don't believe you will encounter avx2+vpclmulqdq. According to the
Intel Intrinsics Guide,[1] vpclmulqdq is an AVX512 feature. If you have
AVX512, then AVX2 is a proper subset available to you. (You won't find
AVX2 plus a few AVX512 features. That combination will not show up on
AVX2 machines like Skylake or Kaby Lake.)

According to the Intel Intrinsics Guide,[1] you should check for
VPCLMULQDQ+AVX512VL _if_ you are using the vpclmulqdq ymm, ymm, ymm, imm8
form. You should check for VPCLMULQDQ alone _if_ you are using the
vpclmulqdq zmm, zmm, zmm, imm8 form.
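
The checks above could be sketched roughly as follows. This is only an
illustration, not the code in the patch: the enum, the struct and both
function names are made up for the example, and the feature strings are
the ones GCC and Clang accept in __builtin_cpu_supports(). I also
require avx512f for the zmm path, which is my own assumption on top of
the flags listed above, since zmm registers need AVX-512 Foundation.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum cksum_impl { IMPL_GENERIC, IMPL_PCLMUL, IMPL_YMM, IMPL_ZMM };

struct cpu_features
{
  bool pclmul;
  bool avx2;
  bool avx512f;
  bool avx512vl;
  bool vpclmulqdq;
};

/* Pick the widest usable implementation from a set of feature bits.  */
static enum cksum_impl
choose_impl (const struct cpu_features *f)
{
  /* zmm form: VPCLMULQDQ, plus AVX512F (assumption) for the zmm registers.  */
  if (f->vpclmulqdq && f->avx512f)
    return IMPL_ZMM;

  /* ymm form: VPCLMULQDQ + AVX512VL, plus AVX2 for the rest of the loop.  */
  if (f->vpclmulqdq && f->avx512vl && f->avx2)
    return IMPL_YMM;

  /* 128-bit carry-less multiply.  */
  if (f->pclmul)
    return IMPL_PCLMUL;

  return IMPL_GENERIC;
}

/* Fill in the feature bits at run time.  On non-x86 targets or other
   compilers every bit stays false, which selects IMPL_GENERIC.  */
static struct cpu_features
detect_features (void)
{
  struct cpu_features f = { false, false, false, false, false };
#if defined __GNUC__ && (defined __x86_64__ || defined __i386__)
  __builtin_cpu_init ();
  f.pclmul = __builtin_cpu_supports ("pclmul") > 0;
  f.avx2 = __builtin_cpu_supports ("avx2") > 0;
  f.avx512f = __builtin_cpu_supports ("avx512f") > 0;
  f.avx512vl = __builtin_cpu_supports ("avx512vl") > 0;
  f.vpclmulqdq = __builtin_cpu_supports ("vpclmulqdq") > 0;
#endif
  return f;
}
```

A caller would do something like
"struct cpu_features f = detect_features (); switch (choose_impl (&f)) ..."
once at startup. Keeping the decision separate from the detection makes
it possible to test the fallback order on a machine that lacks the
features.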

[1] <https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=vpclmulqdq>

Jeff


