Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup
From: Sam Russell
Subject: Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup
Date: Mon, 25 Nov 2024 23:31:22 +0100
Benchmark results, thanks to Jeff:
srussell@icelake:~$ time ./cksum_bench_pclmul 1048575 10000
Hash: 5B9DA0F4, length: 1048575
real 0m3.561s
user 0m3.535s
sys 0m0.026s
srussell@icelake:~$ time ./cksum_bench_avx2 1048575 10000
Hash: 5B9DA0F4, length: 1048575
real 0m2.083s
user 0m2.047s
sys 0m0.036s
srussell@icelake:~$ time ./cksum_bench_avx512 1048575 10000
Hash: 5B9DA0F4, length: 1048575
real 0m1.353s
user 0m1.320s
sys 0m0.033s
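To put the timings above in relative terms, here is a small sketch that turns wall-clock times into speedup and throughput figures. The helper names are hypothetical, and it assumes the two benchmark arguments are the buffer length in bytes and the iteration count (the "length: 1048575" output suggests the former; the latter is a guess).

```c
#include <stdio.h>

/* Hypothetical helpers, not part of the patch or the benchmark.  */

/* How many times faster the candidate is than the baseline.  */
double
speedup (double baseline_s, double candidate_s)
{
  return baseline_s / candidate_s;
}

/* Throughput in GB/s (1 GB = 1e9 bytes), assuming the benchmark
   hashes BYTES_PER_ITER bytes ITERS times in SECONDS.  */
double
throughput_gb_s (double bytes_per_iter, double iters, double seconds)
{
  return bytes_per_iter * iters / seconds / 1e9;
}

/* Print one summary line for a named run against the baseline.  */
void
report (const char *name, double baseline_s, double seconds)
{
  printf ("%-8s %.2fx, %.2f GB/s\n", name,
          speedup (baseline_s, seconds),
          throughput_gb_s (1048575, 10000, seconds));
}
```

Calling report ("avx2", 3.561, 2.083) and report ("avx512", 3.561, 1.353) with the "real" times above gives roughly 1.7x for AVX2 and 2.6x for AVX512 over the PCLMUL version.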
Zero code change in the algorithm so we're effectively testing whether I've
calculated the constants correctly and whether I'm loading the previous CRC
into the correct part of the AVX register.
The attached patch has Pádraig's feedback plus the new runtime check that
enables the AVX2 version if avx512f is set but the avx512_supported()
check has failed (because vpclmulqdq isn't set). I would appreciate it if
anyone has a definitive answer on the correct way to test for
avx2+vpclmulqdq vs avx512+vpclmulqdq, and whether any chip exists that
supports a subset of avx512 but doesn't support vpclmulqdq on avx2...
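For discussion, here is a minimal sketch of one conservative way to order the checks. It is not the logic in the attached patch: it requires the vpclmulqdq flag for both wide paths, on the understanding that the 256-bit form of the instruction needs vpclmulqdq plus AVX and the 512-bit form needs vpclmulqdq plus avx512f. The enum, struct and function names are hypothetical, the pclmul+avx condition for the existing path is an assumption, and the detection function assumes a GCC-compatible compiler that accepts these feature strings in __builtin_cpu_supports.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only.  */
enum cksum_impl { CKSUM_GENERIC, CKSUM_PCLMUL, CKSUM_AVX2, CKSUM_AVX512 };

struct cpu_features
{
  bool pclmul, avx, avx2, avx512f, vpclmulqdq;
};

/* Pick the widest implementation whose instructions are all present.
   Both wide paths require vpclmulqdq, so a CPU with avx512f but no
   vpclmulqdq falls back to the 128-bit pclmul path here.  */
enum cksum_impl
choose_cksum_impl (struct cpu_features f)
{
  if (f.avx512f && f.vpclmulqdq)
    return CKSUM_AVX512;
  if (f.avx2 && f.vpclmulqdq)
    return CKSUM_AVX2;
  if (f.pclmul && f.avx)
    return CKSUM_PCLMUL;
  return CKSUM_GENERIC;
}

/* Fill in the feature flags at run time.  On other compilers or
   architectures every flag stays false and the generic code is used.  */
struct cpu_features
detect_cpu_features (void)
{
  struct cpu_features f = { false, false, false, false, false };
#if defined __GNUC__ && (defined __x86_64__ || defined __i386__)
  __builtin_cpu_init ();
  f.pclmul = __builtin_cpu_supports ("pclmul");
  f.avx = __builtin_cpu_supports ("avx");
  f.avx2 = __builtin_cpu_supports ("avx2");
  f.avx512f = __builtin_cpu_supports ("avx512f");
  f.vpclmulqdq = __builtin_cpu_supports ("vpclmulqdq");
#endif
  return f;
}
```

With this ordering, the three machines mentioned in this thread would map as follows: avx512f+vpclmulqdq selects AVX512, avx2+vpclmulqdq without avx512f (the Ryzen 5 5600) selects AVX2, and avx512f without vpclmulqdq (the Skylake on EC2) selects the PCLMUL version.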
On Mon, 25 Nov 2024 at 19:29, Sam Russell <sam.h.russell@gmail.com> wrote:
> Thanks, sent key off-list
>
> I also think I've been confusing myself: the benchmark program doesn't
> check the flags. I think I will need to change the logic though. Here's the
> lscpu output from my Ryzen with AVX2:
>
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Address sizes: 48 bits physical, 48 bits virtual
> Byte Order: Little Endian
> CPU(s): 12
> On-line CPU(s) list: 0-11
> Vendor ID: AuthenticAMD
> Model name: AMD Ryzen 5 5600 6-Core Processor
> CPU family: 25
> Model: 33
> Thread(s) per core: 2
> Core(s) per socket: 6
> Socket(s): 1
> Stepping: 2
> BogoMIPS: 6986.86
> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
> pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
> rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid
> extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes
> xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a
> misalignsse 3dnowprefetch osvw topoext ibrs ibpb stibp vmmcall fsgsbase
> bmi1 avx2 smep bmi2 erms rdseed adx smap clflushopt clwb sha_ni xsaveopt
> xsavec xgetbv1 xsaves clzero xsaveerptr arat umip vaes vpclmulqdq rdpid
> fsrm
>
> So it does set vpclmulqdq but doesn't set avx512. Jeff's CPU has both
> avx512f and vpclmulqdq, and the skylake on EC2 has avx512f but does NOT
> have vpclmulqdq. This might mean that we'll want AVX2 on any AVX2 processor
> with vpclmulqdq, and any AVX512 processor that does NOT have vpclmulqdq
> set, does that seem logical?
>
0001-cksum-Use-AVX2-and-AVX512-for-speedup.patch