Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup
From: Jeffrey Walton
Subject: Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup
Date: Mon, 25 Nov 2024 13:17:00 -0500
On Mon, Nov 25, 2024 at 11:09 AM Sam Russell <sam.h.russell@gmail.com> wrote:
>
> I've added a sample benchmarking program to measure the difference without
> hitting disk. It shows roughly a 40% reduction in run time (3.018s down to
> 1.824s):
>
> $ time ./cksum_bench_pclmul 1048576 10000
> Hash: EFA0B24F, length: 1048576
>
> real 0m3.018s
> user 0m3.018s
> sys 0m0.000s
>
> $ time ./cksum_bench_avx2 1048576 10000
> Hash: EFA0B24F, length: 1048576
>
> real 0m1.824s
> user 0m1.804s
> sys 0m0.020s
>
> The code effectively replicates the existing pclmul code and has new
> constants generated for the larger folds. The main gotcha was that the
> previous CRC gets inserted at a weird offset due to endianness and byte
> swapping.
>
> I don't have a Skylake processor, so I spun up an AWS instance to test out
> the AVX512 version. It turns out there's a bug where virtualisation
> environments don't handle the AVX512 pclmul correctly despite the CPU
> supporting it.
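Client Skylake has AVX and AVX2, but not AVX512. Skylake server parts
(Skylake-SP/X) do have AVX512F, but not VPCLMULQDQ, which first shipped
with Ice Lake.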
I can give you remote access to an Ice Lake machine with AVX512.
Email me your authorized_keys file, and I'll send you the login
information.
For completeness, here are the Ice Lake machine's feature flags:
$ cat /proc/cpuinfo | fold -w 72 -s
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 126
model name : Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz
stepping : 5
...
cpuid level : 27
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni
pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr
pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd
ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad
fsgsbase tsc_adjust sgx bmi1 avx2 smep bmi2 erms invpcid avx512f
avx512dq rdseed adx smap avx512ifma clflushopt intel_pt avx512cd sha_ni
avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves split_lock_detect
dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
hwp_pkg_req vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes
vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid sgx_lc fsrm
md_clear flush_l1d arch_capabilities
> It might be worth disabling this for now, as it gets past the
> __builtin_cpu_supports() gate but then throws an illegal instruction
> halfway through the function. It would be nice if we could at least
> validate it, though.
>
> AVX2 has been around for over 10 years, though, so it seems to be a safer
> addition.
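To make the gate discussed above concrete, here is a minimal sketch of a
runtime check that asks for every feature a wide carry-less multiply path
would use, rather than for AVX512F alone. It is not the code from the
patch. The names crc_path and select_crc_path are hypothetical, and the
set of features required per path is my assumption. Whether a missing
check explains the illegal instruction seen on the AWS instance is not
established in this thread.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not taken from the patch.  */
enum crc_path
{
  CRC_PATH_GENERIC,   /* portable table-driven code */
  CRC_PATH_PCLMUL,    /* 128-bit PCLMULQDQ folding */
  CRC_PATH_AVX2,      /* 256-bit VPCLMULQDQ folding */
  CRC_PATH_AVX512     /* 512-bit VPCLMULQDQ folding */
};

/* Pick the widest path for which every assumed prerequisite is
   reported by the CPU.  On non-x86 targets, or with compilers that
   lack the builtins, fall back to the generic path.  */
static enum crc_path
select_crc_path (void)
{
#if (defined __x86_64__ || defined __i386__) \
    && (defined __GNUC__ || defined __clang__)
  __builtin_cpu_init ();

  bool pclmul = 0 < __builtin_cpu_supports ("pclmul");
  bool avx = 0 < __builtin_cpu_supports ("avx");
  bool avx2 = 0 < __builtin_cpu_supports ("avx2");
  bool avx512f = 0 < __builtin_cpu_supports ("avx512f");
  bool avx512bw = 0 < __builtin_cpu_supports ("avx512bw");
  bool vpclmul = 0 < __builtin_cpu_supports ("vpclmulqdq");

  /* Assumption: the 512-bit path needs the wide multiply plus
     AVX512F and AVX512BW.  */
  if (vpclmul && avx512f && avx512bw)
    return CRC_PATH_AVX512;
  /* Assumption: the 256-bit path needs the wide multiply plus AVX2.  */
  if (vpclmul && avx2)
    return CRC_PATH_AVX2;
  /* Assumption: the existing 128-bit path needs PCLMULQDQ and AVX.  */
  if (pclmul && avx)
    return CRC_PATH_PCLMUL;
#endif
  return CRC_PATH_GENERIC;
}
```

A caller would run select_crc_path () once at startup and dispatch on the
result. Checking "vpclmulqdq" explicitly matters because a CPU, or a
hypervisor's view of one, can report avx512f without the wide multiply.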
Jeff