coreutils

[PATCH] cksum: Use AVX2 and AVX512 for speedup


From: Sam Russell
Subject: [PATCH] cksum: Use AVX2 and AVX512 for speedup
Date: Mon, 25 Nov 2024 17:04:20 +0100

I've added a sample benchmarking program to measure the difference without
hitting disk. It shows roughly a 40% reduction in run time (3.018s down to
1.824s, about 1.65x faster):

$ time ./cksum_bench_pclmul 1048576 10000
Hash: EFA0B24F, length: 1048576

real    0m3.018s
user    0m3.018s
sys     0m0.000s

$ time ./cksum_bench_avx2 1048576 10000
Hash: EFA0B24F, length: 1048576

real    0m1.824s
user    0m1.804s
sys     0m0.020s
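
For readers without the attachment, here is a minimal sketch of how an
in-memory benchmark of this kind could be structured. It is not the attached
cksum_bench.c. The names bench_checksum, checksum_fn and byte_sum are
hypothetical, byte_sum is only a stand-in for the real CRC routines, and
reading the two command-line arguments above as "buffer length" and
"iteration count" is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical signature shared by every implementation under test. */
typedef uint32_t (*checksum_fn)(uint32_t state, const unsigned char *buf,
                                size_t len);

/* Placeholder workload (NOT a CRC): adds every byte to the running state. */
static uint32_t byte_sum(uint32_t state, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        state += buf[i];
    return state;
}

/* Runs fn over one in-memory buffer `iterations` times, so no disk I/O is
   measured. Each call is seeded with the previous result, which keeps the
   compiler from collapsing the loop into a single call. Returns CPU
   seconds, or -1.0 if the buffer cannot be allocated. */
static double bench_checksum(checksum_fn fn, size_t length,
                             unsigned iterations, uint32_t *result_out)
{
    unsigned char *buf = malloc(length ? length : 1);
    if (buf == NULL)
        return -1.0;

    /* Deterministic filler so that runs are comparable. */
    for (size_t i = 0; i < length; i++)
        buf[i] = (unsigned char)(i * 31u + 7u);

    uint32_t result = 0;
    clock_t start = clock();
    for (unsigned n = 0; n < iterations; n++)
        result = fn(result, buf, length);
    clock_t end = clock();

    free(buf);
    *result_out = result;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Passing a different checksum_fn for each build (pclmul, AVX2) and comparing
the returned times would give the same kind of comparison as the two runs
above.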

The code effectively replicates the existing pclmul code and has new
constants generated for the larger folds. The main gotcha was that the
previous CRC gets inserted at a weird offset due to endianness and byte
swapping.
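
To illustrate why the previous CRC has to land at a specific position, here
is a small scalar sketch. It is not the patch's SIMD code. It uses the
standard bitwise MSB-first CRC with the POSIX cksum polynomial 0x04C11DB7,
and shows that carrying a CRC into the next chunk is equivalent to XORing it,
most significant byte first, into the first four bytes of that chunk. The
function names are hypothetical. Presumably the folded code has to place the
CRC in whichever lane corresponds to those four bytes after the byte swap,
but that mapping is not shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise MSB-first CRC update, polynomial 0x04C11DB7, no final XOR. */
static uint32_t crc_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)buf[i] << 24;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
    }
    return crc;
}

/* Same result as crc_update(prev_crc, chunk, len), but obtained by XORing
   prev_crc (big-endian byte order) into the first four bytes of a copy of
   the chunk and starting from a zero state. Falls back to the plain update
   when the chunk is shorter than 4 bytes or larger than the scratch
   buffer. */
static uint32_t crc_update_by_injection(uint32_t prev_crc,
                                        const unsigned char *chunk, size_t len)
{
    unsigned char tmp[64];
    if (len < 4 || len > sizeof tmp)
        return crc_update(prev_crc, chunk, len);

    memcpy(tmp, chunk, len);
    tmp[0] ^= (unsigned char)(prev_crc >> 24);
    tmp[1] ^= (unsigned char)(prev_crc >> 16);
    tmp[2] ^= (unsigned char)(prev_crc >> 8);
    tmp[3] ^= (unsigned char)prev_crc;
    return crc_update(0, tmp, len);
}
```

If the CRC is XORed in at any other byte position, or in the opposite byte
order, the two functions stop agreeing, which is the kind of mismatch the
offset issue above would produce.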

I don't have a Skylake processor, so I spun up an AWS instance to test the
AVX512 version. It turns out there's a bug where virtualisation
environments don't handle the AVX512 pclmul correctly despite the CPU
supporting it. It might be worth disabling this for now, as it gets
past the __builtin_cpu_supports() gate but then throws an illegal
instruction halfway through the function. It would be nice if we could at
least validate it for now though.
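
As a rough sketch of what such a gate with an off switch could look like:
the enum, the function name and the allow_avx512 parameter are hypothetical,
and the exact feature strings the patch checks are an assumption.
__builtin_cpu_init() and __builtin_cpu_supports() are the GCC/Clang x86
built-ins. The gate only reflects what CPUID reports, so it cannot catch the
virtualised case described above, where the instruction still faults.

```c
#include <stdbool.h>

/* Hypothetical identifiers for the available cksum code paths,
   ordered from slowest to fastest. */
enum cksum_impl { IMPL_GENERIC, IMPL_PCLMUL, IMPL_AVX2, IMPL_AVX512 };

/* Picks the widest implementation the CPU claims to support.
   allow_avx512 is a hypothetical switch for keeping the AVX512 path
   disabled until it has been validated on real hardware. */
static enum cksum_impl pick_cksum_impl(bool allow_avx512)
{
#if (defined(__x86_64__) || defined(__i386__)) \
    && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();

    /* Assumed feature set: AVX512F plus the vectorised carry-less
       multiply. */
    if (allow_avx512
        && __builtin_cpu_supports("avx512f")
        && __builtin_cpu_supports("vpclmulqdq"))
        return IMPL_AVX512;

    if (__builtin_cpu_supports("avx2")
        && __builtin_cpu_supports("vpclmulqdq"))
        return IMPL_AVX2;

    if (__builtin_cpu_supports("pclmul") && __builtin_cpu_supports("avx"))
        return IMPL_PCLMUL;
#else
    (void)allow_avx512;  /* No x86 feature detection on this target. */
#endif
    return IMPL_GENERIC;
}
```

With allow_avx512 set to false, the AVX2 path remains available on machines
that report it, so disabling AVX512 would not give up the speedup measured
above.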

AVX2 has been around for over 10 years, though, so this seems to be a safer
addition.

Attachment: cksum_bench.c
Description: Text document

Attachment: 0001-cksum-Use-AVX2-and-AVX512-for-speedup.patch
Description: Binary data

