Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

coreutils


From:	Pádraig Brady
Subject:	Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup
Date:	Tue, 26 Nov 2024 00:10:34 +0000
User-agent:	Mozilla Thunderbird Beta

On 25/11/2024 23:27, Sam Russell wrote:

> The intrinsics guide is a nice find; I dug a bit deeper into the Intel®
> Architecture Instruction Set Extensions and Future Features Programming
> Reference [1] from March 2018, and it shows the four variants:
>
> VEX.NDS.256.66.0F3A.WIG 44 /r /ib VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8
> CPUID feature flag: VPCLMULQDQ
>
> EVEX.NDS.128.66.0F3A.WIG 44 /r /ib VPCLMULQDQ xmm1, xmm2, xmm3/m128, imm8
> CPUID feature flag: AVX512VL, VPCLMULQDQ
>
> EVEX.NDS.256.66.0F3A.WIG 44 /r /ib VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8
> CPUID feature flag: AVX512VL, VPCLMULQDQ
>
> EVEX.NDS.512.66.0F3A.WIG 44 /r /ib VPCLMULQDQ zmm1, zmm2, zmm3/m512, imm8
> CPUID feature flag: AVX512F, VPCLMULQDQ
>
> So the VPCLMULQDQ opcode needs AVX512VL and VPCLMULQDQ to be encoded with
> the EVEX prefix (using xmm/ymm), or AVX512F and VPCLMULQDQ to use zmm,
> but only VPCLMULQDQ to be encoded with the VEX prefix for 256-bit (ymm)
> operation. The build flags for the cksum_avx2 object are
> `-mpclmul -mavx -mavx2 -mvpclmulqdq`, so the absence of any AVX-512 flags
> should ensure it compiles to the VEX form and not EVEX.


Thanks for all the investigation.
However, I don't see any changes to CFLAGS or to the __builtin_cpu_supports()
checks between the first patch and this one. Am I missing something?

Also, I was wondering how parameterizable the new code is,
i.e., would it be easy to parameterize it to support -a crc32b?
From my previous notes on the gnulib list, I summarized the differences as:

cksum -a crc parameters:
------------------------
Polynomial: 04C11DB7
Initial Value: 00000000
Final XOR Value: 00000000
Reverse data: no
Reverse crc (before xor): no

cksum -a crc32b (gnulib crc32) equivalent parameters:
------------------------
Polynomial: 04C11DB7
Initial Value: FFFFFFFF
Final XOR Value: FFFFFFFF
Reverse data: yes
Reverse crc (before xor): yes

cheers,
Pádraig
