The intrinsics guide is a nice find. I dug a bit deeper into the Intel®
Architecture Instruction Set Extensions and Future Features Programming
Reference [1] from March 2018, and it lists four variants:
VEX.NDS.256.66.0F3A.WIG 44 /r /ib VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8
CPUID feature flag: VPCLMULQDQ
EVEX.NDS.128.66.0F3A.WIG 44 /r /ib VPCLMULQDQ xmm1, xmm2, xmm3/m128, imm8
CPUID feature flag: AVX512VL, VPCLMULQDQ
EVEX.NDS.256.66.0F3A.WIG 44 /r /ib VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8
CPUID feature flag: AVX512VL, VPCLMULQDQ
EVEX.NDS.512.66.0F3A.WIG 44 /r /ib VPCLMULQDQ zmm1, zmm2, zmm3/m512, imm8
CPUID feature flag: AVX512F, VPCLMULQDQ
So VPCLMULQDQ needs AVX512VL plus VPCLMULQDQ for the EVEX-encoded xmm/ymm
forms, and AVX512F plus VPCLMULQDQ for the zmm form, but only VPCLMULQDQ
for the VEX-encoded 256-bit (ymm) form. The build flags for the cksum_avx2
object are `-mpclmul -mavx -mavx2 -mvpclmulqdq`, so with no AVX-512 flags
enabled the compiler should emit the VEX encoding rather than EVEX.
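
To make the mapping above easier to check, here is a minimal sketch in C
that turns the CPUID feature flags from the table into the set of usable
encodings. The enum and function names are hypothetical, made up for this
mail, and the logic only mirrors the four table rows; it does not model
anything else the hardware or OS may require.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only: one bit per table row. */
enum vpclmul_form {
	FORM_VEX_256  = 1 << 0, /* VEX.256,  ymm: VPCLMULQDQ */
	FORM_EVEX_128 = 1 << 1, /* EVEX.128, xmm: AVX512VL + VPCLMULQDQ */
	FORM_EVEX_256 = 1 << 2, /* EVEX.256, ymm: AVX512VL + VPCLMULQDQ */
	FORM_EVEX_512 = 1 << 3, /* EVEX.512, zmm: AVX512F + VPCLMULQDQ */
};

/* Return a bitmask of the encodings allowed by the given feature flags. */
static unsigned int
vpclmul_forms(bool vpclmulqdq, bool avx512f, bool avx512vl)
{
	unsigned int forms = 0;

	/* Every row of the table requires the VPCLMULQDQ flag. */
	if (!vpclmulqdq)
		return 0;

	/* The VEX-encoded ymm form lists no AVX-512 flag. */
	forms |= FORM_VEX_256;

	/* The EVEX xmm/ymm forms additionally list AVX512VL. */
	if (avx512vl)
		forms |= FORM_EVEX_128 | FORM_EVEX_256;

	/* The EVEX zmm form additionally lists AVX512F. */
	if (avx512f)
		forms |= FORM_EVEX_512;

	return forms;
}
```

With only the VPCLMULQDQ flag set, the function returns just FORM_VEX_256,
which is the case the cksum_avx2 build flags are aiming for.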