Re: [PATCH v6 02/10] util/bufferiszero: Remove AVX512 variant
From: Daniel P. Berrangé
Date: Mon, 29 Apr 2024 12:16:56 +0100
User-agent: Mutt/2.2.12 (2023-09-09)
On Wed, Apr 24, 2024 at 03:56:57PM -0700, Richard Henderson wrote:
> From: Alexander Monakov <amonakov@ispras.ru>
>
> Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD
> routines are invoked much more rarely in normal use when most buffers
> are non-zero. This makes use of AVX512 unprofitable, as it incurs extra
> frequency and voltage transition periods during which the CPU operates
> at reduced performance, as described in
> https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html
This is describing limitations of Intel's AVX512 implementation.
AMD's AVX512 implementation is said to not have the kind of
power / frequency limitations that Intel's does:
https://www.mersenneforum.org/showthread.php?p=614191
"Overall, AMD's AVX512 implementation beat my expectations.
I was expecting something similar to Zen1's "double-pumping"
of AVX with half the register file and cross-lane instructions
being super slow. But this is not the case on Zen4. The lack
of power or thermal issues combined with stellar shuffle support
makes it completely worthwhile to use from a developer standpoint.
If your code can vectorize without excessive wasted computation,
then go all the way to 512-bit. AMD not only made this worthwhile,
but *incentivizes* it with the power savings. And if in the future
AMD decides to widen things up, you may get a 2x speedup for free."
IOW, it sounds like we could be sacrificing performance on modern
AMD Genoa generation CPUs by removing the AVX512 impl.
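To make the concern concrete, here is a rough sketch of what a vendor-gated
choice could look like. It is not QEMU code: avx512_worth_using() and
read_cpu_vendor() are hypothetical names, and treating every AMD part with
AVX512F as "worthwhile" is an assumption that goes further than the Zen4
report quoted above. It relies on the GCC/Clang <cpuid.h> helper and falls
back to "unknown vendor" on non-x86 hosts.

```c
#include <stdbool.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper providing __get_cpuid() */
#endif

/*
 * Hypothetical policy, not taken from QEMU: only consider an AVX512
 * routine when the CPU reports AVX512F and the vendor string is AMD's.
 * Assumption: all such AMD parts behave like the Zen4 report quoted above.
 */
static bool avx512_worth_using(const char *vendor, bool has_avx512f)
{
    if (!has_avx512f) {
        return false;
    }
    return strcmp(vendor, "AuthenticAMD") == 0;
}

/*
 * Fill vendor[] with the CPUID leaf 0 vendor string (EBX, EDX, ECX order).
 * Returns false when CPUID is unavailable, e.g. on a non-x86 host.
 */
static bool read_cpu_vendor(char vendor[13])
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        vendor[0] = '\0';
        return false;
    }
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';
    return true;
#else
    vendor[0] = '\0';
    return false;
#endif
}
```

A caller would combine read_cpu_vendor() with whatever AVX512F feature test
it already has, and keep the AVX2 routine as the default when the policy
returns false.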
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|