Re: [RFC v4 2/2] target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data.


From: Paolo Savini
Subject: Re: [RFC v4 2/2] target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data.
Date: Mon, 11 Nov 2024 16:04:35 +0000
User-agent: Mozilla Thunderbird

Hi Richard, Daniel,

This might be a silly question, but why do we need to ensure atomicity when emulating these guest instructions? I might be wrong, but I didn't see an explicit requirement in the RISC-V V extension specification for the vector instructions to be atomic.

Anyway, the patches from Max have landed, and since one of them already uses memcpy() where this patch does, and achieves a similar performance improvement, we should probably drop this particular patch. I'm wondering whether we should be concerned about atomicity there too?

https://github.com/qemu/qemu/blob/134b443512825bed401b6e141447b8cdc22d2efe/target/riscv/vector_helper.c#L224

Thanks

Paolo

On 11/8/24 09:11, Richard Henderson wrote:
On 11/7/24 12:58, Daniel Henrique Barboza wrote:
On 11/4/24 9:48 AM, Richard Henderson wrote:
On 10/30/24 15:25, Paolo Savini wrote:
On 10/30/24 11:40, Richard Henderson wrote:
    __builtin_memcpy DOES NOT equal VMOVDQA
I am aware of this. I took __builtin_memcpy as a generic enough way to emulate loads and stores that should allow many hosts to generate the widest load/store instructions they can, and on x86 I see this generates vmovdqu/movdqu instructions, which are not always guaranteed to be atomic. x86 does, however, guarantee them to be atomic if the memory address is 16-byte aligned.

No, AMD guarantees MOVDQU is atomic if aligned, Intel does not.
See the comment in util/cpuinfo-i386.c, and the two CPUINFO_ATOMIC_VMOVDQ[AU] bits.

See also host/include/*/host/atomic128-ldst.h, HAVE_ATOMIC128_RO, and atomic16_read_ro. Not that I think you should use that here; it's complicated, and I think you're better off relying on the code in accel/tcg/ when more than byte atomicity is required.


Not sure if that's what you meant, but I didn't find any clear example of multi-byte atomicity using qatomic_read() and friends that would be closer to what memcpy() is doing here. I found one example in bdrv_graph_co_rdlock() that seems to use a memory barrier via smp_mb() and qatomic_read() inside a loop, but I don't understand that code well enough to say.

Memory barriers provide ordering between loads and stores, but they cannot be used to address atomicity of individual loads and stores.


I'm also wondering if a common pthread mutex wrapping these memcpy() calls would suffice in this case. Even if we can't guarantee that __builtin_memcpy() will use arch-specific vector insns on the host, it would already be a faster path than falling back to fn(...).

Locks would certainly not be faster than calling the accel/tcg function.


In a quick detour, I'm not sure we really considered how ARM SVE implements these helpers, e.g. gen_sve_str():

https://gitlab.com/qemu-project/qemu/-/blob/master/target/arm/tcg/translate-sve.c#L4182

Note that ARM SVE defines these instructions to have byte atomicity.


r~


