Re: [PATCH v3 16/16] tcg/loongarch64: Implement 128-bit load & store

qemu-devel

From:	bibo mao
Subject:	Re: [PATCH v3 16/16] tcg/loongarch64: Implement 128-bit load & store
Date:	Mon, 4 Sep 2023 17:43:23 +0800
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0


On 2023/9/4 09:43, gaosong wrote:
> Hi, yijun
> 
> On 2023/9/3 9:10 AM, Jiajie Chen wrote:
>>
>> On 2023/9/3 09:06, Richard Henderson wrote:
>>> On 9/1/23 22:02, Jiajie Chen wrote:
>>>> If LSX is available, use LSX instructions to implement 128-bit load &
>>>> store.
>>>
>>> Is this really guaranteed to be an atomic 128-bit operation?
>>>
>>
>> Song Gao, please check this.
>>
>>
> Could you explain this issue?  Thanks.
If the address is 16-byte aligned, the 128-bit load/store is atomic.
Otherwise it is not atomic, since the access may cross two cache lines or pages.
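
To make the alignment condition above concrete, here is a minimal sketch in plain C. The helper names is_aligned_16 and crosses_boundary are hypothetical and are not part of QEMU; the 64-byte cache line and 4096-byte page sizes used in the note below are only example values, not a statement about any particular LoongArch core.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of 16 bytes. */
static bool is_aligned_16(uint64_t addr)
{
    return (addr & 15) == 0;
}

/*
 * Hypothetical helper: true if the 16-byte access [addr, addr + 16)
 * spans a boundary of the given power-of-two size (for example a
 * cache line or a page).
 */
static bool crosses_boundary(uint64_t addr, uint64_t size)
{
    return (addr & (size - 1)) + 16 > size;
}
```

With these definitions, a 16-byte aligned address never crosses a 64-byte or 4096-byte boundary, while an unaligned address such as 60 (with 64-byte lines) or 0xff8 (with 4096-byte pages) does.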

Regards
Bibo Mao
> 
>>> Or, as for many vector processors, is this really two separate 64-bit 
>>> memory operations under the hood?
>>>
>>>
>>>> +static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi,
>>>> +                                   TCGReg addr_reg, MemOpIdx oi, bool is_ld)
>>>> +{
>>>> +    TCGLabelQemuLdst *ldst;
>>>> +    HostAddress h;
>>>> +
>>>> +    ldst = prepare_host_addr(s, &h, addr_reg, oi, true);
>>>> +    if (is_ld) {
>>>> +        tcg_out_opc_vldx(s, TCG_VEC_TMP0, h.base, h.index);
>>>> +        tcg_out_opc_vpickve2gr_d(s, data_lo, TCG_VEC_TMP0, 0);
>>>> +        tcg_out_opc_vpickve2gr_d(s, data_hi, TCG_VEC_TMP0, 1);
>>>> +    } else {
>>>> +        tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_lo, 0);
>>>> +        tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_hi, 1);
>>>> +        tcg_out_opc_vstx(s, TCG_VEC_TMP0, h.base, h.index);
>>>> +    }
>>>
>>> You should use h.aa.atom < MO_128 to determine if 128-bit atomicity, and 
>>> therefore the vector operation, is required.  I assume the gr<->vr moves 
>>> have a cost and two integer operations are preferred when allowable.
>>>
>>> Compare the other implementations of this function.
>>>
>>>
>>> r~
>
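
The check suggested in the quoted review (compare the required atomicity against MO_128 and use the vector instruction only when 128-bit atomicity is needed) could be modelled roughly as below. This is a standalone sketch, not the backend code: the SK_ names and sk_choose_i128_lowering are hypothetical stand-ins, the size codes are assumed to be log2 of the access size in bytes, and LSX is assumed to be available as in the patch description.

```c
/* Toy stand-ins for the memory-op size codes (assumed log2 of bytes). */
enum { SK_MO_64 = 3, SK_MO_128 = 4 };

/* The two lowering strategies discussed in the thread. */
typedef enum {
    SK_TWO_INT_OPS,    /* two separate 64-bit integer loads or stores */
    SK_ONE_VECTOR_OP   /* one 128-bit vector access plus gr<->vr moves */
} SkLowering;

/*
 * Hypothetical helper: pick a lowering from the required atomicity.
 * If less than 128-bit atomicity is required, two integer operations
 * are enough and avoid the gr<->vr moves; otherwise the single vector
 * access is used.
 */
static SkLowering sk_choose_i128_lowering(int atom)
{
    if (atom < SK_MO_128) {
        return SK_TWO_INT_OPS;
    }
    return SK_ONE_VECTOR_OP;
}
```

In the real function the atomicity would come from h.aa.atom after prepare_host_addr(), and each branch would emit the corresponding instructions for the load and store cases.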
