Re: [PATCH v3 16/16] tcg/loongarch64: Implement 128-bit load & store
From: bibo mao
Subject: Re: [PATCH v3 16/16] tcg/loongarch64: Implement 128-bit load & store
Date: Mon, 4 Sep 2023 17:43:23 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

On 2023/9/4 09:43, gaosong wrote:
> Hi, yijun
>
> On 2023/9/3 9:10 AM, Jiajie Chen wrote:
>>
>> On 2023/9/3 09:06, Richard Henderson wrote:
>>> On 9/1/23 22:02, Jiajie Chen wrote:
>>>> If LSX is available, use LSX instructions to implement 128-bit load &
>>>> store.
>>>
>>> Is this really guaranteed to be an atomic 128-bit operation?
>>>
>>
>> Song Gao, please check this.
>>
>>
> Could you explain this issue? Thanks.
If the address is 16-byte aligned, the 128-bit load/store is atomic.
Otherwise it is not atomic, since the access may cross two cache lines or pages.
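
To make the alignment condition concrete, here is a minimal sketch in C of
the two checks involved. The helper names are hypothetical and are not part
of QEMU, and the 64-byte cache line and 4096-byte page used in the usage
note are assumptions for illustration only; real hardware may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of 16, i.e. a
 * naturally aligned 128-bit access. */
static bool is_aligned_16(uintptr_t addr)
{
    return (addr & 15) == 0;
}

/* Hypothetical helper: true if the byte range [addr, addr + len) spans
 * a boundary of size `unit`, where `unit` must be a power of two
 * (for example an assumed cache line or page size). */
static bool crosses_boundary(uintptr_t addr, size_t len, uintptr_t unit)
{
    assert(len > 0 && unit != 0 && (unit & (unit - 1)) == 0);
    return (addr / unit) != ((addr + len - 1) / unit);
}
```

A 16-byte access at a 16-byte aligned address can never straddle a larger
power-of-two boundary, so crosses_boundary(addr, 16, 64) is false whenever
is_aligned_16(addr) is true. An unaligned access such as one at 0x1038
does straddle an assumed 64-byte line.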
Regards
Bibo Mao
>
>>> Or, as for many vector processors, is this really two separate 64-bit
>>> memory operations under the hood?
>>>
>>>
>>>> +static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi,
>>>> +                                   TCGReg addr_reg, MemOpIdx oi, bool is_ld)
>>>> +{
>>>> + TCGLabelQemuLdst *ldst;
>>>> + HostAddress h;
>>>> +
>>>> + ldst = prepare_host_addr(s, &h, addr_reg, oi, true);
>>>> + if (is_ld) {
>>>> + tcg_out_opc_vldx(s, TCG_VEC_TMP0, h.base, h.index);
>>>> + tcg_out_opc_vpickve2gr_d(s, data_lo, TCG_VEC_TMP0, 0);
>>>> + tcg_out_opc_vpickve2gr_d(s, data_hi, TCG_VEC_TMP0, 1);
>>>> + } else {
>>>> + tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_lo, 0);
>>>> + tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_hi, 1);
>>>> + tcg_out_opc_vstx(s, TCG_VEC_TMP0, h.base, h.index);
>>>> + }
>>>
>>> You should use h.aa.atom < MO_128 to determine if 128-bit atomicity, and
>>> therefore the vector operation, is required. I assume the gr<->vr moves
>>> have a cost and two integer operations are preferred when allowable.
>>>
>>> Compare the other implementations of this function.
>>>
>>>
>>> r~
>
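
The selection Richard suggests (compare the required atomicity against
MO_128 and fall back to two integer operations when 128-bit atomicity is
not needed) can be sketched outside of QEMU as below. This is only an
illustration of the decision, not the backend code: AtomSize,
LdstStrategy and choose_i128_strategy are hypothetical stand-ins, and the
enum only mirrors the ordering implied by the comparison
h.aa.atom < MO_128.

```c
#include <assert.h>

/* Hypothetical stand-in for the required-atomicity size; only the
 * ordering (smaller size < larger size) matters for this sketch. */
typedef enum { ATOM_8, ATOM_16, ATOM_32, ATOM_64, ATOM_128 } AtomSize;

/* Hypothetical description of what the backend would emit. */
typedef enum {
    EMIT_TWO_GPR_64,   /* two 64-bit integer loads/stores */
    EMIT_ONE_VEC_128   /* one 128-bit vector load/store plus gr<->vr moves */
} LdstStrategy;

/* Pick the vector path only when full 128-bit atomicity is required;
 * otherwise prefer the two integer operations, which avoid the
 * gr<->vr moves. */
static LdstStrategy choose_i128_strategy(AtomSize required_atom)
{
    assert(required_atom <= ATOM_128);
    return required_atom < ATOM_128 ? EMIT_TWO_GPR_64 : EMIT_ONE_VEC_128;
}
```

With this sketch, a request that only needs 64-bit atomicity selects the
integer pair, while a request for 128-bit atomicity selects the single
vector access.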