From: | Richard Henderson |
Subject: | Re: [PATCH v3 16/16] tcg/loongarch64: Implement 128-bit load & store |
Date: | Sat, 2 Sep 2023 18:06:53 -0700 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 |
On 9/1/23 22:02, Jiajie Chen wrote:
If LSX is available, use LSX instructions to implement 128-bit load & store.
Is this really guaranteed to be an atomic 128-bit operation? Or, as for many vector processors, is this really two separate 64-bit memory operations under the hood?
+static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi,
+                                   TCGReg addr_reg, MemOpIdx oi, bool is_ld)
+{
+    TCGLabelQemuLdst *ldst;
+    HostAddress h;
+
+    ldst = prepare_host_addr(s, &h, addr_reg, oi, true);
+    if (is_ld) {
+        tcg_out_opc_vldx(s, TCG_VEC_TMP0, h.base, h.index);
+        tcg_out_opc_vpickve2gr_d(s, data_lo, TCG_VEC_TMP0, 0);
+        tcg_out_opc_vpickve2gr_d(s, data_hi, TCG_VEC_TMP0, 1);
+    } else {
+        tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_lo, 0);
+        tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_hi, 1);
+        tcg_out_opc_vstx(s, TCG_VEC_TMP0, h.base, h.index);
+    }
You should test h.aa.atom < MO_128 to determine whether 128-bit atomicity, and therefore the vector operation, is required; when the test is true, it is not. I assume the gr<->vr moves have a cost, so two integer operations are preferred whenever they are allowed.
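As a rough illustration of that check, here is a minimal standalone sketch of the decision. It is not QEMU code: the MemOp enum, LdstStrategy, and choose_i128_strategy() are hypothetical stand-ins, and the enum only encodes the assumption that atomicity sizes are ordered so that MO_64 < MO_128.

```c
#include <assert.h>

/* Hypothetical stand-in for the atomicity sizes; only the ordering
 * (smaller size < larger size) matters for this sketch. */
typedef enum { MO_8, MO_16, MO_32, MO_64, MO_128 } MemOp;

/* Hypothetical labels for the two code-generation paths under discussion. */
typedef enum {
    EMIT_TWO_64BIT_OPS,   /* pair of integer loads/stores, no gr<->vr moves */
    EMIT_ONE_VECTOR_OP    /* single 128-bit vector load/store plus moves */
} LdstStrategy;

/* 'atom' plays the role of h.aa.atom: the atomicity the access requires. */
static LdstStrategy choose_i128_strategy(MemOp atom)
{
    assert(atom >= MO_8 && atom <= MO_128);

    if (atom < MO_128) {
        /* 128-bit atomicity is not required, so the cheaper pair of
         * 64-bit integer operations is allowed. */
        return EMIT_TWO_64BIT_OPS;
    }
    /* 128-bit atomicity is required: fall back to the vector operation. */
    return EMIT_ONE_VECTOR_OP;
}
```

With this shape, an access that only needs 64-bit atomicity takes the integer path, and only a genuinely atomic 128-bit access pays for the vector load/store and the register moves.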
Compare the other implementations of this function.

r~