qemu-devel

Re: [PATCH v2 03/14] tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt


From: Jiajie Chen
Subject: Re: [PATCH v2 03/14] tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt
Date: Sat, 2 Sep 2023 01:28:20 +0800


On 2023/9/2 01:24, Richard Henderson wrote:
On 9/1/23 02:30, Jiajie Chen wrote:
Signed-off-by: Jiajie Chen <c@jia.je>
---
  tcg/loongarch64/tcg-target-con-set.h |  1 +
  tcg/loongarch64/tcg-target.c.inc     | 60 ++++++++++++++++++++++++++++
  2 files changed, 61 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>



diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 37b3f80bf9..d04916db25 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -31,4 +31,5 @@ C_O1_I2(r, 0, rZ)
  C_O1_I2(r, rZ, ri)
  C_O1_I2(r, rZ, rJ)
  C_O1_I2(r, rZ, rZ)
+C_O1_I2(w, w, wJ)

Notes for improvement: 'J' is a signed 32-bit immediate.


I was wondering about the behavior of 'J' on i128 types: in tcg_target_const_match(), the argument type is int, so will the higher bits be truncated?

Besides, tcg_target_const_match() does not know the vector element width.
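To illustrate the element-width concern, here is a minimal standalone sketch (not QEMU code). `sketch_sextract64()` is a hypothetical stand-in that mirrors what `sextract64(a2, 0, 8 << vece)` does in the patch, and `sketch_low_element()` is an illustrative helper name. The same replicated bit pattern decodes to a different per-element value depending on `vece`, so a range check that does not know `vece` cannot decide whether the constant fits an immediate field.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for QEMU's sextract64(): sign-extend the
 * `length`-bit field that starts at bit `start`.
 * Assumes 1 <= length and start + length <= 64, and that >> on a
 * negative int64_t is an arithmetic shift (true on mainstream compilers).
 */
static int64_t sketch_sextract64(uint64_t value, int start, int length)
{
    /* Move the field to the top of the word, then shift back down. */
    return (int64_t)(value << (64 - length - start)) >> (64 - length);
}

/*
 * Per-element value of a replicated vector constant.
 * vece: 0 = 8-bit, 1 = 16-bit, 2 = 32-bit, 3 = 64-bit elements.
 */
static int64_t sketch_low_element(uint64_t replicated, unsigned vece)
{
    return sketch_sextract64(replicated, 0, 8 << vece);
}
```

For example, the pattern 0x00ff00ff00ff00ff is -1 when read as 8-bit elements (fits a signed 5-bit immediate) but 255 when read as 16-bit elements (fits neither immediate form).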



+        if (const_args[2]) {
+            /*
+             * cmp_vec dest, src, value
+             * Try vseqi/vslei/vslti
+             */
+            int64_t value = sextract64(a2, 0, 8 << vece);
+            if ((cond == TCG_COND_EQ || cond == TCG_COND_LE || \
+                 cond == TCG_COND_LT) && (-0x10 <= value && value <= 0x0f)) {
+                tcg_out32(s, encode_vdvjsk5_insn(cmp_vec_imm_insn[cond][vece], \
+                                                 a0, a1, value));
+                break;
+            } else if ((cond == TCG_COND_LEU || cond == TCG_COND_LTU) &&
+                (0x00 <= value && value <= 0x1f)) {
+                tcg_out32(s, encode_vdvjuk5_insn(cmp_vec_imm_insn[cond][vece], \
+                                                 a0, a1, value));

Better would be a new constraint that only matches

    -0x10 <= x <= 0x1f

If the sign is wrong for the comparison, it can *always* be loaded with just vldi.
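A rough sketch of the suggestion above, as standalone C rather than actual tcg-target code. The names `SketchCond`, `sketch_match_cmp_vec_imm()` and `sketch_imm_fits_cond()` are hypothetical, and the sketch assumes the constant has already been reduced to its sign-extended per-element value. The first predicate is what a new constraint might accept; the second decides whether the immediate compare form can encode the value for a given condition, with the remaining case being the one that would fall back to a single vldi.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the TCG condition codes handled in the patch. */
typedef enum { SK_EQ, SK_LE, SK_LT, SK_LEU, SK_LTU } SketchCond;

/* Hypothetical constraint: accept only constants in [-0x10, 0x1f]. */
static bool sketch_match_cmp_vec_imm(int64_t elem)
{
    return elem >= -0x10 && elem <= 0x1f;
}

/* True if the immediate compare form can encode elem for this cond. */
static bool sketch_imm_fits_cond(SketchCond cond, int64_t elem)
{
    switch (cond) {
    case SK_EQ:
    case SK_LE:
    case SK_LT:
        /* Signed 5-bit immediate (vseqi/vslei/vslti). */
        return elem >= -0x10 && elem <= 0x0f;
    case SK_LEU:
    case SK_LTU:
        /* Unsigned 5-bit immediate (unsigned compare forms). */
        return elem >= 0 && elem <= 0x1f;
    }
    return false;
}
```

A constant that passes `sketch_match_cmp_vec_imm()` but fails `sketch_imm_fits_cond()` is the "sign is wrong" case: it is still small enough to materialize with one instruction instead of the three-instruction sequence.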

Whereas at present, using J,

+            tcg_out_dupi_vec(s, type, vece, temp_vec, a2);
+            a2 = temp_vec;

this may require 3 instructions (lu12i.w + ori + vreplgr2vr).

By constraining the constants allowed, you allow the register allocator to see that a register is required, which may be reused for another instruction.


r~


