qemu-devel
From: Richard Henderson
Subject: Re: [PATCH v4 20/45] target/arm: Implement SME LD1, ST1
Date: Tue, 5 Jul 2022 07:19:39 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1

On 7/4/22 16:09, Peter Maydell wrote:
>> +static void clear_vertical_h(void *vptr, size_t off, size_t len)
>> +{
>> +    uint16_t *ptr = vptr;
>> +    size_t i;
>> +
>> +    for (i = 0; i < len / 2; ++i) {
>> +        ptr[(i + off) * sizeof(ARMVectorReg)] = 0;
>> +    }

> The code for the bigger-than-byte vertical actions seems wrong:
> because 'ptr' is a uint16_t * here, this expression is mixing
> byte offsets (off, and the multiply by sizeof(ARMVectorReg))
> with halfword offsets (i, the fact we're calculating an index
> value for a uint16_t array).

I agree these are wrong: they mix 'i' as an index with 'off' as a byte offset. I think the correct addressing is always plain byte addressing, i.e.

    for (i = 0; i < len; i += N) {
        uintN_t *ptr = vptr + (i + off) * sizeof(ARMVectorReg);
        *ptr = 0;
    }

so that every iteration advances N rows and writes N bytes.
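That corrected loop can be exercised outside QEMU. Below is a compilable sketch of it for the halfword case, where ROW_BYTES is a hypothetical stand-in for sizeof(ARMVectorReg) (16 bytes, i.e. a 128-bit vector, is an assumption) and explicit char arithmetic replaces QEMU's GNU-C void-pointer arithmetic:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROW_BYTES 16   /* hypothetical stand-in for sizeof(ARMVectorReg) */

/* Byte-addressed clear of a vertical halfword slice: 'off' is a byte
 * offset, and each iteration advances N (= 2) rows and zeroes N bytes. */
static void clear_vertical_h(void *vptr, size_t off, size_t len)
{
    size_t i;

    for (i = 0; i < len; i += 2) {
        uint16_t *ptr = (uint16_t *)((char *)vptr + (i + off) * ROW_BYTES);
        *ptr = 0;
    }
}
```

With off = 0 and len = 8, the four halfword elements land on rows 0, 2, 4 and 6, two rows apart, and the rest of the tile is untouched.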

>> +static void copy_vertical_h(void *vdst, const void *vsrc, size_t len)
>> +{
>> +    const uint16_t *src = vsrc;
>> +    uint16_t *dst = vdst;
>> +    size_t i;
>> +
>> +    for (i = 0; i < len / 2; ++i) {
>> +        dst[i * sizeof(ARMVectorReg)] = src[i];

> Similarly the array index calculation for dst[] here looks wrong.

I don't think so in this case.  I'm not mixing indexes and byte offsets like I 
was above.

Recall that the next vertical tile element is not in the next physical row, but in the Nth physical row. Therefore there are always sizeof(ARMVectorReg) elements in the host layout between vertical tile elements.

I agree it looks strange.
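The layout argument can be checked directly: because dst is indexed in halfwords, an index stride of sizeof(ARMVectorReg) elements is sizeof(ARMVectorReg) * 2 bytes, i.e. two physical rows per halfword element. A compilable sketch, with ROW_BYTES again a hypothetical stand-in for sizeof(ARMVectorReg):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROW_BYTES 16   /* hypothetical stand-in for sizeof(ARMVectorReg) */

/* As in the patch: dst is indexed in uint16_t units, so a stride of
 * ROW_BYTES elements is ROW_BYTES * 2 bytes = 2 physical rows. */
static void copy_vertical_h(void *vdst, const void *vsrc, size_t len)
{
    const uint16_t *src = vsrc;
    uint16_t *dst = vdst;
    size_t i;

    for (i = 0; i < len / 2; ++i) {
        dst[i * ROW_BYTES] = src[i];
    }
}
```

Copying four halfwords scatters them onto rows 0, 2, 4 and 6, the same spacing the corrected clear_vertical_h produces.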

>> +#define DO_LD(NAME, TYPE, HOST, TLB)                                        \
>> +static inline void sme_##NAME##_v_host(void *za, intptr_t off, void *host)  \
>> +{                                                                           \
>> +    TYPE val = HOST(host);                                                  \
>> +    *(TYPE *)(za + off * sizeof(ARMVectorReg)) = val;                       \
>> +}                                                                           \
>> +static inline void sme_##NAME##_v_tlb(CPUARMState *env, void *za,           \
>> +                        intptr_t off, target_ulong addr, uintptr_t ra)      \
>> +{                                                                           \
>> +    TYPE val = TLB(env, useronly_clean_ptr(addr), ra);                      \
>> +    *(TYPE *)(za + off * sizeof(ARMVectorReg)) = val;                       \
>> +}

> So in these functions is 'za' pre-adjusted to the start address of the
> vertical column?

Yes.  That's true of all of these routines, and what I compute in 
get_tile_colrow.

> Is 'off' a byte offset here

Yes.

> (in which case the arithmetic is wrong for anything except byte columns)

I don't think so in this case. This is all byte arithmetic. Just as for copy_vertical_*, there are N rows between elements.

Consider a vertical tile slice of uint64_t:

  Element 0 is off=0 -> za + row 0.
  Element 1 is off=8 -> za + row 8.
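The same arithmetic can be sketched for the _v_host flavour with 64-bit elements. ROW_BYTES is a hypothetical stand-in for sizeof(ARMVectorReg), and memcpy stands in for the HOST() accessor; the function name mirrors the macro expansion but is only illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROW_BYTES 16   /* hypothetical stand-in for sizeof(ARMVectorReg) */

/* Sketch of sme_ld1d_v_host: 'za' already points at the start of the
 * vertical column, 'off' is a byte offset, so element j (off = j * 8)
 * lands on physical row j * 8. */
static void sme_ld1d_v_host(void *za, intptr_t off, const void *host)
{
    uint64_t val;

    memcpy(&val, host, sizeof(val));                        /* HOST(host) */
    memcpy((char *)za + off * ROW_BYTES, &val, sizeof(val));
}
```

Loading element 0 (off = 0) writes row 0, and loading element 1 (off = 8) writes row 8, matching the worked example above.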

>> +    tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->esz);
>> +    tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));

> Theoretically we ought to call gen_check_sp_alignment() here
> for rn == 31, but I guess we didn't do that for any of the
> non-base-A64 instructions like SVE either.

Oh yeah.  Some day we should make gen_check_sp_alignment do something too.


r~


