qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 19/45] target/arm: Implement SME MOVA


From: Peter Maydell
Subject: Re: [PATCH v4 19/45] target/arm: Implement SME MOVA
Date: Mon, 4 Jul 2022 10:31:53 +0100

On Mon, 4 Jul 2022 at 10:08, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 7/1/22 21:49, Peter Maydell wrote:
> > On Tue, 28 Jun 2022 at 05:40, Richard Henderson
> > <richard.henderson@linaro.org> wrote:
> >>
> >> We can reuse the SVE functions for implementing moves to/from
> >> horizontal tile slices, but we need new ones for moves to/from
> >> vertical tile slices.
> >>
> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


> >> +void HELPER(sme_mova_cz_q)(void *za, void *vn, void *vg, uint32_t desc)
> >> +{
> >> +    int i, oprsz = simd_oprsz(desc) / 16;
> >> +    uint16_t *pg = vg;
> >> +    Int128 *n = vn;
> >> +    Int128 *a = za;
> >> +
> >> +    for (i = 0; i < oprsz; i++) {
> >> +        if (pg[H2(i)] & 1) {
> >> +            a[i * sizeof(ARMVectorReg)] = n[i];
> >
> > Is it really OK to do this with an Int128 store? That is
> > in host-order, but the two halves of a 128-bit quantity
> > in the ZA array are in architectural order. I suppose the
> > source also will have them in the architectural order, but
> > it does look odd, especially uncommented.
>
> Would memcpy be better for you?

I guess that means we end up doing it all as byte-pointer
arithmetic, which might look worse. I think with a comment
that the two halves of the Int128 might be swapped but this
is OK because we are only copying it will be fine.

> >> +    /* Resolve tile.size[index] to an untyped ZA slice index. */
> >> +    t_index = tcg_temp_new_i32();
> >> +    tcg_gen_trunc_tl_i32(t_index, cpu_reg(s, rs));
> >> +    tcg_gen_addi_i32(t_index, t_index, index);
> >
> > This code isn't doing what the comment says; it's just calculating
> > the (not-yet-taken-MOD-dim) slice index, which does depend on the type.
>
> I guess the comment applies to a larger section than just these two lines.
>
> >
> >> +
> >> +    len = ctz32(streaming_vec_reg_size(s)) - esz;
> >
> > What is this the length of ?
>
> The length of the extract, aka the mod.
>
> >> +        /* The tile slice offset within env->zarray is the column offset. 
> >> */
> >> +        offset = tile;
> >
> > I don't understand why we can just add the tile index
> > (which is going to be an integer 0, 1, 2..) to a byte offset.
> > In the horizontal case we add tile * sizeof(ARMVectorReg),
> > which makes more sense to me.
>
> Hmm.  I think you're right this should be tile * column width, or
>
>    offset = tile << esz;
>
> I wish I could compare vs FVP...

> >> +        /*
> >> +         * For big-endian, adjust the column slice offset within
> >> +         * the uint64_t host words that make up env->zarray.
> >> +         * This must wait until index and offset are combined.
> >
> > Why? Neither the byte-offset of the start of the tile nor
> > the byte offset of zarray in CPUARMState ought to be non-8-aligned.
>
> Columns will not be 8-aligned.  On page 38 of 0616A.a, see the illustration 
> of ZA0V.B[22],
> which is 6 mod 8.

Yes, but the column slice number isn't part of offset, it's
in index, so (contra the comment) you could do the xor before
the "add offset to index" if you wanted (ie it doesn't matter
which order we do these things).

thanks
-- PMM



reply via email to

[Prev in Thread] Current Thread [Next in Thread]