Re: [PATCH v4 20/45] target/arm: Implement SME LD1, ST1

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 20/45] target/arm: Implement SME LD1, ST1

From:	Peter Maydell
Subject:	Re: [PATCH v4 20/45] target/arm: Implement SME LD1, ST1
Date:	Tue, 5 Jul 2022 11:48:59 +0100

On Tue, 5 Jul 2022 at 02:49, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 7/4/22 16:09, Peter Maydell wrote:
> >> +static void copy_vertical_h(void *vdst, const void *vsrc, size_t len)
> >> +{
> >> +    const uint16_t *src = vsrc;
> >> +    uint16_t *dst = vdst;
> >> +    size_t i;
> >> +
> >> +    for (i = 0; i < len / 2; ++i) {
> >> +        dst[i * sizeof(ARMVectorReg)] = src[i];
> >
> > Similarly the array index calculation for dst[] here looks wrong.
>
> I don't think so in this case.  I'm not mixing indexes and byte offsets like 
> I was above.
>
> Recall that the next vertical tile element is not in the next physical row, 
> but in the Nth
> physical row.  Therefore there are always sizeof(ARMVectorReg) elements in 
> the host layout
> between vertical tile elements.
>
> I agree it looks strange.

Ah yes, I see how this works. I wonder if there's some way we
can abstract out this sort of index calculation into a macro
or function so that we can comment what it's doing there and
then all the use-sites are more "obviously correct". Perhaps:

/*
 * When considering the ZA storage as an array of elements of
 * type T, the index within that array of the Nth element of
 * a vertical slice of a tile can be calculated like this,
 * regardless of the size of type T. This is because the tiles
 * are interleaved, so if type T is size N bytes then row 1 of
 * the tile is N rows away from row 0. The division by N to
 * convert a byte offset into an array index and the multiplication
 * by N to convert from vslice-index-within-the-tile to
 * the index within the ZA storage cancel out.
 */
#define tile_vslice_index(i) ((i) * sizeof(ARMVectorReg))

/*
 * When doing byte arithmetic on the ZA storage, the element
 * byteoff bytes away in a tile vertical slice is always this
 * many bytes away in the ZA storage, regardless of the
 * size of the tile element, assuming that byteoff is a multiple
 * of the element size. Again this is because of the interleaving
 * of the tiles. For instance if we have 1 byte per element then
 * each row of the ZA storage has one byte of the vslice data,
 * and (counting from 0) byte 8 goes in row 8 of the storage
 * at offset (8 * row-size-in-bytes).
 * If we have 8 bytes per element then each row of the ZA storage
 * has 8 bytes of the data, but there are 8 interleaved tiles and
 * so byte 8 of the data goes into row 1 of the tile,
 * which is again row 8 of the storage, so the offset is still
 * (8 * row-size-in-bytes). Similarly for other element sizes.
 */
#define tile_vslice_offset(byteoff) ((byteoff) * sizeof(ARMVectorReg))

(or use functions if you like. Maybe we want versions that
take (row,col) arguments too.)

Even though we're just multiplying by sizeof(ARMVectorReg)
in both cases it feels to me like having the calculation behind a
function or macro named to indicate whether we're doing byte or
index arithmetic should help make the callsites clearer, and the
reader then only has to convince themselves about the implementations
once.

thanks
-- PMM

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [PATCH v4 20/45] target/arm: Implement SME LD1, ST1, Peter Maydell, 2022/07/04
- Re: [PATCH v4 20/45] target/arm: Implement SME LD1, ST1, Richard Henderson, 2022/07/04
  - Re: [PATCH v4 20/45] target/arm: Implement SME LD1, ST1, Peter Maydell <=
    - Re: [PATCH v4 20/45] target/arm: Implement SME LD1, ST1, Richard Henderson, 2022/07/05

Prev by Date: [PULL 11/14] disas: Remove libvixl disassembler
Next by Date: Re: [PATCH v8 06/20] job.h: define functions called without job lock held
Previous by thread: Re: [PATCH v4 20/45] target/arm: Implement SME LD1, ST1
Next by thread: Re: [PATCH v4 20/45] target/arm: Implement SME LD1, ST1
Index(es):
- Date
- Thread