Re: [PATCH v4 20/45] target/arm: Implement SME LD1, ST1

qemu-devel


From:	Richard Henderson
Subject:	Re: [PATCH v4 20/45] target/arm: Implement SME LD1, ST1
Date:	Tue, 5 Jul 2022 16:51:44 +0530
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1

On 7/5/22 16:18, Peter Maydell wrote:

Ah yes, I see how this works. I wonder if there's some way we
can abstract out this sort of index calculation into a macro
or function so that we can comment what it's doing there and
then all the use-sites are more "obviously correct". Perhaps:

/*
  * When considering the ZA storage as an array of elements of
  * type T, the index within that array of the i-th element of
  * a vertical slice of a tile can be calculated like this,
  * regardless of the size of type T. This is because the tiles
  * are interleaved, so if type T is size N bytes then row 1 of
  * the tile is N rows away from row 0. The division by N to
  * convert a byte offset into an array index and the multiplication
  * by N to convert from vslice-index-within-the-tile to
  * the index within the ZA storage cancel out.
  */
#define tile_vslice_index(i) ((i) * sizeof(ARMVectorReg))

/*
  * When doing byte arithmetic on the ZA storage, the element
  * byteoff bytes away in a tile vertical slice is always this
  * many bytes away in the ZA storage, regardless of the
  * size of the tile element, assuming that byteoff is a multiple
  * of the element size. Again this is because of the interleaving
  * of the tiles. For instance if we have 1 byte per element then
  * each row of the ZA storage has one byte of the vslice data,
  * and (counting from 0) byte 8 goes in row 8 of the storage
  * at offset (8 * row-size-in-bytes).
  * If we have 8 bytes per element then each row of the ZA storage
  * has 8 bytes of the data, but there are 8 interleaved tiles and
  * so byte 8 of the data goes into row 1 of the tile,
  * which is again row 8 of the storage, so the offset is still
  * (8 * row-size-in-bytes). Similarly for other element sizes.
  */
#define tile_vslice_offset(byteoff) ((byteoff) * sizeof(ARMVectorReg))
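
As a sanity check on the two comments above, here is a minimal sketch that
recomputes both values the long way and compares them with the macros. The
256-byte stand-in for ARMVectorReg and the *_longhand and check_* helper
names are assumptions made for illustration, not part of the patch. Both
results are relative to the first element of the vertical slice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for QEMU's ARMVectorReg; the 256-byte row size is an assumption. */
typedef struct { uint8_t b[256]; } ARMVectorReg;

#define tile_vslice_index(i) ((i) * sizeof(ARMVectorReg))
#define tile_vslice_offset(byteoff) ((byteoff) * sizeof(ARMVectorReg))

/*
 * Hypothetical helper: array index of element i of a vertical slice,
 * computed step by step for elements of esize bytes.
 */
static size_t vslice_index_longhand(size_t i, size_t esize)
{
    size_t storage_row = i * esize;          /* tile row i is esize rows apart */
    size_t byte_offset = storage_row * sizeof(ARMVectorReg);
    return byte_offset / esize;              /* byte offset -> element index */
}

/*
 * Hypothetical helper: byte offset in the ZA storage of the slice data
 * that is byteoff bytes into the vertical slice.
 */
static size_t vslice_offset_longhand(size_t byteoff, size_t esize)
{
    size_t tile_row = byteoff / esize;       /* which element of the slice */
    size_t storage_row = tile_row * esize;   /* undo the interleaving */
    return storage_row * sizeof(ARMVectorReg);
}

/* Compare the macros with the longhand versions for esize = 1, 2, 4, 8, 16. */
static void check_vslice_macros(void)
{
    for (size_t esize = 1; esize <= 16; esize *= 2) {
        for (size_t i = 0; i < 16; i++) {
            assert(tile_vslice_index(i) == vslice_index_longhand(i, esize));
            assert(tile_vslice_offset(i * esize) ==
                   vslice_offset_longhand(i * esize, esize));
        }
    }
}
```

Calling check_vslice_macros() from a test main() should pass silently,
which is the "cancel out" argument from the comments in executable form.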

(Or use functions if you like. Maybe we want versions that
take (row, col) arguments too.)
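
One possible shape for the (row, col) variants, purely as a sketch: the
function names, the argument order, and the 256-byte stand-in for
ARMVectorReg are all assumptions, and esize is assumed to be a power of
two between 1 and 16. It relies only on the interleaving described above:
with esize-byte elements there are esize tiles, so row `row` of tile
`tile` is storage row (row * esize + tile).

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for QEMU's ARMVectorReg; the 256-byte row size is an assumption. */
typedef struct { uint8_t b[256]; } ARMVectorReg;

/*
 * Hypothetical helper, not part of the patch: byte offset within the ZA
 * storage of element (row, col) of the given tile, for esize-byte elements.
 */
static inline size_t za_tile_elem_offset(unsigned tile, unsigned esize,
                                         unsigned row, unsigned col)
{
    size_t storage_row = (size_t)row * esize + tile;
    return storage_row * sizeof(ARMVectorReg) + (size_t)col * esize;
}

/*
 * Hypothetical helper: the same position as an index into the ZA storage
 * viewed as an array of esize-byte elements.
 */
static inline size_t za_tile_elem_index(unsigned tile, unsigned esize,
                                        unsigned row, unsigned col)
{
    return za_tile_elem_offset(tile, esize, row, col) / esize;
}
```

Stepping `row` by one with `tile` and `col` fixed moves the offset by
esize * sizeof(ARMVectorReg), which is what tile_vslice_offset(esize)
gives, so the two formulations agree on vertical slices.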


That seems reasonable.  I'll work this into v5.


r~
