

From: David Miller
Subject: Re: [PATCH v1 1/2] s390x/tcg: Implement Vector-Enhancements Facility 2 for s390x
Date: Thu, 3 Mar 2022 11:50:48 -0500


> Too many changes in one patch.
> You need to split these into smaller, logical units.

Can you give some guidance on that?
I.e., should this be split in two (the shifts and the reversed loads/stores as separate patches), or further, based on the line count of each patch?

> Tabs, and more later.

The tabs should not be there at all; I have them disabled in my editor, so I will figure out how they reappeared.

> This seems likely to go wrong for vdst == vsrc.
> In addition, swapping the order of elements is something that can be done in parallel.

There is always an even number of elements.
I will make the change there, however; that code is more concise.

> Just use a little-endian load: MO_LE | es.
> While we use MO_TE all over, it's no secret that it's always big-endian.

> And everywhere else you do load then swap, or swap then store.

I wasn't sure whether there was a reason MO_TE was used, so I just kept to the existing code flow.

Thanks
- David Miller




On Thu, Mar 3, 2022 at 3:58 AM Richard Henderson <richard.henderson@linaro.org> wrote:
On 3/2/22 17:22, David Miller wrote:
> resolves: https://gitlab.com/qemu-project/qemu/-/issues/738
>
> implements:
> VECTOR LOAD ELEMENTS REVERSED               (VLER)
> VECTOR LOAD BYTE REVERSED ELEMENTS          (VLBR)
> VECTOR LOAD BYTE REVERSED ELEMENT           (VLEBRH, VLEBRF, VLEBRG)
> VECTOR LOAD BYTE REVERSED ELEMENT AND ZERO  (VLLEBRZ)
> VECTOR LOAD BYTE REVERSED ELEMENT AND REPLICATE (VLBRREP)
> VECTOR STORE ELEMENTS REVERSED              (VSTER)
> VECTOR STORE BYTE REVERSED ELEMENTS         (VSTBR)
> VECTOR STORE BYTE REVERSED ELEMENT           (VSTEBRH, VSTEBRF, VSTEBRG)
> VECTOR SHIFT LEFT DOUBLE BY BIT             (VSLD)
> VECTOR SHIFT RIGHT DOUBLE BY BIT            (VSRD)
> VECTOR STRING SEARCH                        (VSTRS)
>
> modifies:
> VECTOR FP CONVERT FROM FIXED                (VCFPS)
> VECTOR FP CONVERT FROM LOGICAL              (VCFPL)
> VECTOR FP CONVERT TO FIXED                  (VCSFP)
> VECTOR FP CONVERT TO LOGICAL                (VCLFP)
> VECTOR SHIFT LEFT                           (VSL)
> VECTOR SHIFT RIGHT ARITHMETIC               (VSRA)
> VECTOR SHIFT RIGHT LOGICAL                  (VSRL)
>
> Signed-off-by: David Miller <dmiller423@gmail.com>

Too many changes in one patch.
You need to split these into smaller, logical units.

> +/* VECTOR LOAD BYTE REVERSED ELEMENT AND ZERO */
> +    F(0xe604, VLLEBRZ, VRX,   VE2, la2, 0, 0, 0, vllebrz, 0, IF_VEC)
> +/* VECTOR LOAD BYTE REVERSED ELEMENTS */
> +     F(0xe606, VLBR,    VRX,   VE2, la2, 0, 0, 0, vlbr, 0, IF_VEC)
> +/* VECTOR LOAD ELEMENTS REVERSED */
> +     F(0xe607, VLER,    VRX,   VE2, la2, 0, 0, 0, vler, 0, IF_VEC)

Tabs, and more later.

> @@ -457,6 +457,9 @@ static DisasJumpType op_vlrep(DisasContext *s, DisasOps *o)
>       return DISAS_NEXT;
>   }
>   
> +
> +
> +
>   static DisasJumpType op_vle(DisasContext *s, DisasOps *o)

Do not add pointless whitespace.

> +static DisasJumpType op_vlebr(DisasContext *s, DisasOps *o)
> +{
> +    const uint8_t es = (1 == s->fields.op2) ? 1 : (1 ^ s->fields.op2);
> +    const uint8_t enr = get_field(s, m3);
> +    TCGv_i64 tmp;
> +
> +    if (es < ES_16 || es > ES_64 || !valid_vec_element(enr, es)) {
> +        gen_program_exception(s, PGM_SPECIFICATION);
> +        return DISAS_NORETURN;
> +    }
> +
> +    tmp = tcg_temp_new_i64();
> +    tcg_gen_qemu_ld_i64(tmp, o->addr1, get_mem_index(s), MO_TE | es);

Just use a little-endian load: MO_LE | es.
While we use MO_TE all over, it's no secret that it's always big-endian.

And everywhere else you do load then swap, or swap then store.
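
For the element forms that would look something like this (just a sketch, reusing
the helpers from the existing op_vle; untested):

     tmp = tcg_temp_new_i64();
     tcg_gen_qemu_ld_i64(tmp, o->addr1, get_mem_index(s), MO_LE | es);
     write_vec_element_i64(tmp, get_field(s, v1), enr, es);
     tcg_temp_free_i64(tmp);
     return DISAS_NEXT;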

> +}
> +
> +
> +
> +static DisasJumpType op_vsteb(DisasContext *s, DisasOps *o)

More care with spacing.

> +static inline void s390_vec_reverse(S390Vector *vdst,
> +                                    S390Vector *vsrc, uint8_t es)
> +{
> +    const uint8_t elems = 1 << (4 - es);
> +    uint32_t enr;
> +
> +    for (enr = 0; enr < elems; enr++) {
> +        switch (es) {
> +        case MO_8:
> +            s390_vec_write_element8(vdst, enr,
> +                           s390_vec_read_element8(vsrc, 15 ^ enr));
> +            break;
> +        case MO_16:
> +            s390_vec_write_element16(vdst, enr,
> +                           s390_vec_read_element16(vsrc, 7 ^ enr));
> +            break;
> +        case MO_32:
> +            s390_vec_write_element32(vdst, enr,
> +                           s390_vec_read_element32(vsrc, 3 ^ enr));
> +            break;
> +        case MO_64:
> +            s390_vec_write_element64(vdst, enr,
> +                           s390_vec_read_element64(vsrc, 1 ^ enr));
> +            break;
> +        default:
> +            g_assert_not_reached();
> +        }
> +    }
> +}

This seems likely to go wrong for vdst == vsrc.
In addition, swapping the order of elements is something that can be done in parallel.

     l = src[lo], h = src[hi];
     switch (es) {
     case MO_64:
         dst[hi] = l, dst[lo] = h;
         break;
     case MO_8:
         dst[hi] = bswap64(l);
         dst[lo] = bswap64(h);
         break;
     case MO_16:
         dst[hi] = hswap64(l);
         dst[lo] = hswap64(h);
         break;
     case MO_32:
         dst[hi] = wswap64(l);
         dst[lo] = wswap64(h);
         break;
     }

which, really, can all be generated inline.
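
As a helper, the same idea spelled out in full would be roughly the following
(only a sketch: it assumes the hswap64/wswap64 helpers from qemu/bitops.h,
bswap64 from qemu/bswap.h, and the s390_vec_*_element64 accessors; untested):

static inline void s390_vec_reverse(S390Vector *vdst,
                                    const S390Vector *vsrc, uint8_t es)
{
    /* Read both halves up front so that vdst == vsrc is safe. */
    uint64_t d0 = s390_vec_read_element64(vsrc, 0);
    uint64_t d1 = s390_vec_read_element64(vsrc, 1);

    switch (es) {
    case MO_8:
        d0 = bswap64(d0);
        d1 = bswap64(d1);
        break;
    case MO_16:
        d0 = hswap64(d0);
        d1 = hswap64(d1);
        break;
    case MO_32:
        d0 = wswap64(d0);
        d1 = wswap64(d1);
        break;
    case MO_64:
        break;
    default:
        g_assert_not_reached();
    }

    /* Swapping the two halves reverses the element order across the vector. */
    s390_vec_write_element64(vdst, 0, d1);
    s390_vec_write_element64(vdst, 1, d0);
}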


r~
