[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: patch s390x/tcg: Implement Miscellaneous-Instruction-Extensions Faci
From: |
David Miller |
Subject: |
Re: patch s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3 for the s390x |
Date: |
Mon, 14 Feb 2022 21:59:11 -0500 |
Thanks for the review.
I made all of the changes requested and sent again.
The popcnt patch is in it, the tests are removed.
I will see about changing the instructions in a manner that won't
require building w. -march=z15 and sending a patch again after
finished up the next task.
Will have Vector-Enhancements Facility 2 patch later this week hopefully.
Thanks,
- David Miller
On Mon, Feb 14, 2022 at 1:48 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 08.02.22 23:52, David Miller wrote:
> > From c21eaa57a40ed22f10087a1019dd456d99e3fb03 Mon Sep 17 00:00:00 2001
> > From: David Miller <dmiller423@gmail.com>
> > Date: Tue, 8 Feb 2022 22:33:48 -0500
> > Subject: [PATCH] s390x/tcg: Implement Miscellaneous-Instruction-Extensions
> > Facility 3 for the s390x
>
> Thanks for working on this!
>
>
> a) I fail to apply the patch for some reason:
>
> Applying: patch s390x/tcg: Implement
> Miscellaneous-Instruction-Extensions Facility 3 for the s390x
> error: corrupt patch at line 90
> Patch failed at 0001 patch s390x/tcg: Implement
> Miscellaneous-Instruction-Extensions Facility 3 for the s390x
>
>
> Do you use "git format-patch" + "git send-email" to generate+send patches?
>
> b) Can you add a short patch description?
>
> c) Can you split off the tests into a separate patch?
>
> [...]
>
> > +/* OR WITH COMPLEMENT */
> > + C(0xb975, OCRK, RRF_a, MIE3, r2, r3, new, r1_32, orc, nz32)
> > + C(0xb965, OCGRK, RRF_a, MIE3, r2, r3, r1, 0, orc, nz64)
> > +
> > +/* NAND */
> > + C(0xb974, NNRK, RRF_a, MIE3, r2, r3, new, r1_32, nand, nz32)
> > + C(0xb964, NNGRK, RRF_a, MIE3, r2, r3, r1, 0, nand, nz64)
> > +
> > +/* NOR */
> > + C(0xb976, NORK, RRF_a, MIE3, r2, r3, new, r1_32, nor, nz32)
> > + C(0xb966, NOGRK, RRF_a, MIE3, r2, r3, r1, 0, nor, nz64)
> > +
> > +/* NOT EXCLUSIVE OR */
> > + C(0xb977, NXRK, RRF_a, MIE3, r2, r3, new, r1_32, nxor, nz32)
> > + C(0xb967, NXGRK, RRF_a, MIE3, r2, r3, r1, 0, nxor, nz64)
> >
>
> The N* ones should go to the proper previous section, no?
>
> > /* PACK */
> > /* Really format SS_b, but we pack both lengths into one argument
> > @@ -765,6 +785,12 @@
> > /* SEARCH STRING UNICODE */
> > C(0xb9be, SRSTU, RRE, ETF3, 0, 0, 0, 0, srstu, 0)
> >
> > +/* SELECT */
> > + C(0xb9f0, SELR, RRF_a, MIE3, r2, r3, new, r1_32, sel, 0)
> > + C(0xb9e3, SELGR, RRF_a, MIE3, r2, r3, r1, 0, sel, 0)
> > +/* SELECT HIGH */
> > + C(0xb9c0, SELFHR, RRF_a, MIE3, r2, r3, new, r1_32h, sel, 0)
> > +
> > /* SET ACCESS */
> > C(0xb24e, SAR, RRE, Z, 0, r2_o, 0, 0, sar, 0)
> > /* SET ADDRESSING MODE */
> > diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
> > index 406578d105..9275f1349f 100644
> > --- a/target/s390x/tcg/mem_helper.c
> > +++ b/target/s390x/tcg/mem_helper.c
> > @@ -546,6 +546,48 @@ void HELPER(mvc)(CPUS390XState *env, uint32_t l,
> > uint64_t dest, uint64_t src)
> > do_helper_mvc(env, l, dest, src, GETPC());
> > }
> >
> > +/* move right to left */
> > +static uint32_t do_helper_mvcrl(CPUS390XState *env, uint64_t l, uint64_t
> > dest,
> > + uint64_t src, uintptr_t ra)
> > +{
> > + const int mmu_idx = cpu_mmu_index(env, false);
> > + S390Access srca, desta;
> > + uint32_t i;
> > +
> > + HELPER_LOG("%s l %d dest %" PRIx64 " src %" PRIx64 "\n",
> > + __func__, l, dest, src);
> > +
> > + /* MVCRL always copies one more byte than specified - maximum is 256 */
> > + l++;
> > +
> > + srca = access_prepare(env, src, l, MMU_DATA_LOAD, mmu_idx, ra);
> > + desta = access_prepare(env, dest, l, MMU_DATA_STORE, mmu_idx, ra);
> > +
> > + /*
> > + * "When the operands overlap, the result is obtained as if the
> > operands
> > + * were processed one byte at a time". Only non-destructive overlaps
> > + * behave like memmove().
> > + */
> > + if (dest == src + l - 1) {
>
> Shouldn't this be "src = dest + 1"?
>
> SRC 23456789
> DST 12345678
>
> IIUC would have to result in
> -> 999999999
>
> Because we first copy the I to the 9 to the 8, then the new 9 (old 8) to
> the 7 and so on.
>
>
> BUT
>
> "
> Destructive overlap occurs and results are unpredict-
> able when the byte location designated by the sec-
> ond-operand address (the leftmost byte of the
> source) overlaps with any byte location of the first
> operand, other than the byte location designated by
> the first-operand address (the leftmost byte of the tar-
> get).
> "
> IIUC, it says that on any destructive overlap the result is
> unpredictable. Which makes sense, because we want to optimize for the
> array use case which is e.g.,
>
> SRC 12345678
> DST 23456789
> -> 112345678
>
> So I'd suggest you simply always do the proper loop. No need to optimize
> for something with unpredictable behavior.
>
> You could think about
>
> if (!is_destructive_overlap(env, dest, src, l)) {
> access_memmove(env, &desta, &srca, ra);
> } else {
> ... full loop
> }
>
> because it would cover the valid src==dest case "faster". (note that we
> have to read+write even if src==dest to trigger proper access)
>
> > + access_memset(env, &desta, access_get_byte(env, &srca, 0, ra), ra);
> > + } else if (!is_destructive_overlap(env, dest, src, l)) {
> > + access_memmove(env, &desta, &srca, ra);
> > + } else {
> > + for (i = 0; i < l; i++) {
> > + uint32_t ti = l - i - 1;
> > + uint8_t byte = access_get_byte(env, &srca, ti, ra);
> > + access_set_byte(env, &desta, ti, byte, ra);
> > + }
>
> Or simpler
>
> for (i = l - 1; i >= 0; i--) {
> uint8_t byte = access_get_byte(env, &srca, i, ra);
> access_set_byte(env, &desta, i, byte, ra);
> }
>
> > + }
> > +
> > + return env->cc_op;
> > +}
> > +
> > +void HELPER(mvcrl)(CPUS390XState *env, uint64_t l, uint64_t dest, uint64_t
> > src)
> > +{
> > + do_helper_mvcrl(env, l, dest, src, GETPC());
>
> Unless you intend to reuse do_helper_mvcrl() somewhere else , just
> inline do_helper_mvcrl() here. Then you can also drop the "return
> env->cc_op;"
>
> > +}
> > +
> > /* move inverse */
> > void HELPER(mvcin)(CPUS390XState *env, uint32_t l, uint64_t dest, uint64_t
> > src)
> > {
> > diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
> > index 46dea73357..c0a89e2787 100644
> > --- a/target/s390x/tcg/translate.c
> > +++ b/target/s390x/tcg/translate.c
> > @@ -1498,6 +1498,48 @@ static DisasJumpType op_andi(DisasContext *s,
> > DisasOps *o)
> > return DISAS_NEXT;
> > }
> >
> > +static DisasJumpType op_andc(DisasContext *s, DisasOps *o)
> > +{
> > + tcg_gen_andc_i64(o->out, o->in1, o->in2);
> > + return DISAS_NEXT;
> > +}
> > +
> > +static DisasJumpType op_orc(DisasContext *s, DisasOps *o)
> > +{
> > + tcg_gen_orc_i64(o->out, o->in1, o->in2);
> > + return DISAS_NEXT;
> > +}
> > +
> > +static DisasJumpType op_nand(DisasContext *s, DisasOps *o)
> > +{
> > + tcg_gen_nand_i64(o->out, o->in1, o->in2);
> > + return DISAS_NEXT;
> > +}
> > +
> > +static DisasJumpType op_nor(DisasContext *s, DisasOps *o)
> > +{
> > + tcg_gen_nor_i64(o->out, o->in1, o->in2);
> > + return DISAS_NEXT;
> > +}
> > +
> > +static DisasJumpType op_nxor(DisasContext *s, DisasOps *o)
> > +{
> > + tcg_gen_xor_i64(o->out, o->in1, o->in2);
> > + tcg_gen_not_i64(o->out, o->out);
>
> That can be simplified to
>
> tcg_gen_eqv_i64(o->out, o->in1, o->in2);
>
> > + return DISAS_NEXT;
> > +}
> > +
> > +static DisasJumpType op_sel(DisasContext *s, DisasOps *o)
> > +{
> > + DisasCompare c;
> > + disas_jcc(s, &c, get_field(s, m4));
> > + tcg_gen_movcond_i64(c.cond, o->out, c.u.s64.a, c.u.s64.b,
> > + o->in1, o->in2);
> > + free_compare(&c);
> > + return DISAS_NEXT;
> > +}
> > +
> > +
>
> Drop one \n
>
> [...]
>
> > static const DisasInsn insn_info[] = {
> > #include "insn-data.def"
> > diff --git a/tests/tcg/s390x/Makefile.target
> > b/tests/tcg/s390x/Makefile.target
> > index 1a7238b4eb..16b9d45307 100644
> > --- a/tests/tcg/s390x/Makefile.target
> > +++ b/tests/tcg/s390x/Makefile.target
> > @@ -1,6 +1,6 @@
> > S390X_SRC=$(SRC_PATH)/tests/tcg/s390x
> > VPATH+=$(S390X_SRC)
> > -CFLAGS+=-march=zEC12 -m64
> > +CFLAGS+=-march=z15 -m64
>
>
> Hmm, that might be dangerous in general before we "fully" support the
> z15 in TCG. It would be better to encode the instructions via inline
> assembly until TCG supports the z15.
>
> (most probably MIE3 is the only thing we really need that gcc might
> automatically use, but I haven't double-checked yet)
>
> --
> Thanks,
>
> David / dhildenb
>