qemu-devel

From: Paolo Bonzini
Subject: Re: [PATCH 20/37] target/i386: reimplement 0x0f 0x60-0x6f, add AVX
Date: Tue, 13 Sep 2022 12:56:10 +0200

On Mon, Sep 12, 2022 at 1:41 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 9/12/22 00:04, Paolo Bonzini wrote:
> > +/*
> > + * 00 = p*  Pq, Qq (if mmx not NULL; no VEX)
> > + * 66 = vp* Vx, Hx, Wx
> > + *
> > + * These are really the same encoding, because 1) V is the same as P when VEX.V
> > + * is not present 2) P and Q are the same as H and W apart from MM/XMM
> > + */
> > +static inline void gen_binary_int_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode,
> > +                                      SSEFunc_0_eppp mmx, SSEFunc_0_eppp xmm, SSEFunc_0_eppp ymm)
>
> No need to inline.

Yes and no: the compiler should indeed be able to figure it out, but both
the assert() and the calls are meant to be optimized out by inlining, so
this kind of function would even be an always_inline candidate.
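
For example, a typical caller would be something like this (punpcklbw shown,
take the exact helper names with a grain of salt), where the mmx/xmm/ymm
arguments are compile-time constants:

static void gen_PUNPCKLBW(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
{
    /* Constant arguments: once gen_binary_int_sse is inlined, the assert()
     * and the NULL check on mmx fold away completely.  */
    gen_binary_int_sse(s, env, decode,
                       gen_helper_punpcklbw_mmx,
                       gen_helper_punpcklbw_xmm,
                       gen_helper_punpcklbw_ymm);
}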

> > +{
> > +    assert (!!mmx == !!(decode->e.special == X86_SPECIAL_MMX));
> > +
> > +    if (mmx && (s->prefix & PREFIX_VEX) && !(s->prefix & PREFIX_DATA)) {
> > +        /* VEX encoding is not applicable to MMX instructions.  */
> > +        gen_illegal_opcode(s);
> > +        return;
> > +    }
> > +    if (!(s->prefix & PREFIX_DATA)) {
> > +        mmx(cpu_env, s->ptr0, s->ptr1, s->ptr2);
> > +    } else if (!s->vex_l) {
> > +        xmm(cpu_env, s->ptr0, s->ptr1, s->ptr2);
> > +    } else {
> > +        ymm(cpu_env, s->ptr0, s->ptr1, s->ptr2);
> > +    }
>
> And a reminder from earlier patches that generating the pointers here would
> be better, as well as zeroing the high ymm bits for vex xmm insns.

I'm not sure about that, because there are quite a few cases handled
by more complex gen_* functions, which are helper-based (so not simple
calls to gvec functions, where you have oprsz/maxsz) and are not
covered by the common gen_*_sse functions. For example gen_CVTPI2Px,
gen_MASKMOV, gen_PSRLDQ_i, gen_SSE4a_I, gen_VCVTSI2Sx, ...  All of
these would need extra code to set up the pointers and to clear
the high ymm bits.

For gen_load, however, I can delay generating the pointers using something like

static inline TCGv_ptr get_ptr0(DisasContext *s)
{
    if (s->ptr0) {
        return s->ptr0;
    }
    s->ptr0 = tcg_temp_new_ptr();
    tcg_gen_addi_ptr(s->ptr0, cpu_env, ...);
    return s->ptr0;
}
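
and then a call site would look something like this (with get_ptr1()/get_ptr2()
being the analogous accessors), so that the pointers are only created when a
helper actually needs them:

    if (!(s->prefix & PREFIX_DATA)) {
        mmx(cpu_env, get_ptr0(s), get_ptr1(s), get_ptr2(s));
    } else if (!s->vex_l) {
        xmm(cpu_env, get_ptr0(s), get_ptr1(s), get_ptr2(s));
    } else {
        ymm(cpu_env, get_ptr0(s), get_ptr1(s), get_ptr2(s));
    }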

Most of the changes to this series are mechanical anyway, so if you dislike
relying on DCE, then sure, why not.

As for gen_writeback(), keeping it eliminates duplicated code and keeps
the phases of disas_insn_new separated, so I slightly prefer it. For now
I'd rather leave it as is; with the above get_ptr0() function that
creates s->ptr0 lazily, perhaps gen_writeback() could zero the high ymm
bits only if s->ptr0 is set (suggesting that a helper was used), while
gvec-based code would rely on the oprsz<maxsz feature (sketch below).
There's something to be said for keeping the initial implementation
simple, of course, especially since it already produces slightly better
code than the existing decoder.
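
For the gvec-based cases the oprsz<maxsz feature does all the work: the gvec
expanders clear the destination bytes between oprsz and maxsz, so a 128-bit
operation that passes the full register size as maxsz zeroes the high ymm half
for free, roughly (sizes made up for the example):

    /* 16-byte add, but maxsz = 32, so bytes 16..31 of the destination
     * (the high ymm half) are cleared, as VEX.128 encodings require.  */
    tcg_gen_gvec_add(MO_8, decode->op[0].offset,
                     decode->op[1].offset, decode->op[2].offset,
                     16, 32);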

> > +    switch (ot) {
> > +    case MO_32:
> > +#ifdef TARGET_X86_64
> > +        tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1);
> > +        tcg_gen_st_i32(s->tmp3_i32, cpu_env, lo_ofs);
> > +        break;
>
> This could also be
>
>      tcg_gen_gvec_dup_i64(MO_64, offset, 8, sse_vec_max_len, s->T1);

Yeah, it can be something like

    case MO_32:
        tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1);
        tcg_gen_gvec_dup_i32(MO_32, decode->op[0].offset, 4, vec_len,
                             s->tmp3_i32);
        break;
#ifdef TARGET_X86_64
    case MO_64:
        tcg_gen_gvec_dup_i64(MO_64, decode->op[0].offset, 8, vec_len, s->T1);
        break;
#endif

and in this case of course it's not possible to use st32_tl.

Paolo