qemu-devel

Re: x86 TCG helpers clobbered registers


From: Richard Henderson
Subject: Re: x86 TCG helpers clobbered registers
Date: Fri, 4 Dec 2020 13:35:55 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 12/4/20 9:36 AM, Stephane Duverger wrote:
> Hello,
> 
> While looking at tcg/i386/tcg-target.c.inc:tcg_out_qemu_st(), I
> discovered that TCG generates a call to a store helper at the end
> of the TB, which is executed on a TLB miss and then returns to the
> remaining translated ops. I tried to mimic this behavior around the
> fast path (right between tcg_out_tlb_load() and
> tcg_out_qemu_st_direct()) to filter memory store accesses.

There's your bug -- don't do that.

> I know there are now TCG plugins for that purpose at the TCG IR level,
> from which every tcg-target can benefit. FWIW, my design choice was
> driven more by the fact that I always work on an x86 host and plugins
> did not exist at the time. Anyway, the point is more about generating
> a call to a helper at the TCG IR level (the classic scenario) versus
> later, during tcg-target code generation (the slow path, for instance).

You can't just inject a call anywhere you like.  If you add it at the IR level,
then the rest of the compiler will see it and work properly.  If you add the
call in the middle of another operation, the compiler doesn't get to see it and
Bad Things Happen.
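
To make the failure mode concrete, here is a minimal toy sketch (not QEMU
code; every name in it is hypothetical). A value is cached in a caller-saved
register. If the allocator knows about the call, it writes the value back to
memory first. If the call is hidden inside another operation, the allocator
still trusts the register afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not QEMU's data structures. */
enum { TOY_NB_REGS = 4 };
#define TOY_CLOBBER_MASK 0x3u   /* regs 0 and 1 are caller-saved here */

typedef struct {
    uint32_t regs[TOY_NB_REGS];
    uint32_t slot;      /* memory home of the single temp */
    int temp_reg;       /* register caching the temp, or -1 if in memory */
} ToyCpu;

/* The callee is free to trash every caller-saved register. */
static void toy_helper(ToyCpu *c)
{
    for (int i = 0; i < TOY_NB_REGS; i++) {
        if (TOY_CLOBBER_MASK & (1u << i)) {
            c->regs[i] = 0xdeadbeef;
        }
    }
}

/* Read the temp from wherever the allocator believes it lives. */
static uint32_t toy_read_temp(const ToyCpu *c)
{
    assert(c->temp_reg < TOY_NB_REGS);
    return c->temp_reg >= 0 ? c->regs[c->temp_reg] : c->slot;
}

/* Run one call and return what the code after the call reads. */
static uint32_t toy_run(bool allocator_sees_call)
{
    ToyCpu c = { .temp_reg = 1 };
    c.regs[1] = 42;

    if (allocator_sees_call) {
        /* Call visible at the IR level: spill before calling. */
        c.slot = c.regs[c.temp_reg];
        c.temp_reg = -1;
    }
    /* Otherwise the call was injected behind the allocator's back. */
    toy_helper(&c);
    return toy_read_temp(&c);
}
```

In this toy, toy_run(true) reads back the original value, while
toy_run(false) reads whatever the helper left in the register.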

> When calling a helper, TCG knows that some registers will be
> call-clobbered and must therefore free them. This is what I observed
> in tcg_reg_alloc_call():
> 
> /* clobber call registers */
> for (i = 0; i < TCG_TARGET_NB_REGS; i++) {
>     if (tcg_regset_test_reg(tcg_target_call_clobber_regs, i)) {
>         tcg_reg_free(s, i, allocated_regs);
>     }
> }
> 
> But in our case (ie. INDEX_op_qemu_st_i32), the TCG code path comes
> from:
> 
> tcg_reg_alloc_op()
>   tcg_out_op()
>     tcg_out_qemu_st()
> 
> Then tcg_out_tlb_load() will inject a 'jmp' to the slow path, whose
> generated code does not seem to take care of every call-clobbered
> register, if we look at tcg_out_qemu_st_slow_path().

You missed

>         if (def->flags & TCG_OPF_CALL_CLOBBER) {
>             /* XXX: permit generic clobber register list ? */ 
>             for (i = 0; i < TCG_TARGET_NB_REGS; i++) {
>                 if (tcg_regset_test_reg(tcg_target_call_clobber_regs, i)) {
>                     tcg_reg_free(s, i, i_allocated_regs);
>                 }
>             }
>         }

which handles this in tcg_reg_alloc_op.
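
As a rough illustration of that flag-gated loop, here is a self-contained toy
model. It is only a sketch: the types, the flag value and the register count
are made up for the example and are not QEMU's. The point is that an opcode
marked as call-clobbering gets the same treatment as an explicit call, so
every live caller-saved register is freed before the backend emits code that
may branch to a helper.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: hypothetical names that mirror the shape of the loop
 * quoted above. This is not QEMU code. */
enum { TOY_NB_REGS = 8 };
#define TOY_OPF_CALL_CLOBBER 0x1u

typedef struct {
    int reg_to_temp[TOY_NB_REGS];  /* temp held by each reg, -1 if free */
    int spilled;                   /* how many live registers were freed */
} ToyAlloc;

/* Release one register, counting it if it held a live temp. */
static void toy_reg_free(ToyAlloc *s, int reg)
{
    assert(reg >= 0 && reg < TOY_NB_REGS);
    if (s->reg_to_temp[reg] >= 0) {
        s->spilled++;
        s->reg_to_temp[reg] = -1;
    }
}

/* Allocation step for one op: free the clobber set only if flagged. */
static void toy_alloc_op(ToyAlloc *s, unsigned op_flags,
                         uint32_t clobber_mask)
{
    if (op_flags & TOY_OPF_CALL_CLOBBER) {
        for (int i = 0; i < TOY_NB_REGS; i++) {
            if (clobber_mask & (1u << i)) {
                toy_reg_free(s, i);
            }
        }
    }
}
```

An op without the flag leaves the register state untouched; an op with the
flag frees only the registers named in the clobber mask.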


> First, for an i386 (32-bit) tcg-target, as expected, the helper
> arguments are placed on the stack. I noticed that 'esp' is not
> shifted down before the args are stacked, which might corrupt the
> last words on the stack.

No, we generate code for a constant esp, as if by gcc's -mno-push-args option.
We have reserved TCG_STATIC_CALL_ARGS_SIZE bytes of stack for the arguments
(which is actually larger than necessary for any of the tcg targets).
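
A small toy model of that frame layout may help; it is only a sketch. The
names and sizes below are hypothetical, and the real value of
TCG_STATIC_CALL_ARGS_SIZE is not assumed. The prologue sets esp once, the
lowest bytes of the frame are reserved for outgoing arguments, and arguments
are written at fixed offsets from esp instead of being pushed. Nothing above
the reserved area is touched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: all names and sizes here are hypothetical. */
enum {
    TOY_CALL_ARGS_SIZE = 128,  /* arbitrary stand-in for the reserved area */
    TOY_LOCALS_SIZE    = 64,
    TOY_FRAME_SIZE     = TOY_CALL_ARGS_SIZE + TOY_LOCALS_SIZE
};

typedef struct {
    uint8_t mem[TOY_FRAME_SIZE];
    size_t  esp;               /* index into mem, set once by the prologue */
} ToyFrame;

/* Reserve the whole frame up front; esp never moves afterwards. */
static void toy_prologue(ToyFrame *f)
{
    memset(f->mem, 0, sizeof(f->mem));
    f->esp = 0;
}

/* Outgoing arguments live in [esp, esp + TOY_CALL_ARGS_SIZE). */
static void toy_store_arg(ToyFrame *f, size_t ofs, uint32_t val)
{
    assert(ofs + sizeof(val) <= TOY_CALL_ARGS_SIZE);
    memcpy(&f->mem[f->esp + ofs], &val, sizeof(val));
}

static uint32_t toy_load_arg(const ToyFrame *f, size_t ofs)
{
    uint32_t val;
    assert(ofs + sizeof(val) <= TOY_CALL_ARGS_SIZE);
    memcpy(&val, &f->mem[f->esp + ofs], sizeof(val));
    return val;
}

/* Everything else in the frame sits above the reserved area. */
static void toy_store_local(ToyFrame *f, size_t ofs, uint32_t val)
{
    assert(ofs + sizeof(val) <= TOY_LOCALS_SIZE);
    memcpy(&f->mem[f->esp + TOY_CALL_ARGS_SIZE + ofs], &val, sizeof(val));
}

static uint32_t toy_load_local(const ToyFrame *f, size_t ofs)
{
    uint32_t val;
    assert(ofs + sizeof(val) <= TOY_LOCALS_SIZE);
    memcpy(&val, &f->mem[f->esp + TOY_CALL_ARGS_SIZE + ofs], sizeof(val));
    return val;
}
```

Storing three arguments at offsets 0, 4 and 8 leaves esp unchanged and does
not disturb a local stored just above the reserved area.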


r~


