Re: [Qemu-devel] [PATCH v3 3/8] target-sh4: optimize addc using add2
From: Aurelien Jarno
Date: Thu, 4 Jun 2015 18:08:51 +0200
User-agent: Mutt/1.5.23 (2014-03-12)
On 2015-06-04 12:54, Paolo Bonzini wrote:
>
>
> On 04/06/2015 07:03, Richard Henderson wrote:
> >> + tcg_gen_add2_i32(t1, t2, REG(B11_8), t0, REG(B7_4), t0);
> >> + tcg_gen_add2_i32(REG(B11_8), cpu_sr_t, t1, t2, cpu_sr_t, t0);
> >
> > Swap these two adds and you don't need t2. You can consume sr_t
> > immediately and start producing it in the same go.
>
> Could TCG do some kind of intra-basic-block live range splitting? In
> this case, the new sr_t could be allocated to a different register than
> the old one, saving one instruction on 2-address targets.
TCG doesn't assign a fixed register to a temp, so it's somewhat hard to
tell, but let's say it more or less does that (see below). On the other
hand, it is really bad at handling the constant in this case.
> The pseudocode below uses "dest, src" operand order:
>
> // add2(t1, cpu_sr_t, cpu_sr_t, t0, REG(B7_4), t0)
> add sr_t_in, B7_4 // instead of mov t1, sr_t; add t1, B7_4
> mov sr_t_out, 0
> adc sr_t_out, 0 // cout(B7_r + sr_t_in)
Registers are allocated from left to right, starting with the inputs.
- cpu_sr_t is either already in a register, or is in memory and gets
  loaded into a register.
- t0 is a constant, and the add2 op on x86_64 does not accept a constant
  there, so it is loaded into a register. However, it is aliased to the
  output and is not dead, as it is used again in the second add2
  instruction. It is therefore copied into another register.
- REG(B7_4) is either already in a register, or is in memory and gets
  loaded into a register.
- t0 appears again; it has already been loaded into a register and is
  therefore no longer a constant.
We therefore end up with (Intel operand order, i.e. "dest, src"):
xor %ebx, %ebx // this is t0
mov %r12d, %ebx // a copy of t0
add %r13d, %ebp // %r13d contains B7_4 and %ebp contains sr_t
adc %r12d, %ebx // %r12d is the new sr_t
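To make the first half of that listing concrete, here is a small C model of those four instructions. It is a sketch under stated assumptions, not QEMU code: operands are read in "dest, src" order as the listing's comments imply, %r13d holds B7_4 and %ebp holds sr_t on entry, and the names X86State, x86_add, x86_adc and first_add2 are hypothetical. It shows that %r13d ends up holding t1 = B7_4 + sr_t and %r12d the carry, i.e. the new sr_t; the mov and the zeroed %ebx exist only because the constant t0 was materialized in registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file for the listing; cf is the x86 carry flag. */
typedef struct {
    uint32_t ebx, ebp, r12d, r13d;
    uint32_t cf;                      /* 0 or 1 */
} X86State;

/* add dst, src: dst = dst + src, CF = carry out of bit 31 */
static void x86_add(X86State *s, uint32_t *dst, uint32_t src)
{
    uint64_t r = (uint64_t)*dst + src;
    *dst = (uint32_t)r;
    s->cf = (uint32_t)(r >> 32);
}

/* adc dst, src: dst = dst + src + CF, CF = carry out of bit 31 */
static void x86_adc(X86State *s, uint32_t *dst, uint32_t src)
{
    uint64_t r = (uint64_t)*dst + src + s->cf;
    *dst = (uint32_t)r;
    s->cf = (uint32_t)(r >> 32);
}

/* Runs the four instructions for the first add2 (assumed register mapping). */
static X86State first_add2(uint32_t b7_4, uint32_t sr_t)
{
    /* ebx and r12d start with arbitrary stale values */
    X86State s = { .ebx = 0xdeadbeefu, .ebp = sr_t,
                   .r12d = 0x12345678u, .r13d = b7_4, .cf = 1 };

    s.ebx ^= s.ebx;               /* xor %ebx, %ebx: t0 = 0 ...          */
    s.cf = 0;                     /* ... and xor also clears CF          */
    s.r12d = s.ebx;               /* mov %r12d, %ebx: a copy of t0       */
    x86_add(&s, &s.r13d, s.ebp);  /* add %r13d, %ebp: t1 = B7_4 + sr_t   */
    x86_adc(&s, &s.r12d, s.ebx);  /* adc %r12d, %ebx: new sr_t = 0+0+CF  */
    return s;
}
```

For example, first_add2(0xffffffff, 1) leaves 0 in r13d and 1 in r12d, since the 32-bit add wraps and sets the carry.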
> // add2(REG(B11_8), cpu_sr_t, t1, cpu_sr_t, REG(B11_8), t0)
> add B11_8, sr_t_in // B11_8 + B7_4 + sr_t_in
> adc sr_t_out, 0 // cout(B11_8 + B7_4 + sr_t_in)
add %ebp, %r13d // %ebp is now B11_8
adc %ebx, %r12d // %ebx is now cpu_sr_t
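Both add2 orderings discussed in this thread can be checked against the SH4 addc semantics (Rn = Rn + Rm + T, T = carry out) with a plain C model. This is a sketch, not QEMU code: add2_i32 below is a hypothetical stand-in that only models the TCG op's semantics, (rh:rl) = (ah:al) + (bh:bl), and t0 is taken to be the constant 0 as in the patch. addc_v3 follows the order in the v3 patch (which needs t2); addc_swapped follows the reordering suggested in review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of add2_i32: (rh:rl) = (ah:al) + (bh:bl) mod 2^64. */
static void add2_i32(uint32_t *rl, uint32_t *rh,
                     uint32_t al, uint32_t ah, uint32_t bl, uint32_t bh)
{
    uint64_t a = ((uint64_t)ah << 32) | al;
    uint64_t b = ((uint64_t)bh << 32) | bl;
    uint64_t r = a + b;
    *rl = (uint32_t)r;
    *rh = (uint32_t)(r >> 32);
}

/* Reference SH4 addc Rm,Rn: Rn = Rn + Rm + T, T = carry out. */
static void addc_ref(uint32_t *rn, uint32_t rm, uint32_t *t)
{
    uint64_t sum = (uint64_t)*rn + rm + *t;
    *rn = (uint32_t)sum;
    *t = (uint32_t)(sum >> 32);
}

/* Order used in the v3 patch: needs the extra temp t2. */
static void addc_v3(uint32_t *rn, uint32_t rm, uint32_t *sr_t)
{
    const uint32_t t0 = 0;
    uint32_t t1, t2;
    add2_i32(&t1, &t2, *rn, t0, rm, t0);    /* t2:t1 = Rn + Rm   */
    add2_i32(rn, sr_t, t1, t2, *sr_t, t0);  /* T:Rn  = t2:t1 + T */
}

/* Swapped order: consume sr_t first and reuse it as the carry, no t2. */
static void addc_swapped(uint32_t *rn, uint32_t rm, uint32_t *sr_t)
{
    const uint32_t t0 = 0;
    uint32_t t1;
    add2_i32(&t1, sr_t, *sr_t, t0, rm, t0); /* T:t1 = T + Rm     */
    add2_i32(rn, sr_t, t1, *sr_t, *rn, t0); /* T:Rn = T:t1 + Rn  */
}

/* Compares both orderings with the reference on edge-case inputs. */
static bool addc_sequences_agree(void)
{
    static const uint32_t vals[] = { 0, 1, 2, 0x7fffffffu, 0x80000000u,
                                     0xfffffffeu, 0xffffffffu };
    const size_t n = sizeof vals / sizeof vals[0];

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (uint32_t t = 0; t <= 1; t++) {
                uint32_t rn_ref = vals[i], t_ref = t;
                uint32_t rn_a = vals[i], t_a = t;
                uint32_t rn_b = vals[i], t_b = t;

                addc_ref(&rn_ref, vals[j], &t_ref);
                addc_v3(&rn_a, vals[j], &t_a);
                addc_swapped(&rn_b, vals[j], &t_b);

                assert(t_ref <= 1);  /* the carry always fits in one bit */
                if (rn_a != rn_ref || t_a != t_ref ||
                    rn_b != rn_ref || t_b != t_ref)
                    return false;
            }
    return true;
}
```

In the swapped order, sr_t is read as an input of the first add2 and immediately redefined as that op's high output, which is why t2 is no longer needed: the intermediate carry lives in sr_t itself until the second add2 folds in Rn.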
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
address@hidden http://www.aurel32.net