Re: [Qemu-devel] [PATCH v3 3/8] target-sh4: optimize addc using add2
From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH v3 3/8] target-sh4: optimize addc using add2
Date: Thu, 04 Jun 2015 12:54:32 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
On 04/06/2015 07:03, Richard Henderson wrote:
>> + tcg_gen_add2_i32(t1, t2, REG(B11_8), t0, REG(B7_4), t0);
>> + tcg_gen_add2_i32(REG(B11_8), cpu_sr_t, t1, t2, cpu_sr_t, t0);
>
> Swap these two adds and you don't need t2. You can consume sr_t
> immediately and start producing it in the same go.
Could TCG do some kind of intra-basic-block live range splitting? In
this case, the new sr_t could be allocated to a different register than
the old one, saving one instruction on 2-address targets.
The pseudocode below uses "dest, src" operand order:
// add2(t1, cpu_sr_t, cpu_sr_t, t0, REG(B7_4), t0)
add sr_t_in, B7_4 // instead of mov t1, sr_t; add t1, B7_4
mov sr_t_out, 0
adc sr_t_out, 0 // cout(B7_4 + sr_t_in)
// add2(REG(B11_8), cpu_sr_t, t1, cpu_sr_t, REG(B11_8), t0)
add B11_8, sr_t_in // B11_8 + B7_4 + sr_t_in
adc sr_t_out, 0 // cout(B11_8 + B7_4 + sr_t_in)
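For reference, the swapped ordering can be checked against the ordering in the patch with a small host-side model. This is only a sketch: add2_i32 below is a plain C stand-in for the semantics of tcg_gen_add2_i32 (a 64-bit add of two low/high pairs), and the names addc_patch_order, addc_swapped_order and check_addc are hypothetical, not QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for tcg_gen_add2_i32(rl, rh, al, ah, bl, bh):
 * (rh:rl) = (ah:al) + (bh:bl), computed as one 64-bit addition.
 * Inputs are passed by value, so outputs may alias inputs safely. */
void add2_i32(uint32_t *rl, uint32_t *rh,
              uint32_t al, uint32_t ah,
              uint32_t bl, uint32_t bh)
{
    uint64_t a = ((uint64_t)ah << 32) | al;
    uint64_t b = ((uint64_t)bh << 32) | bl;
    uint64_t r = a + b;
    *rl = (uint32_t)r;
    *rh = (uint32_t)(r >> 32);
}

/* Ordering from the patch: Rn + Rm first, then add T. Needs t1 and t2.
 * Updates *rn in place and returns the new T (carry out). */
uint32_t addc_patch_order(uint32_t *rn, uint32_t rm, uint32_t t)
{
    uint32_t t1, t2;
    add2_i32(&t1, &t2, *rn, 0, rm, 0);   /* t2:t1 = Rn + Rm */
    add2_i32(rn, &t, t1, t2, t, 0);      /* T:Rn  = t2:t1 + T */
    return t;
}

/* Swapped ordering: consume T first, so only t1 is needed. */
uint32_t addc_swapped_order(uint32_t *rn, uint32_t rm, uint32_t t)
{
    uint32_t t1;
    add2_i32(&t1, &t, t, 0, rm, 0);      /* T:t1 = T + Rm */
    add2_i32(rn, &t, *rn, 0, t1, t);     /* T:Rn = Rn + T:t1 */
    return t;
}

/* Assert that both orderings produce the same Rn and T. */
void check_addc(uint32_t rn, uint32_t rm, uint32_t t)
{
    uint32_t rn_a = rn, rn_b = rn;
    uint32_t t_a = addc_patch_order(&rn_a, rm, t);
    uint32_t t_b = addc_swapped_order(&rn_b, rm, t);
    assert(rn_a == rn_b);
    assert(t_a == t_b);
}
```

Calling check_addc with corner cases such as (0xFFFFFFFF, 0, 1) or (0xFFFFFFFF, 0xFFFFFFFF, 1) exercises the carry path in both adds. The model says nothing about register allocation; it only shows that the two orderings compute the same value and carry.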
Paolo