qemu-ppc
Re: [PATCH] tcg/ppc: Optimize 26-bit jumps


From: Leandro Lupori
Subject: Re: [PATCH] tcg/ppc: Optimize 26-bit jumps
Date: Fri, 9 Sep 2022 09:01:27 -0300

On 9/8/22 18:44, Richard Henderson wrote:
> On 9/8/22 22:18, Leandro Lupori wrote:
> > PowerPC64 processors handle direct branches better than indirect
> > ones, resulting in fewer stalled cycles and branch misses.
> >
> > However, PPC's tb_target_set_jmp_target() was only using direct
> > branches for 16-bit jumps, while PowerPC64's unconditional branch
> > instructions are able to handle displacements of up to 26 bits.
> > To take advantage of this, jumps whose displacements fit in
> > between 17 and 26 bits are now also converted to direct branches.
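
For readers following the thread, the 26-bit limit above can be sketched in C as below. This is only an illustration, not the code from the patch: the names fits_direct_branch() and encode_b() are hypothetical, and the encoding assumes the plain I-form "b" instruction (primary opcode 18, AA=0, LK=0), whose 24-bit LI field is shifted left by 2 to give a signed 26-bit byte displacement.

```c
#include <stdbool.h>
#include <stdint.h>

/* I-form unconditional branch: primary opcode 18, AA=0, LK=0. */
#define PPC_OP_B 0x48000000u

/*
 * Hypothetical helper: true if disp (in bytes, relative to the branch
 * itself) is word-aligned and fits the signed 26-bit range of "b",
 * i.e. -2^25 <= disp < 2^25.
 */
static bool fits_direct_branch(int64_t disp)
{
    return (disp & 3) == 0
        && disp >= -(INT64_C(1) << 25)
        && disp < (INT64_C(1) << 25);
}

/*
 * Hypothetical helper: encode "b disp".  The caller is expected to have
 * checked fits_direct_branch(disp) first.  The low two bits stay clear
 * because they hold the AA and LK flags.
 */
static uint32_t encode_b(int64_t disp)
{
    return PPC_OP_B | ((uint32_t)disp & 0x03fffffcu);
}
```

A displacement that fails the check would have to keep using the indirect mtctr+bcctr sequence.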

> This doesn't work because you have to be able to unset the jump as well,
> and your two-step sequence doesn't handle that.  (You wind up with the
> two-insn address load reset, but the jump continuing to the previous
> target -- boom.)

Hello Richard, thanks for your review!
Right, I hadn't noticed this issue.

> For v2.07+, you could use stq to update 4 insns atomically.

I'll try this alternative in v2, so that more CPUs can benefit from this change.

> For v3.1+, you can eliminate TCG_REG_TB, using prefixed pc-relative
> addressing instead, which brings you back to only needing to update
> 8 bytes atomically (select either paddi to compute the address to feed
> to the following mtctr+bcctr, or direct branch + nop, leaving the
> mtctr+bcctr alone and unreachable).
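
A rough C sketch of that 8-byte slot, to make the two alternatives concrete. Everything here is an assumption for illustration rather than code from tcg/ppc: the helper names are hypothetical, the paddi layout (MLS-form prefix with R=1, followed by an addi suffix with RA=0) is my reading of ISA v3.1 and should be checked against the spec, and the store uses the GCC/Clang __atomic_store_n builtin. The instruction-cache flush that real code needs after patching is omitted, as is the requirement that a prefixed instruction not cross a 64-byte boundary.

```c
#include <stdint.h>

#define PPC_NOP  0x60000000u   /* ori 0,0,0 */
#define PPC_OP_B 0x48000000u   /* b, AA=0, LK=0 */

/* Hypothetical: first word of "paddi rt,0,disp,1" (pc-relative, R=1). */
static uint32_t paddi_prefix(int64_t disp)
{
    /* opcode 1, type 2 (MLS), R=1, upper 18 bits of the 34-bit disp */
    return (1u << 26) | (2u << 24) | (1u << 20)
         | (uint32_t)((disp >> 16) & 0x3ffff);
}

/* Hypothetical: second word, an addi with RA=0 and the low 16 bits. */
static uint32_t paddi_suffix(unsigned rt, int64_t disp)
{
    return (14u << 26) | ((uint32_t)rt << 21) | (uint32_t)(disp & 0xffff);
}

/*
 * Pack two instructions into one 64-bit value so that 'first' ends up
 * at the lower address when the value is stored on the given host.
 */
static uint64_t pack_insn_pair(uint32_t first, uint32_t second,
                               int host_big_endian)
{
    return host_big_endian ? ((uint64_t)first << 32) | second
                           : ((uint64_t)second << 32) | first;
}

/* Replace both instructions of the slot with a single 8-byte store. */
static void patch_jump_slot(uint64_t *slot, uint64_t pair)
{
    __atomic_store_n(slot, pair, __ATOMIC_RELAXED);
}
```

Setting the jump would store the pair (b target, nop); unsetting it would store the two paddi words, so the mtctr+bcctr that follows the slot is reached again. Either way, a concurrently executing thread sees one complete pair or the other.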

> (Actually, there are lots of updates one could make to tcg/ppc for v3.1...)
>
> r~



