From: Leandro Lupori
Subject: Re: [PATCH] tcg/ppc: Optimize 26-bit jumps
Date: Fri, 9 Sep 2022 09:01:27 -0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0

On 9/8/22 18:44, Richard Henderson wrote:
> On 9/8/22 22:18, Leandro Lupori wrote:
>> PowerPC64 processors handle direct branches better than indirect ones, resulting in less stalled cycles and branch misses. However, PPC's tb_target_set_jmp_target() was only using direct branches for 16-bit jumps, while PowerPC64's unconditional branch instructions are able to handle displacements of up to 26 bits. To take advantage of this, now jumps whose displacements fit in between 17 and 26 bits are also converted to direct branches.
>
> This doesn't work because you have to be able to unset the jump as well, and your two step sequence doesn't handle that. (You wind up with the two insn address load reset, but the jump continuing to the previous target -- boom.)

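For reference, the 26-bit range from the patch description can be illustrated with a minimal sketch. It assumes the standard I-form encoding of the unconditional `b` instruction (primary opcode 18, AA = 0, LK = 0). The names `PPC_B`, `disp_fits_26` and `encode_b` are hypothetical and are not taken from tcg/ppc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Primary opcode 18 ("b") with AA = 0 and LK = 0. */
#define PPC_B 0x48000000u

/* True if disp fits the signed 26-bit displacement of an I-form branch. */
static bool disp_fits_26(int64_t disp)
{
    return disp >= -(INT64_C(1) << 25) && disp < (INT64_C(1) << 25);
}

/* Encode "b target" for a branch located at pc. */
static uint32_t encode_b(uint64_t pc, uint64_t target)
{
    int64_t disp = (int64_t)(target - pc);

    assert((disp & 3) == 0);        /* instructions are 4-byte aligned */
    assert(disp_fits_26(disp));     /* caller must check the range first */
    return PPC_B | ((uint32_t)disp & 0x03fffffcu);
}
```

A caller would test `disp_fits_26()` first and fall back to the indirect mtctr+bcctr sequence when the target is out of range.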
Hello Richard, thanks for your review! Right, I hadn't noticed this issue.
> For v2.07+, you could use stq to update 4 insns atomically.

I'll try this alternative in v2, so that more CPUs can benefit from this change.

> For v3.1+, you can eliminate TCG_REG_TB, using prefixed pc-relative addressing instead. Which brings you back to only needing to update 8 bytes atomically (select either paddi to compute address to feed to following mtctr+bcctr, or direct branch + nop leaving the mtctr+bcctr alone and unreachable).
>
> (Actually, there are lots of updates one could make to tcg/ppc for v3.1...)
>
> r~
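
To make the "update 8 bytes atomically" idea concrete, here is a rough sketch of replacing two adjacent 32-bit instructions with a single 64-bit store. It uses C11 atomics rather than QEMU's own helpers, and the names `PPC_NOP` and `patch_insn_pair` are hypothetical. The instruction cache flush that real code patching needs is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define PPC_NOP 0x60000000u     /* ori 0,0,0 */

/*
 * Replace the two instructions at slot with i0 followed by i1 using one
 * 8-byte store, so another thread executing the code sees either the old
 * pair or the new pair, never a mix.
 */
static void patch_insn_pair(_Atomic uint64_t *slot, uint32_t i0, uint32_t i1)
{
    uint32_t insns[2] = { i0, i1 };
    uint64_t pair;

    assert(((uintptr_t)slot & 7) == 0);     /* slot must be 8-byte aligned */
    memcpy(&pair, insns, sizeof(pair));     /* keeps memory order i0, i1 on any endianness */
    atomic_store_explicit(slot, pair, memory_order_relaxed);
}
```

Setting a jump would store a direct branch plus `PPC_NOP`, and resetting it would store the original pair back through the same function, which is what makes the unset case safe.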