It's been a long time.
I've measured the performance impact of a one-jump difference in the fast
path of qemu_ld/st (TLB hit).
The result shows a 3.6% CoreMark improvement from removing one jump, with the
slow paths generated at the end of the block in both cases.
That means removing that one jump accounts for most of the performance
improvement from LDST_OPTIMIZATION.
As a result, the extended MMU helper functions are needed to attain that
performance gain, and those extended functions are used only implicitly.
BTW, who will finally confirm my patches?
I have sent four versions of my patches, in which I have applied all the
reasonable feedback from this community.
Currently, v4 is the final candidate, though it might need to be merged with
the latest HEAD because it was sent one month ago.