From: Stefan Brankovic
Subject: Re: [PATCH v6 1/3] target/ppc: Optimize emulation of vpkpx instruction
Date: Wed, 16 Oct 2019 15:53:27 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0

On 29.8.19. 17:31, Richard Henderson wrote:
> On 8/29/19 6:34 AM, Stefan Brankovic wrote:
>> Then I run my performance tests and I got following results (test is calling vpkpx 100000 times):
>>
>> 1) Current helper implementation: ~157 ms
>> 2) helper implementation you suggested: ~94 ms
>> 3) tcg implementation: ~75 ms
>
> I assume you tested in a loop. If you have just the one expansion, you'll not see the penalty for the icache expansion. To show the other extreme, you'd want to test as separate sequential invocations.

Yes, testing is done in a loop.

> That said, I'd be more interested in a real test case that isn't just calling one instruction over and over. Is there a real test case that shows vpkpx in the top 25 of the profile? With more than 0.5% of runtime?
>
> r~

I ran an experiment in which I booted Mac OS X 10.4 in QEMU system mode, and I found that the vpkpx instruction is widely used to display various graphical elements. With that in mind, I consider this performance improvement important.

Also, the vpkpx instruction is often used in a loop, to process a large number of pixels at once. That is why testing the performance of this instruction in a loop should give good insight into how it performs overall.
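
For readers not familiar with the instruction, here is a rough scalar sketch of the kind of pixel loop meant above. It is an illustration only, not the helper or the TCG code from this patch: the names pack_pixel_1555 and pack_pixels are made up, and the loop handles one pixel at a time, whereas vpkpx packs eight 32-bit pixels (four from each source vector) into eight 16-bit pixels per invocation. The bit selection follows the usual description of vpkpx: the lowest bit of the top byte, then the upper five bits of each of the three remaining bytes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scalar model of the per-pixel packing done by vpkpx:
 * a 32-bit pixel is reduced to a 16-bit 1/5/5/5 pixel.
 */
static uint16_t pack_pixel_1555(uint32_t px)
{
    uint32_t a = (px >> 24) & 0x01; /* lowest bit of the top byte       */
    uint32_t r = (px >> 19) & 0x1f; /* upper 5 bits of the second byte  */
    uint32_t g = (px >> 11) & 0x1f; /* upper 5 bits of the third byte   */
    uint32_t b = (px >> 3) & 0x1f;  /* upper 5 bits of the lowest byte  */

    return (uint16_t)((a << 15) | (r << 10) | (g << 5) | b);
}

/*
 * Hypothetical pixel loop: packs n 32-bit pixels from src into dst.
 * A guest doing this with AltiVec would issue one vpkpx per eight pixels.
 */
static void pack_pixels(const uint32_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = pack_pixel_1555(src[i]);
    }
}
```

A guest that converts a whole scanline or image this way executes the instruction many times back to back, which is the pattern the loop benchmark approximates.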

Kind Regards,
Stefan