qemu-devel

Re: [PATCH v6 1/3] target/ppc: Optimize emulation of vpkpx instruction


From: Stefan Brankovic
Subject: Re: [PATCH v6 1/3] target/ppc: Optimize emulation of vpkpx instruction
Date: Wed, 16 Oct 2019 15:53:27 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0


On 29.8.19. 17:31, Richard Henderson wrote:
> On 8/29/19 6:34 AM, Stefan Brankovic wrote:
>> Then I run my performance tests and I got following results(test is calling
>> vpkpx 100000 times):
>>
>> 1) Current helper implementation: ~ 157 ms
>>
>> 2) helper implementation you suggested: ~94 ms
>>
>> 3) tcg implementation: ~75 ms
>
> I assume you tested in a loop.  If you have just the one expansion, you'll not
> see the penalty for the icache expansion.  To show the other extreme, you'd
> want to test as separate sequential invocations.

Yes, testing is done in a loop.

> That said, I'd be more interested in a real test case that isn't just calling
> one instruction over and over.  Is there a real test case that shows vpkpx in
> the top 25 of the profile?  With more than 0.5% of runtime?
>
>
> r~

I ran an experiment in which I booted Mac OS X 10.4 in QEMU system mode, and I found that the vpkpx instruction is widely used to display various graphical elements. With that in mind, this performance improvement is of great importance.

Also, the vpkpx instruction is often used in a loop to process a large number of pixels at once. That is why measuring the performance of this instruction in a loop should give good insight into how it performs overall.
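
To illustrate the kind of pixel loop described above, here is a minimal sketch in plain C of what vpkpx computes for each pixel, as far as I understand the instruction: a 32-bit 8:8:8:8 pixel is packed into a 16-bit 1:5:5:5 pixel, keeping the low bit of the top byte and the upper five bits of each of the three colour bytes. The names pack_pixel_1555 and pack_pixels are hypothetical and used only for this illustration. The real instruction packs eight words from two vector registers at once, and this scalar loop is not the TCG implementation from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack one 32-bit 8:8:8:8 pixel into 16-bit 1:5:5:5.
 * Result bit 15 comes from the low bit of the top byte; the three 5-bit
 * fields come from the upper five bits of the remaining three bytes.
 */
static uint16_t pack_pixel_1555(uint32_t e)
{
    return (uint16_t)(((e >> 9) & 0xfc00) |  /* 1-bit field + first 5-bit field */
                      ((e >> 6) & 0x03e0) |  /* second 5-bit field */
                      ((e >> 3) & 0x001f));  /* third 5-bit field */
}

/*
 * Hypothetical scalar loop over a pixel buffer, standing in for the
 * vectorised loops a guest would run with vpkpx.
 */
static void pack_pixels(const uint32_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = pack_pixel_1555(src[i]);
    }
}
```

A guest that converts a whole frame from 32-bit to 16-bit colour runs the equivalent of pack_pixels over every scanline, so the cost of emulating each vpkpx is multiplied by the number of pixels drawn.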

Kind Regards,

Stefan



