RE: Question about direct block chaining


From: Taylor Simpson
Subject: RE: Question about direct block chaining
Date: Tue, 19 Apr 2022 06:02:05 +0000


> -----Original Message-----
> From: Richard Henderson <richard.henderson@linaro.org>
> Sent: Monday, April 18, 2022 10:38 AM
> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
> Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
> Subject: Re: Question about direct block chaining
> 
> On 4/18/22 07:54, Taylor Simpson wrote:
> > I implemented both approaches for inner loops and didn't see speedup
> > in my benchmark.  So, I have a couple of questions
> > 1) What are the pros and cons of the two approaches
> > (lookup_and_goto_ptr and goto_tb + exit_tb)?
> 
> goto_tb can only be used within a single page (plus other restrictions, see
> translator_use_goto_tb).  In addition, as documented, the change in cpu
> state must be constant, beginning with a direct jump.
> 
> lookup_and_goto_ptr can handle any change in cpu state, including indirect
> jumps.
> 
> 
> > 2) How can I verify that direct block chaining is working properly?
> >        With -d exec, I see lines like the following with goto_tb + exit_tb
> >        but NOT lookup_and_goto_ptr
> >        Linking TBs 0x7fda44172e00 [0050ac38] index 1 -> 0x7fda44173b40 [0050ac6c]
> 
> Well, that's one way.  I would have also suggested simply looking at -d op
> output, for the various branchy cases you're considering, to see that all of
> the exits are as expected.

Thanks!!

I created a synthetic benchmark: a loop with a very small body and a very
high number of iterations.  With it, I can see differences in execution time.
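
A loop of that shape could look like the sketch below.  This is not the
actual benchmark: the function name tiny_loop, the loop body, and any
iteration count are assumptions chosen for illustration.  The idea is only
that the body is so small that the cost of leaving one translation block and
entering the next dominates the run time.

```c
#include <stdint.h>

/*
 * Hypothetical benchmark kernel (not the one used for the measurements):
 * a tiny loop body executed many times, so block-to-block transitions
 * dominate the execution time under emulation.
 */
static uint32_t tiny_loop(uint32_t iterations)
{
    /* volatile keeps the compiler from collapsing the loop to a constant */
    volatile uint32_t acc = 0;

    for (uint32_t i = 0; i < iterations; i++) {
        acc += i & 1;   /* counts the odd values of i */
    }
    return acc;
}
```

Built for the guest and called with a large count from main(), a kernel like
this can be timed once per exit strategy.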

Here are my observations:
- goto_tb + exit_tb gives the fastest execution time because the native jump
  is patched to go straight to the next TB
- lookup_and_goto_ptr is slower than that, but still an improvement over
  tcg_gen_exit_tb(NULL, 0)
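
One way to picture why the three exits rank this way is the toy model below.
It is not QEMU code: Block, Model, lookup, add_block, and run are all
hypothetical names, and the model only counts two costs per block transition
(a hash lookup and a return to the outer dispatch loop).  It assumes every
block has one constant successor, which is the case goto_tb is meant for.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 16

typedef struct Block Block;
struct Block {
    uint32_t pc;       /* guest address this block translates */
    uint32_t next_pc;  /* constant successor address */
    Block *chained;    /* patched direct link, NULL until linked */
};

/* Toy stand-ins for the three exit styles discussed in this thread. */
typedef enum { EXIT_TO_LOOP, LOOKUP_AND_GOTO, DIRECT_CHAIN } ExitKind;

typedef struct {
    Block *cache[CACHE_SIZE];    /* direct-mapped block cache */
    unsigned long lookups;       /* hash lookups performed */
    unsigned long loop_returns;  /* returns to the outer dispatch loop */
} Model;

static void add_block(Model *m, Block *b)
{
    m->cache[(b->pc >> 2) % CACHE_SIZE] = b;
}

static Block *lookup(Model *m, uint32_t pc)
{
    Block *b = m->cache[(pc >> 2) % CACHE_SIZE];

    m->lookups++;
    return (b && b->pc == pc) ? b : NULL;
}

/* Follow 'steps' block transitions starting at 'b', counting the costs. */
static void run(Model *m, Block *b, ExitKind kind, unsigned long steps)
{
    for (unsigned long i = 0; i < steps; i++) {
        Block *next;

        switch (kind) {
        case DIRECT_CHAIN:
            if (b->chained) {
                next = b->chained;   /* patched jump: no lookup, no return */
                break;
            }
            m->loop_returns++;       /* first exit goes back to the loop, */
            next = lookup(m, b->next_pc);
            b->chained = next;       /* which links the two blocks */
            break;
        case LOOKUP_AND_GOTO:
            next = lookup(m, b->next_pc);  /* lookup, then jump directly */
            break;
        default:                     /* EXIT_TO_LOOP */
            m->loop_returns++;       /* always return, then look up */
            next = lookup(m, b->next_pc);
            break;
        }
        if (!next) {
            return;                  /* successor not in the cache: stop */
        }
        b = next;
    }
}
```

For a two-block loop, the model pays both costs on every transition with
EXIT_TO_LOOP, only the lookup with LOOKUP_AND_GOTO, and with DIRECT_CHAIN
only once per block before the links are in place.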

Taylor

