Re: [RFC PATCH 0/6] target/ppc: Improve 4xx and 440 tlbwe

From:	Nicholas Piggin
Subject:	Re: [RFC PATCH 0/6] target/ppc: Improve 4xx and 440 tlbwe
Date:	Thu, 07 Dec 2023 14:22:06 +1000

On Thu Dec 7, 2023 at 11:35 AM AEST, BALATON Zoltan wrote:
> Hello,
>
> On Wed, 15 Nov 2023, BALATON Zoltan wrote:
> > On Tue, 14 Nov 2023, Nicholas Piggin wrote:
> >> Well I split out these patches and looked a bit closer and added
> >> a few more things.
> >> 
> >> I think it may be a bit too much to do the optimisations for
> >> this release, because 4xx TLB flushing has some quirks too so
> >> it's not just simple implementation of 4xx scheme in 440. We
> >> could try for next time.
> >> 
> >> The bug fix patch 1 maybe we should do. We haven't been able to
> >> confirm it fixes anything but there was mention of occasional
> >> random crashes.
> >
> > I did some quick testing of this series and found that patch 1 alone makes
> > it slower and is not known to fix any issue, so I'd say don't commit just
> > this patch without the rest. The current version works well enough, so we
> > can live with that until the next version. With the other patches it's
> > faster, and the last patch does make a difference: it makes it a bit
> > faster. I did not record the numbers and only did one measurement, so it's
> > only approximate, but unless you plan to take the whole series now, keep
> > patch 1 for the next devel cycle as well.
>
> We've done some more experiments and I've collected some numbers now. The
> test was running lame to convert a wav file to mp3 right after boot and
> then getting "info jit" after it finished. The same executable runs on
> pegasos2 and sam460ex, so we can compare sam460ex before and after this
> series, and against pegasos2 as well. These were run on the same host
> machine so the numbers should be comparable. (This test also hits the slow
> FPU emulation on the PPC target, which is another reason it runs slowly.)
>
> On pegasos2 I get:
>
> Encoding as 44.1 kHz j-stereo MPEG-1 Layer III VBR(q=2)
>      Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
>    1149/1149  (100%)|    0:33/    0:33|    0:33/    0:33|   0.8982x|    0:00
> QEMU 8.1.92 monitor - type 'help' for more information
> Accelerator settings:
> one-insn-per-tb: off
>
> Translation buffer state:
> gen code size       29666515/1023052800
> TB count            52723
> TB avg target size  24 max=2048 bytes
> TB avg host size    325 bytes (expansion ratio: 13.4)
> cross page TB count 0 (0%)
> direct jump count   31917 (60%) (2 jumps=25829 48%)
> TB hash buckets     24452/32768 (74.62% head buckets used)
> TB hash occupancy   33.37% avg chain occ. Histogram: [0,10)%|▆ █  ▅▁▃▁▁|[9
> TB hash avg chain   1.018 buckets. Histogram: 1|█▁3
>
> Statistics:
> TB flush count      0
> TB invalidate count 7841
> TLB full flushes    0
> TLB partial flushes 13298
> TLB elided flushes  100190
> [TCG profiler not compiled]
>
> On sam460ex *without* this series:
>
>      Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
>    1149/1149  (100%)|    0:37/    0:37|    0:37/    0:37|   0.8093x|    0:00
> QEMU 8.1.92 monitor - type 'help' for more information
> Accelerator settings:
> one-insn-per-tb: off
>
> Translation buffer state:
> gen code size       32917427/1023052800
> TB count            60534
> TB avg target size  22 max=2048 bytes
> TB avg host size    306 bytes (expansion ratio: 13.9)
> cross page TB count 0 (0%)
> direct jump count   37047 (61%) (2 jumps=29011 47%)
> TB hash buckets     26619/32768 (81.23% head buckets used)
> TB hash occupancy   40.02% avg chain occ. Histogram: [0,10)%|▅ █  ▆▁▄▁▂|[9
> TB hash avg chain   1.035 buckets. Histogram: 1|█▁3
>
> Statistics:
> TB flush count      0
> TB invalidate count 5629
> TLB full flushes    0
> TLB partial flushes 508238
> TLB elided flushes  7680722
> [TCG profiler not compiled]
>
> On sam460ex *with* this series:
>
>      Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
>    1149/1149  (100%)|    0:34/    0:34|    0:34/    0:34|   0.8595x|    0:00
> QEMU 8.1.92 monitor - type 'help' for more information
> Accelerator settings:
> one-insn-per-tb: off
>
> Translation buffer state:
> gen code size       33094883/1023052800
> TB count            60607
> TB avg target size  22 max=2048 bytes
> TB avg host size    308 bytes (expansion ratio: 13.9)
> cross page TB count 0 (0%)
> direct jump count   37093 (61%) (2 jumps=29038 47%)
> TB hash buckets     26682/32768 (81.43% head buckets used)
> TB hash occupancy   40.12% avg chain occ. Histogram: [0,10)%|▅ █  ▆▁▄▁▂|[9
> TB hash avg chain   1.034 buckets. Histogram: 1|█▁3
>
> Statistics:
> TB flush count      0
> TB invalidate count 5628
> TLB full flushes    0
> TLB partial flushes 73
> TLB elided flushes  1143
> [TCG profiler not compiled]

Great, thanks for the numbers.
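
To put the sam460ex figures quoted above side by side, the arithmetic can be
sketched as below. This is only an illustration: the helper names are made up
for this example and are not part of QEMU, and the constants are copied by
hand from the two "info jit" dumps and the lame play/CPU column.

```c
#include <stdint.h>

/* Counters copied from the sam460ex "info jit" output quoted above. */
static const uint64_t partial_before = 508238, partial_after = 73;
static const uint64_t elided_before = 7680722, elided_after = 1143;

/* play/CPU values reported by lame on sam460ex, without and with the series. */
static const double play_cpu_before = 0.8093, play_cpu_after = 0.8595;

/* Hypothetical helper: factor by which a counter shrank. */
static double reduction_factor(uint64_t before, uint64_t after)
{
    return (double)before / (double)after;
}

/* Hypothetical helper: relative change of a rate, in percent. */
static double speedup_percent(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}
```

Calling reduction_factor() on the partial and elided flush counters shows
both dropping by a factor of several thousand, while speedup_percent() on the
play/CPU values gives an improvement of roughly 6%. As noted in the quoted
mail, this comes from a single workload, so treat it as indicative only.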

> The excessive TLB flushes are resolved; there are now even far fewer than
> on pegasos2, which uses a G4 CPU. I wonder why, and whether that could be
> reduced further for book3s as well. It still runs slower on sam460ex than
> on pegasos2, but that will need further profiling to find out what the
> next bottleneck is.

G4 uses segments and a hash table? I think the problem with that is that the
QEMU TLB does not match the MMU well, so a TLBIE address cannot easily be
matched to a QEMU TLB address.

So it would not be trivial to improve in the way this series does. It could
be an interesting project. I think you would need some way to quickly map a
hash virtual address to the possible segment effective addresses that could
be mapping it, so that you can invalidate those addresses (effective
addresses are what the TCG TLBs cache).
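
As a rough illustration of that reverse mapping, here is a minimal sketch for
the 32-bit hash MMU case, assuming 16 segment registers with the VSID in the
low 24 bits, a 16-bit page index and 4K pages. The function and macro names
are hypothetical and this is not existing QEMU code. A linear scan is cheap
for 16 registers; something indexed by VSID would be needed to make it quick
for larger segment tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SEGMENTS 16          /* 32-bit hash MMU: EA bits 0-3 pick the SR */
#define VSID_MASK    0x00FFFFFFu /* low 24 bits of a segment register */
#define PAGE_SHIFT   12          /* 4K pages */

/*
 * Hypothetical helper: given a virtual page (vsid, page_index), collect the
 * page-aligned effective addresses that could currently be mapping it.
 * Returns how many entries were written to out[].
 */
static size_t eas_for_virtual_page(const uint32_t sr[NUM_SEGMENTS],
                                   uint32_t vsid, uint32_t page_index,
                                   uint32_t out[NUM_SEGMENTS])
{
    size_t n = 0;

    assert(sr != NULL && out != NULL);
    for (uint32_t i = 0; i < NUM_SEGMENTS; i++) {
        /* Compare VSIDs only, ignoring the flag bits above them. */
        if ((sr[i] & VSID_MASK) == (vsid & VSID_MASK)) {
            out[n++] = (i << 28) | ((page_index & 0xFFFFu) << PAGE_SHIFT);
        }
    }
    return n;
}
```

Each address returned is a candidate for a single-page invalidation of the
QEMU TLB instead of a wider flush. The sketch ignores any other state that
would have to be tracked for this to be correct in practice.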

Thanks,
Nick
