From: Richard Henderson
Subject: Re: [PATCH 2/3] target/arm: add FEAT_TLBIRANGE support
Date: Tue, 15 Dec 2020 08:55:14 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 12/14/20 2:23 PM, Rebecca Cran wrote:
> ARMv8.4 adds the mandatory FEAT_TLBIRANGE, which provides instructions
> for invalidating ranges of entries.
> 
> Signed-off-by: Rebecca Cran <rebecca@nuviainc.com>
> ---
>  accel/tcg/cputlb.c      |  24 ++
>  include/exec/exec-all.h |  39 +++
>  target/arm/helper.c     | 273 ++++++++++++++++++++
>  3 files changed, 336 insertions(+)
> 
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 42ab79c1a582..103f363b42f3 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -603,6 +603,30 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
>      tlb_flush_page_by_mmuidx(cpu, addr, ALL_MMUIDX_BITS);
>  }
>  
> +void tlb_flush_page_range_by_mmuidx(CPUState *cpu, target_ulong addr,
> +                                    int num_pages, uint16_t idxmap)
> +{
> +    int i;
> +
> +    for (i = 0; i < num_pages; i++) {
> +        tlb_flush_page_by_mmuidx(cpu, addr + (i * TARGET_PAGE_SIZE), idxmap);
> +    }
> +}
> +
> +void tlb_flush_page_range_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
> +                                                    target_ulong addr,
> +                                                    int num_pages,
> +                                                    uint16_t idxmap)
> +{
> +    int i;
> +
> +    for (i = 0; i < num_pages; i++) {
> +        tlb_flush_page_by_mmuidx_all_cpus_synced(src_cpu,
> +                                                 addr + (i * TARGET_PAGE_SIZE),
> +                                                 idxmap);
> +    }
> +}

This is a poor way to structure these functions, because each iteration of the
loop makes its own synchronized cross-cpu call.  You want to do the cross-cpu
call once for the entire set of pages, synchronizing once at the end.
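
Something along these lines, mirroring how the existing *_all_cpus_synced
flushes in cputlb.c are built -- a rough sketch only, with TLBFlushRangeData
and tlb_flush_range_async as placeholder names, not existing code:

    typedef struct {
        target_ulong addr;
        int num_pages;
        uint16_t idxmap;
    } TLBFlushRangeData;

    /* Runs on each target cpu's own thread: flush the whole range
       locally, with no further cross-cpu synchronization.  */
    static void tlb_flush_range_async(CPUState *cpu, run_on_cpu_data data)
    {
        TLBFlushRangeData *d = data.host_ptr;
        int i;

        for (i = 0; i < d->num_pages; i++) {
            /* We are on this cpu's thread, so this takes the
               synchronous path inside tlb_flush_page_by_mmuidx.  */
            tlb_flush_page_by_mmuidx(cpu, d->addr + i * TARGET_PAGE_SIZE,
                                     d->idxmap);
        }
        g_free(d);
    }

    void tlb_flush_page_range_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
                                                        target_ulong addr,
                                                        int num_pages,
                                                        uint16_t idxmap)
    {
        TLBFlushRangeData d = { addr, num_pages, idxmap };
        CPUState *dst_cpu;

        /* Queue the entire range once per cpu... */
        CPU_FOREACH(dst_cpu) {
            if (dst_cpu != src_cpu) {
                async_run_on_cpu(dst_cpu, tlb_flush_range_async,
                                 RUN_ON_CPU_HOST_PTR(g_memdup(&d, sizeof(d))));
            }
        }
        /* ...and synchronize exactly once, on the source cpu.  */
        async_safe_run_on_cpu(src_cpu, tlb_flush_range_async,
                              RUN_ON_CPU_HOST_PTR(g_memdup(&d, sizeof(d))));
    }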

In addition, tlb_flush_page is insufficient for aarch64, because of TBI
(top-byte-ignore).  We need a version of tlb_flush_page_bits that takes the
length of the flush.

This *could* be implemented as a full flush, in the short term.
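
E.g., for now, the synced variant could just forward to the existing
tlb_flush_by_mmuidx_all_cpus_synced:

    void tlb_flush_page_range_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
                                                        target_ulong addr,
                                                        int num_pages,
                                                        uint16_t idxmap)
    {
        /* Stopgap: over-flush everything in one synchronized call.  */
        tlb_flush_by_mmuidx_all_cpus_synced(src_cpu, idxmap);
    }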

You could round the length outward to a power-of-two mask, then merge that
low-bit mask of the length with the high-bit mask of TBI.  That will catch a
few more pages than architecturally required, but far fewer than a full flush.
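
Sketching the mask computation only -- the 56-bit TBI cutoff is an
assumption (top byte ignored), and a flush primitive taking such a
two-sided mask does not exist yet:

    target_ulong last = addr + len - 1;

    /* Ignore every low bit on which the first and last address of the
       range differ: this rounds the range outward to the smallest
       power-of-two window containing it.  */
    unsigned low_ignore = 64 - clz64(addr ^ last);

    /* Ignore the high bits per TBI.  */
    unsigned top = 56;

    target_ulong mask = MAKE_64BIT_MASK(low_ignore, top - low_ignore)
                        & TARGET_PAGE_MASK;

    /* An entry at 'page' is then flushed when
           (page & mask) == (addr & mask),
       which covers every page in [addr, addr + len) plus the rounding.  */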

Certainly I don't think you ever want to perform this loop 32 (max num) * 16
(max scale) * 64 (max page size) = 32768 times.


r~


