Re: [PATCH 2/5] plugins/cache: implement unified L2 cache emulation
From: Alex Bennée
Subject: Re: [PATCH 2/5] plugins/cache: implement unified L2 cache emulation
Date: Fri, 08 Oct 2021 16:44:49 +0100
User-agent: mu4e 1.7.0; emacs 28.0.60
Mahmoud Mandour <ma.mandourr@gmail.com> writes:
> This adds an implementation of a simple L2 configuration, in which a
> unified L2 cache (storing both instruction and data blocks) is
> maintained for each core separately, with no inter-core interaction
> taken into account. The L2 cache acts as a backup for L1 and is only
> accessed if the wanted block is not present in L1.
>
> For multi-threaded user-space emulation, the same approximation used
> for L1 applies: a static number of caches is maintained, and every
> memory access initiated by a thread goes through one of the available
> caches.
>
> An atomic increment is used to maintain the number of L2 misses per
> instruction.
>
> The default cache parameters for the L2 caches are:
>
> 2MB cache size
> 16-way associativity
> 64-byte blocks
>
> Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com>
> ---
> contrib/plugins/cache.c | 256 +++++++++++++++++++++++++++-------------
> 1 file changed, 175 insertions(+), 81 deletions(-)
>
> diff --git a/contrib/plugins/cache.c b/contrib/plugins/cache.c
> index a255e26e25..908c967a09 100644
> --- a/contrib/plugins/cache.c
> +++ b/contrib/plugins/cache.c
> @@ -82,8 +82,9 @@ typedef struct {
> char *disas_str;
> const char *symbol;
> uint64_t addr;
> - uint64_t dmisses;
> - uint64_t imisses;
> + uint64_t l1_dmisses;
> + uint64_t l1_imisses;
> + uint64_t l2_misses;
> } InsnData;
>
> void (*update_hit)(Cache *cache, int set, int blk);
> @@ -93,15 +94,20 @@ void (*metadata_init)(Cache *cache);
> void (*metadata_destroy)(Cache *cache);
>
> static int cores;
> -static Cache **dcaches, **icaches;
> +static Cache **l1_dcaches, **l1_icaches;
> +static Cache **l2_ucaches;
>
> -static GMutex *dcache_locks;
> -static GMutex *icache_locks;
> +static GMutex *l1_dcache_locks;
> +static GMutex *l1_icache_locks;
> +static GMutex *l2_ucache_locks;
Did you experiment with keeping a single locking hierarchy? I measured
quite high contention with perf while running under system emulation.
While splitting locks can reduce contention, I suspect the access
pattern might just lead to two threads serialising twice in a row and
therefore adding to latency.
A single hierarchy might be overly complicated by the current split
between the i and d caches at layer 1, which probably makes sense to
keep.
Otherwise looks reasonable to me:
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée