qemu-devel

Re: [PATCH v9 05/13] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER


From: Alex Bennée
Subject: Re: [PATCH v9 05/13] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER
Date: Fri, 13 Dec 2019 21:49:55 +0000
User-agent: mu4e 1.3.5; emacs 27.0.50

Richard Henderson <address@hidden> writes:

> On 10/7/19 11:28 AM, Alex Bennée wrote:
>> From: "Vanderson M. do Rosario" <address@hidden>
>> 
>> Replace all other CONFIG_PROFILER statistics and migrate them to the
>> TBStatistics system. However, TCGProfiler still exists and can
>> be used to store global statistics and times. All TB related
>> statistics go to TBStatistics.
>> 
>> Signed-off-by: Vanderson M. do Rosario <address@hidden>
>> Message-Id: <address@hidden>
>> [AJB: fix authorship]
>> Signed-off-by: Alex Bennée <address@hidden>
>> ---
>>  accel/tcg/tb-stats.c      |  98 +++++++++++++++++++++---------
>>  accel/tcg/translate-all.c |  46 ++++++++-------
>>  configure                 |   3 -
>>  cpus.c                    |  14 ++---
>>  include/exec/tb-stats.h   |  15 +++++
>>  include/qemu/timer.h      |   5 +-
>>  monitor/misc.c            |  28 ++-------
>>  tcg/tcg.c                 | 121 ++++++++++++--------------------------
>>  tcg/tcg.h                 |   2 +-
>>  vl.c                      |   8 +--
>>  10 files changed, 159 insertions(+), 181 deletions(-)
>> 
>> diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
>> index 0e64c176b3..f431159fd2 100644
>> --- a/accel/tcg/tb-stats.c
>> +++ b/accel/tcg/tb-stats.c
>> @@ -17,11 +17,18 @@
>>  #include "exec/tb-stats.h"
>>  
>>  /* TBStatistic collection controls */
>> -enum TBStatsStatus { TB_STATS_RUNNING, TB_STATS_PAUSED, TB_STATS_STOPPED };
>> +enum TBStatsStatus {
>> +    TB_STATS_DISABLED = 0,
>> +    TB_STATS_RUNNING,
>> +    TB_STATS_PAUSED,
>> +    TB_STATS_STOPPED
>> +};
>
> This should be in patch 1, I should think.
>
>> +uint64_t dev_time;
>
> This patch is doing several things at once, and I think it should be split.
> All of the changes to dev_time, for instance, are unrelated to TBStatistic.
>
>> +    jpi->interm_time += stat_per_translation(tbs, time.interm);
>> +    jpi->code_time += stat_per_translation(tbs, time.code);
>> +    jpi->opt_time += stat_per_translation(tbs, time.opt);
>> +    jpi->la_time += stat_per_translation(tbs, time.la);
>> +    jpi->restore_time += tbs->time.restore;
>> +    jpi->restore_count += tbs->time.restore_count;
>
> Why are some of these "per translation" and some not?

The restore_time is an amortised time of all restore operations (so
divisible by restore_count rather than the number of translations).

I've added some commentary.

>
>> @@ -370,11 +371,11 @@ static int cpu_restore_state_from_tb
>>      }
>>      restore_state_to_opc(env, tb, data);
>>  
>> -#ifdef CONFIG_PROFILER
>> -    atomic_set(&prof->restore_time,
>> -                prof->restore_time + profile_getclock() - ti);
>> -    atomic_set(&prof->restore_count, prof->restore_count + 1);
>> -#endif
>> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
>> +        atomic_add(&tb->tb_stats->time.restore, profile_getclock() - ti);
>> +        atomic_inc(&tb->tb_stats->time.restore_count);
>> +    }
>
> Would it be better to use a the TBStatistics lock than two atomic
> updates?

Yeah, I think keeping the locking consistent as a mutex probably pays off
in reducing complexity as well.

>
>> @@ -1826,10 +1828,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>>          tcg_ctx->tb_jmp_target_addr = tb->jmp_target_arg;
>>      }
>>  
>> -#ifdef CONFIG_PROFILER
>> -    atomic_set(&prof->interm_time, prof->interm_time + profile_getclock() - ti);
>> -    ti = profile_getclock();
>> -#endif
>> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
>> +        atomic_add(&tb->tb_stats->time.interm, profile_getclock() - ti);
>> +        ti = profile_getclock();
>> +    }
>
> You should call profile_getclock() only once here.
>
> Why does this need an atomic_add, while the rest of TB_JIT_STATS within
> tb_gen_code do not, and in fact have a comment:
>
>> +    /*
>> +     * Collect JIT stats when enabled. We batch them all up here to
>> +     * avoid spamming the cache with atomic accesses
>> +     */
>
>> @@ -1871,9 +1873,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>>      }
>>      tb->tc.size = gen_code_size;
>>  
>> -#ifdef CONFIG_PROFILER
>> -    atomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti);
>> -#endif
>> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
>> +        atomic_add(&tb->tb_stats->time.code, profile_getclock() - ti);
>> +    }
>
> Likewise.
>
>> diff --git a/configure b/configure
>> index 8f8446f52b..eedeb9016e 100755
>> --- a/configure
>> +++ b/configure
>> @@ -6566,9 +6566,6 @@ fi
>>  if test "$static" = "yes" ; then
>>    echo "CONFIG_STATIC=y" >> $config_host_mak
>>  fi
>> -if test "$profiler" = "yes" ; then
>> -  echo "CONFIG_PROFILER=y" >> $config_host_mak
>> -fi
>
> Removing the define without removing --enable-profiler.
>
>>  static int tcg_cpu_exec(CPUState *cpu)
>>  {
>>      int ret;
>> -#ifdef CONFIG_PROFILER
>> -    int64_t ti;
>> -#endif
>> +    uint64_t ti;
>>  
>>      assert(tcg_enabled());
>> -#ifdef CONFIG_PROFILER
>>      ti = profile_getclock();
>> -#endif
>> +
>>      cpu_exec_start(cpu);
>>      ret = cpu_exec(cpu);
>>      cpu_exec_end(cpu);
>> -#ifdef CONFIG_PROFILER
>> -    atomic_set(&tcg_ctx->prof.cpu_exec_time,
>> -               tcg_ctx->prof.cpu_exec_time + profile_getclock() - ti);
>> -#endif
>> +
>> +    atomic_add(&tcg_ctx->prof.cpu_exec_time, profile_getclock() - ti);
>
> This is also unrelated to TBStatistics.

I'm going to split this patch up into more sub-commits to do each piece
one at a time. To make it easier, can I drop CONFIG_PROFILER support
before the re-factoring so I don't need to keep both modes compiling
during the transition?

> It's also adding an unconditional atomic_add, of which I am not fond.  Should
> this be testing tb_stats_collection_enabled() or something else?
>
> I'll also note that tcg_ctx is per-thread (for mttcg), and so perhaps this
> does not require an atomic_add anyway.  Perhaps just an atomic_set to be
> paired with atomic_read in the dump function that's iterating over the
> tcg_ctx.

I'll try and improve the commenting of TCGProf about what just holds
per-translation values which get copied at the end, and what is
persistent and may be read from outside the translation context (I think
just the OPC count array now).

>
> Even without the atomic_add, we shouldn't make the syscall for getclock in the
> normal case.
>
>
>> +    /* These are accessed with atomic operations */
>> +    struct {
>> +        int64_t restore;
>> +        uint64_t restore_count;
>> +        int64_t interm;
>> +        int64_t code;
>> +        int64_t opt;
>> +        int64_t la;
>> +    } time;
>
> Why are these signed?  We're always adding positive values.
>
> Why is restore_count here in "time"?  Indeed, why all of these unnamed
> sub-structures at all?  I don't see that "." helps organization any more
> than "_".
>
>> @@ -4020,18 +3959,18 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>>      }
>>  #endif
>>  
>> -#ifdef CONFIG_PROFILER
>> -    atomic_set(&prof->opt_time, prof->opt_time - profile_getclock());
>> -#endif
>> +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
>> +        atomic_add(&tb->tb_stats->time.opt, -profile_getclock());
>> +    }
>
> Atomic add of a negative clock value?  That just means the intermediate value
> is unusable, so why make any of this atomic?

I've rewritten all this to just grab some timestamps at various key
points and then bring it all together when we update the TBStat in
translate.

>
> Also, this is tb_gen_code again, wherein we already talked about not using
> atomic_foo.
>
>> @@ -4081,14 +4020,17 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>>      s->pool_labels = NULL;
>>  #endif
>>  
>> +    if (!tb_stats_collection_enabled()) {
>> +        QTAILQ_FOREACH(op, &s->ops, link) {
>> +            TCGOpcode opc = op->opc;
>> +            atomic_add(&s->prof.table_op_count[opc], 1);
>> +        }
>> +    }
>
> Why would you collect stats when stats collection is disabled?  Is this a
> simple typo?

I think if we can get away with not using atomics, this count can go back
to being inline with the main opc loop. If we aren't using atomics, is
there really such a big cost to just doing it unconditionally?


-- 
Alex Bennée


