Re: Plugin Memory Callback Debugging
From: Alex Bennée
Subject: Re: Plugin Memory Callback Debugging
Date: Tue, 15 Nov 2022 22:36:07 +0000
User-agent: mu4e 1.9.2; emacs 28.2.50
Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> Hello,
>
> I have been wrestling with what might be a bug in the plugin memory
> callbacks. The immediate error is that I hit the
> `g_assert_not_reached()` in the 'default:' case in
> qemu_plugin_vcpu_mem_cb, indicating the callback type was invalid. When
> breaking on this assertion in gdb, the contents of cpu->plugin_mem_cbs
> are obviously bogus (`len` was absurdly high, for example). After doing
> some further digging/instrumenting, I eventually found that
> `free_dyn_cb_arr(void *p, ...)` is being called shortly before the
> assertion is hit with `p` pointing to the same address as
> `cpu->plugin_mem_cbs` will later hold at assertion-time. We are freeing
> the memory still pointed to by `cpu->plugin_mem_cbs`.
>
> I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
> end of an instruction/TB's execution, so it's not exactly clear to me how this
> is occurring. However, I suspect it may be relevant that we are calling
> `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
Hmm, I'm going to have to remind myself how this bit works.
>
> I have additionally found that the below addition allows me to run
> successfully without hitting the assert:
>
> diff --git a/plugins/core.c b/plugins/core.c
> --- a/plugins/core.c
> +++ b/plugins/core.c
> @@ -427,9 +427,14 @@ static bool free_dyn_cb_arr(void *p, uint32_t h, void *userp)
>
>  void qemu_plugin_flush_cb(void)
>  {
> +    CPUState *cpu;
>      qht_iter_remove(&plugin.dyn_cb_arr_ht, free_dyn_cb_arr, NULL);
>      qht_reset(&plugin.dyn_cb_arr_ht);
>
> +    CPU_FOREACH(cpu) {
> +        cpu->plugin_mem_cbs = NULL;
> +    }
> +
This is essentially qemu_plugin_disable_mem_helpers() but for all CPUs.
I think we should be able to treat the CPUs separately.
>      plugin_cb__simple(QEMU_PLUGIN_EV_FLUSH);
>  }
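To illustrate what treating the CPUs separately could mean, here is a toy
model in plain C. It is not QEMU code: ToyCPU, ToyCbArr, toy_free_cb_arr()
and toy_demo() are hypothetical stand-ins, and clearing only the CPUs that
still reference the array being freed is just one possible reading of the
idea, not a tested fix.

```c
/* Toy model only: hypothetical stand-ins, not QEMU's CPUState or
 * plugin structures. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NR_CPUS 4

typedef struct ToyCbArr {
    size_t len;
} ToyCbArr;

typedef struct ToyCPU {
    ToyCbArr *plugin_mem_cbs;   /* borrowed pointer; the CPU does not own it */
} ToyCPU;

static ToyCPU toy_cpus[TOY_NR_CPUS];

/* Free one callback array and reset only the CPUs that still point at
 * it. Returns the number of stale references that were cleared. */
static int toy_free_cb_arr(ToyCbArr *arr)
{
    int stale = 0;

    for (int i = 0; i < TOY_NR_CPUS; i++) {
        if (toy_cpus[i].plugin_mem_cbs == arr) {
            toy_cpus[i].plugin_mem_cbs = NULL;
            stale++;
        }
    }
    free(arr);
    return stale;
}

/* CPU 1 borrows the array that gets freed, CPU 2 borrows another one.
 * Returns the stale count on success, or -1 if something went wrong. */
static int toy_demo(void)
{
    ToyCbArr *arr = calloc(1, sizeof(*arr));
    ToyCbArr *other = calloc(1, sizeof(*other));
    int stale, ok;

    if (!arr || !other) {
        free(arr);
        free(other);
        return -1;
    }

    toy_cpus[1].plugin_mem_cbs = arr;
    toy_cpus[2].plugin_mem_cbs = other;

    stale = toy_free_cb_arr(arr);

    /* CPU 1 must have been reset; CPU 2 must be untouched. */
    ok = toy_cpus[1].plugin_mem_cbs == NULL &&
         toy_cpus[2].plugin_mem_cbs == other;

    toy_cpus[2].plugin_mem_cbs = NULL;
    free(other);
    return ok ? stale : -1;
}
```

Calling toy_demo() should return 1: the one CPU holding a stale reference
is reset, while the CPU pointing at a different array is left alone.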
>
> Unfortunately, the workload/setup I have encountered this bug with is
> difficult to reproduce in a way suitable for sharing upstream (admittedly
> potentially because I do not fully understand the conditions necessary to
> trigger it). It is also deep into a run
How many full TB flushes have there been? You only see
qemu_plugin_flush_cb when we flush the whole translation buffer (which is
something we do more often when plugins exit).
Does lowering tb-size make it easier to hit the failure mode?
> , and I haven't found a good way
> to break in gdb immediately prior to it happening in order to inspect
> it, without perturbing it enough such that it doesn't happen...
This is exactly the sort of thing rr is great for. Can you trigger it
under rr?
https://rr-project.org/
>
> I welcome any feedback or insights on how to further nail down the
> failure case and/or help in working towards an appropriate solution.
>
> Thanks!
>
> -Aaron
--
Alex Bennée