[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB
From: |
Wolfgang Bumiller |
Subject: |
Re: [PATCH] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB |
Date: |
Mon, 22 Mar 2021 12:08:47 +0100 |
User-agent: |
NeoMutt/20180716 |
On Thu, Mar 18, 2021 at 02:35:50PM +0100, Stefan Reiter wrote:
> If OOB is disabled, events received in monitor_qmp_event will be handled
> in the main context. Thus, we must not acquire a qmp_queue_lock there,
> as the dispatcher coroutine holds one over a yield point, where it
> expects to be rescheduled from the main context. If a CHR_EVENT_CLOSED
> event is received just then, it can race and block the main thread by
> waiting on the queue lock.
>
> Run monitor_qmp_cleanup_queue_and_resume in a BH on the iohandler
> thread, so the main thread can always make progress during the
> reschedule.
>
> The delaying of the cleanup is safe, since the dispatcher always moves
> back to the iothread afterward, and thus the cleanup will happen before
> it gets to its next iteration.
>
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
This is a tough one. It *may* be fine, but I wonder if we can approach
this differently:
>From what I can gather we have the following call stacks & contexts:
Guarded lock (lock & release):
* monitor_qmp_cleanup_queue_and_resume
by monitor_qmp_event
by file handler (from I/O loop)
^ iohandler_context (assuming that's where the file handling happens...)
(after this patch as BH though)
* handle_qmp_command
a) by the json parser (which is also re-initialized by
monitor_qmp_event btw., haven't checked if that can also
"trigger" its methods immediately)
b) by monitor_qmp_read
by file handler (from I/O loop)
^ iohandler_context
Lock-"returning":
* monitor_qmp_requests_pop_any_with_lock
by coroutine_fn monitor_qmp_dispatcher_co
^ iohandler_context
Lock-releasing:
* coroutine_fn monitor_qmp_dispatcher_co
^ qemu_aio_context
The only *weird* thing that immediately pops out here is
`monitor_qmp_requests_pop_any_with_lock()` keeping a lock while
switching contexts.
This is done in order to allow `AIO_WAIT_WHILE` to work while making
progress on the events, but do we actually already need to be in this
context for the OOB `monitor_resume()` call or can we defer the context
switch to after having done that and released the lock?
`monitor_resume()` itself seems to simply schedule a BH which should
work regardless if I'm not mistaken. There's also a
`readline_show_prompt()` call, but that *looks* harmless?
`monitor_resume()` is also called without the lock later on, so even if
it needs to be in this context at that point for whatever reason, does
it need the lock?