qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] monitor/qmp: fix race with clients disconnecting early


From: Markus Armbruster
Subject: Re: [PATCH] monitor/qmp: fix race with clients disconnecting early
Date: Tue, 12 Oct 2021 11:27:26 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Stefan, any thoughts on this?

Markus Armbruster <armbru@redhat.com> writes:

> I've thought a bit more.
>
> A monitor can serve a series of clients.
>
> Back when all of the monitor ran in the main thread, we completely
> finished serving the current client before we started serving the next
> one (I think).  In other words, sessions did not overlap.
>
> Since we moved parts of the monitor to the monitor I/O thread (merge
> commit 4bdc24fa018), sessions can overlap, and this causes issues, as
> you demonstrated.
>
> Possible fixes:
>
> 1. Go back to "don't overlap" with suitable synchronization.
>
>    I'm afraid this won't cut it, because exec-oob would have wait in
>    line behind reconnect.
>
>    It currently waits for other reasons, but that feels fixable.  Going
>    back to "don't overlap" would make it unfixable.
>
> 2. Make the lingering session not affect / be affected by the new
>    session's state
>
>    This is what your patch tries.  Every access of session state needs
>    to be guarded like
>
>         if (conn_nr_before == qatomic_read(&mon->connection_nr)) {
>             access session state
>         } else {
>             ???
>         }
>
>    Issues:
>
>    * We have to find and guard all existing accesses.  That's work.
>
>    * We have to guard all future accesses.  More work, and easy to
>      forget, which makes this fix rather brittle.
>
>    * The fix is spread over many places.
>
>    * We may run into cases where the ??? part gets complicated.
>      Consider file descriptors.  The command in flight will have its
>      monitor_get_fd() fail after disconnect.  Probably okay, because it
>      can also fail for other reasons.  But we might run into cases where
>      the ??? part creates new failure modes for us to handle.
>
> 3. Per-session state
>
>    Move per-session state from monitor state into a separate object.
>
>    Use reference counts to keep this object alive until both threads are
>    done with the session.
>
>    Monitor I/O thread executes monitor core and the out-of-band
>    commands; its stops using the old session on disconnect, and starts
>    using the new session on connect.
>
>    Main thread executes in-band commands, and these use the session that
>    submitted them.
>
>    Do I make sense, or should I explain my idea in more detail?




reply via email to

[Prev in Thread] Current Thread [Next in Thread]