qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1] coroutine: avoid inserting duplicate coroutine to co_queu


From: Eric Blake
Subject: Re: [PATCH v1] coroutine: avoid inserting duplicate coroutine to co_queue_wakeup
Date: Mon, 25 Mar 2024 11:00:31 -0500
User-agent: NeoMutt/20240201

I've seen (and agree with) Stefan's reply that a more thorough audit
is needed, but am still providing a preliminary review based on what I
see here.

On Mon, Mar 25, 2024 at 05:18:50PM +0800, zhuyangyang via wrote:
> If g_main_loop_run()/aio_poll() is called in the coroutine context,
> the pending coroutine may be woken up repeatedly, and the co_queue_wakeup
> may be disordered.

coroutine context should not be doing anything that can block; you may
have uncovered a bigger bug that we need to address.

> 
> When the poll() syscall exited in g_main_loop_run()/aio_poll(), it means
> some listened events is completed. Therefore, the completion callback
> function is dispatched.
> 
> If this callback function needs to invoke aio_co_enter(), it will only
> wake up the coroutine (because we are already in coroutine context),
> which may cause that the data on this listening event_fd/socket_fd
> is not read/cleared. When the next poll () exits, it will be woken up again
> and inserted into the wakeup queue again.
> 
> For example, if TLS is enabled in NBD, the server will call g_main_loop_run()
> in the coroutine, and repeatedly wake up the io_read event on a socket.
> The call stack is as follows:
> 
> aio_co_enter()
> aio_co_wake()
> qio_channel_restart_read()
> aio_dispatch_handler()
> aio_dispatch_handlers()
> aio_dispatch()
> aio_ctx_dispatch()
> g_main_context_dispatch()
> g_main_loop_run()
> nbd_negotiate_handle_starttls()
> nbd_negotiate_options()
> nbd_negotiate()
> nbd_co_client_start()
> coroutine_trampoline()
> 
> Signed-off-by: zhuyangyang <zhuyangyang14@huawei.com>
> ---
>  util/async.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/util/async.c b/util/async.c
> index 0467890052..25fc1e6083 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -705,7 +705,18 @@ void aio_co_enter(AioContext *ctx, Coroutine *co)
>      if (qemu_in_coroutine()) {
>          Coroutine *self = qemu_coroutine_self();
>          assert(self != co);
> -        QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, co, co_queue_next);
> +        /*
> +         * If the Coroutine *co is already in the co_queue_wakeup, this
> +         * repeated insertion will causes the loss of other queue element

s/causes/cause/

> +         * or infinite loop.
> +         * For examplex:

s/examplex/example/

> +         * Head->a->b->c->NULL, after insert_tail(head, b) => 
> Head->a->b->NULL
> +         * Head->a-b>->NULL, after insert_tail(head, b) => Head->a->b->b...

s/b>->/b->/

> +         */
> +        if (!co->co_queue_next.sqe_next &&
> +            self->co_queue_wakeup.sqh_last != &co->co_queue_next.sqe_next) {
> +            QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, co, co_queue_next);
> +        }
>      } else {
>          qemu_aio_coroutine_enter(ctx, co);
>      }

Intuitively, attacking the symptoms (avoiding bogus list insertion
when it is already on the list) makes sense; but I want to make sure
we attack the root cause.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




reply via email to

[Prev in Thread] Current Thread [Next in Thread]