
Re: [PATCH 3/3] block: Fix deadlock in bdrv_co_yield_to_drain()


From: Kevin Wolf
Subject: Re: [PATCH 3/3] block: Fix deadlock in bdrv_co_yield_to_drain()
Date: Tue, 8 Dec 2020 18:37:20 +0100

On 08.12.2020 16:33, Vladimir Sementsov-Ogievskiy wrote:
> 03.12.2020 20:23, Kevin Wolf wrote:
> > If bdrv_co_yield_to_drain() is called for draining a block node that
> > runs in a different AioContext, it keeps that AioContext locked while it
> > yields and schedules a BH in the AioContext to do the actual drain.
> > 
> > As long as executing the BH is the very next thing the event loop of the
> 
> s/thing the event/thing in the event/
> 
> (I've reread it several times to understand :)

Oops, thanks.

"...the next thing that the event loop _does_" is actually what I had in
mind.

> > node's AioContext, this actually happens to work, but when it tries to
> > execute something else that wants to take the AioContext lock, it will
> > deadlock. (In the bug report, this other thing is a virtio-scsi device
> > running virtio_scsi_data_plane_handle_cmd().)
> > 
> > Instead, always drop the AioContext lock across the yield and reacquire
> > it only when the coroutine is reentered. The BH needs to unconditionally
> > take the lock for itself now.
> > 
> > This fixes the 'block_resize' QMP command on a block node that runs in
> > an iothread.
> > 
> > Cc: qemu-stable@nongnu.org
> > Fixes: eb94b81a94bce112e6b206df846c1551aaf6cab6
> > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1903511
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
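
To make the change concrete, here is a minimal sketch of the resulting
pattern (simplified from block/io.c; the real bdrv_co_yield_to_drain()
and its BH also handle recursion, polling and the parent arguments):

static void bdrv_co_drain_bh_cb(void *opaque)
{
    BdrvCoDrainData *data = opaque;
    AioContext *ctx = bdrv_get_aio_context(data->bs);

    /* The BH now takes the lock unconditionally for itself. */
    aio_context_acquire(ctx);
    bdrv_drained_begin(data->bs);
    aio_context_release(ctx);

    /* Reenter the coroutine that yielded below. */
    aio_co_wake(data->co);
}

static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs)
{
    BdrvCoDrainData data = {
        .bs = bs,
        .co = qemu_coroutine_self(),
    };
    AioContext *ctx = bdrv_get_aio_context(bs);

    aio_bh_schedule_oneshot(ctx, bdrv_co_drain_bh_cb, &data);

    /* Drop the lock across the yield. Previously it stayed locked
     * here, so anything else that tried to take it before the BH ran
     * would deadlock. */
    aio_context_release(ctx);
    qemu_coroutine_yield();
    aio_context_acquire(ctx);
}
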
> 
> I don't feel confident enough about AioContext acquiring and
> switching to see any side effects. At least I don't see any obvious
> mistakes, so here is my weak:
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

> Note, I looked through the callers:
> 
> bdrv_do_drained_begin/end should be OK, as their normal usage is to
> start/end a drained section under an acquired AioContext, so it seems
> correct to temporarily release the context. Still, I didn't check all
> drained sections in the code.
> 
> bdrv_drain_all_begin seems OK too (we just wait until everything is
> drained, so it does no harm to temporarily release the lock). Still, I
> don't see any call of it from coroutine context.

The good thing there is that BDRV_POLL_WHILE() drops the lock anyway, so
at least for all callers of bdrv_do_drained_begin() that pass poll=true,
we know that they are fine with releasing the lock temporarily.
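
For reference, the waiting loop behind BDRV_POLL_WHILE() behaves roughly
like this when the main loop waits on a node in an iothread's AioContext
(a simplified sketch of AIO_WAIT_WHILE() from include/block/aio-wait.h;
the real macro distinguishes more cases):

while (cond) {
    /* Drop the node's lock so the iothread can acquire it... */
    aio_context_release(ctx);
    /* ...and poll the main loop until there is progress. */
    aio_poll(qemu_get_aio_context(), true);
    aio_context_acquire(ctx);
}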

There are two callers that pass false: The recursive call inside the
function itself, and bdrv_drain_all_begin(). We know that both will poll
later, so they always release the lock at least once.

For ending the drain section, there is bdrv_drained_end_no_poll(), which
is only called in bdrv_child_cb_drained_end(), i.e. an implementation of
BdrvChildClass.drained_end. This is only called recursively in the
context of a polling drain_end, which already drops the lock.

So I think we don't introduce any cases of dropping the lock where this
wouldn't have happened before.

Kevin



