[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 16/24] block: Mark bdrv_replace_node() GRAPH_WRLOCK
From: |
Eric Blake |
Subject: |
Re: [PATCH 16/24] block: Mark bdrv_replace_node() GRAPH_WRLOCK |
Date: |
Fri, 3 Nov 2023 07:37:27 -0500 |
User-agent: |
NeoMutt/20231023 |
On Fri, Nov 03, 2023 at 11:32:39AM +0100, Kevin Wolf wrote:
...
> > > - GLOBAL_STATE_CODE();
> > > -
> > > - /* Make sure that @from doesn't go away until we have successfully
> > > attached
> > > - * all of its parents to @to. */
> >
> > Useful comment that you just moved here in the previous patch...
> >
> > > - bdrv_ref(from);
...
> > > @@ -5717,9 +5699,15 @@ BlockDriverState
> > > *bdrv_insert_node(BlockDriverState *bs, QDict *options,
> > > goto fail;
> > > }
> > >
> > > + bdrv_ref(bs);
> >
> > ...but now it is gone. Intentional?
>
> I figured it was obvious enough that bdrv_ref() is always called to make
> sure that the node doesn't go away too early, but I can add it back.
Your call.
> > > @@ -94,8 +95,12 @@ static void commit_abort(Job *job)
> > > * XXX Can (or should) we somehow keep 'consistent read' blocked even
> > > * after the failed/cancelled commit job is gone? If we already wrote
> > > * something to base, the intermediate images aren't valid any more.
> > > */
> > > - bdrv_replace_node(s->commit_top_bs, s->commit_top_bs->backing->bs,
> > > - &error_abort);
> > > + commit_top_backing_bs = s->commit_top_bs->backing->bs;
> > > + bdrv_drained_begin(commit_top_backing_bs);
> > > + bdrv_graph_wrlock(commit_top_backing_bs);
> >
> > Here, and elsewhere in the patch, drained_begin/end is outside
> > wr(un)lock...
> >
> > > + bdrv_replace_node(s->commit_top_bs, commit_top_backing_bs,
> > > &error_abort);
> > > + bdrv_graph_wrunlock();
> > > + bdrv_drained_end(commit_top_backing_bs);
> > >
> > > bdrv_unref(s->commit_top_bs);
> > > bdrv_unref(top_bs);
> > > @@ -425,7 +430,11 @@ fail:
> > > /* commit_top_bs has to be replaced after deleting the block job,
> > > * otherwise this would fail because of lack of permissions. */
> > > if (commit_top_bs) {
> > > + bdrv_graph_wrlock(top);
> > > + bdrv_drained_begin(top);
> > > bdrv_replace_node(commit_top_bs, top, &error_abort);
> > > + bdrv_drained_end(top);
> > > + bdrv_graph_wrunlock();
> >
> > ...but here you do it in the opposite order. Intentional?
>
> No, this is actually wrong. bdrv_drained_begin() has a nested event
> loop, and running a nested event loop while holding the graph lock can
> cause deadlocks, so it's forbidden. Thanks for catching this!
That's what review is for!
>
> Since the two comments above are the only thing you found in the review,
> I'll just directly fix them while applying the series.
Sounds good to me. With the deadlock fixed by swapping order,
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org