qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v6 35/42] block: Fix check_to_replace_node()


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH v6 35/42] block: Fix check_to_replace_node()
Date: Thu, 15 Aug 2019 15:21:38 +0000

09.08.2019 19:14, Max Reitz wrote:
> Currently, check_to_replace_node() only allows mirror to replace a node
> in the chain of the source node, and only if it is the first non-filter
> node below the source.  Well, technically, the idea is that you can
> exactly replace a quorum child by mirroring from quorum.
> 
> This has (probably) two reasons:
> (1) We do not want to create loops.
> (2) @replaces and @device should have exactly the same content so
>      replacing them does not cause visible data to change.
> 
> This has two issues:
> (1) It is overly restrictive.  It is completely fine for @replaces to be
>      a filter.
> (2) It is not restrictive enough.  You can create loops with this as
>      follows:
> 
> $ qemu-img create -f qcow2 /tmp/source.qcow2 64M
> $ qemu-system-x86_64 -qmp stdio
> {"execute": "qmp_capabilities"}
> {"execute": "object-add",
>   "arguments": {"qom-type": "throttle-group", "id": "tg0"}}
> {"execute": "blockdev-add",
>   "arguments": {
>       "node-name": "source",
>       "driver": "throttle",
>       "throttle-group": "tg0",
>       "file": {
>           "node-name": "filtered",
>           "driver": "qcow2",
>           "file": {
>               "driver": "file",
>               "filename": "/tmp/source.qcow2"
>           } } } }
> {"execute": "drive-mirror",
>   "arguments": {
>       "job-id": "mirror",
>       "device": "source",
>       "target": "/tmp/target.qcow2",
>       "format": "qcow2",
>       "node-name": "target",
>       "sync" :"none",
>       "replaces": "filtered"
>   } }
> {"execute": "block-job-complete", "arguments": {"device": "mirror"}}
> 
> And qemu crashes because of a stack overflow due to the loop being
> created (target's backing file is source, so when it replaces filtered,
> it points to itself through source).
> 
> (blockdev-mirror can be broken similarly.)
> 
> So let us make the checks for the two conditions above explicit, which
> makes the whole function exactly as restrictive as it needs to be.
> 
> Signed-off-by: Max Reitz <address@hidden>
> ---
>   include/block/block.h |  1 +
>   block.c               | 83 +++++++++++++++++++++++++++++++++++++++----
>   blockdev.c            | 34 ++++++++++++++++--
>   3 files changed, 110 insertions(+), 8 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 6ba853fb90..8da706cd89 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -404,6 +404,7 @@ bool bdrv_is_first_non_filter(BlockDriverState 
> *candidate);
>   
>   /* check if a named node can be replaced when doing drive-mirror */
>   BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
> +                                        BlockDriverState *backing_bs,
>                                           const char *node_name, Error 
> **errp);
>   
>   /* async block I/O */
> diff --git a/block.c b/block.c
> index 915b80153c..4858d3e718 100644
> --- a/block.c
> +++ b/block.c
> @@ -6290,7 +6290,59 @@ bool bdrv_is_first_non_filter(BlockDriverState 
> *candidate)
>       return false;
>   }
>   
> +static bool is_child_of(BlockDriverState *child, BlockDriverState *parent)
> +{
> +    BdrvChild *c;
> +
> +    if (!parent) {
> +        return false;
> +    }
> +
> +    QLIST_FOREACH(c, &parent->children, next) {
> +        if (c->bs == child || is_child_of(child, c->bs)) {
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
> +/*
> + * Return true if there are only filters in [@top, @base).  Note that
> + * this may include quorum (which bdrv_chain_contains() cannot
> + * handle).

More presizely: return true if exists chain of filters from top to base or if
top == base.

I keep in mind backup-top filter:

[backup-top]
|          \target
|backing    -------->[target]
V                    /
[source]  <---------/backing

> + */
> +static bool is_filtered_child(BlockDriverState *top, BlockDriverState *base)
> +{
> +    BdrvChild *c;
> +
> +    if (!top) {
> +        return false;
> +    }
> +
> +    if (top == base) {
> +        return true;
> +    }
> +
> +    if (!top->drv->is_filter) {
> +        return false;
> +    }
> +
> +    QLIST_FOREACH(c, &top->children, next) {
> +        if (is_filtered_child(c->bs, base)) {
> +            return true;
> +        }
> +    }

interesting, how much is it better to somehow reuse DFS search written in 
should_update_child()..
[just note, don't do it in these series please]

> +
> +    return false;
> +}
> +
> +/*
> + * @parent_bs is mirror's source BDS, @backing_bs is the BDS which
> + * will be attached to the target when mirror completes.
> + */
>   BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
> +                                        BlockDriverState *backing_bs,
>                                           const char *node_name, Error **errp)
>   {
>       BlockDriverState *to_replace_bs = bdrv_find_node(node_name);
> @@ -6309,13 +6361,32 @@ BlockDriverState 
> *check_to_replace_node(BlockDriverState *parent_bs,
>           goto out;
>       }
>   
> -    /* We don't want arbitrary node of the BDS chain to be replaced only the 
> top
> -     * most non filter in order to prevent data corruption.
> -     * Another benefit is that this tests exclude backing files which are
> -     * blocked by the backing blockers.
> +    /*
> +     * If to_replace_bs is (recursively) a child of backing_bs,
> +     * replacing it may create a loop.  We cannot allow that.
>        */
> -    if (!bdrv_recurse_is_first_non_filter(parent_bs, to_replace_bs)) {
> -        error_setg(errp, "Only top most non filter can be replaced");
> +    if (to_replace_bs == backing_bs || is_child_of(to_replace_bs, 
> backing_bs)) {

first condition is covered by second, so first may be omitted.

> +        error_setg(errp, "Replacing this node would result in a loop");
> +        to_replace_bs = NULL;
> +        goto out;
> +    }
> +
> +    /*
> +     * Mirror is designed in such a way that when it completes, the
> +     * source BDS is seamlessly replaced.  

Not source but to_replace_bs is replaced?

> It is therefore not allowed
> +     * to replace a BDS where this condition would be violated, as that
> +     * would defeat the purpose of mirror and could lead to data
> +     * corruption.
> +     * Therefore, between parent_bs and to_replace_bs there may be
> +     * only filters (and the one on top must be a filter, too), so
> +     * their data always stays in sync and mirror can complete and
> +     * replace to_replace_bs without any possible corruptions.
> +     */
> +    if (!is_filtered_child(parent_bs, to_replace_bs) &&
> +        !is_filtered_child(to_replace_bs, parent_bs))
> +    {
> +        error_setg(errp, "The node to be replaced must be connected to the "
> +                   "source through filter nodes only");

"and the one on top must be a filter, too" not mentioned in the error..

>           to_replace_bs = NULL;
>           goto out;
>       }
> diff --git a/blockdev.c b/blockdev.c
> index 4e72f6f701..758e0b5431 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3887,7 +3887,7 @@ static void blockdev_mirror_common(const char *job_id, 
> BlockDriverState *bs,
>       }
>   
>       if (has_replaces) {
> -        BlockDriverState *to_replace_bs;
> +        BlockDriverState *to_replace_bs, *backing_bs;
>           AioContext *replace_aio_context;
>           int64_t bs_size, replace_size;
>   
> @@ -3897,7 +3897,37 @@ static void blockdev_mirror_common(const char *job_id, 
> BlockDriverState *bs,
>               return;
>           }
>   
> -        to_replace_bs = check_to_replace_node(bs, replaces, errp);
> +        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
> +            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
> +        {
> +            /*
> +             * While we do not quite know what OPEN_BACKING_CHAIN
> +             * (used for mode=existing) will yield, it is probably
> +             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
> +             * because that is our best guess.
> +             */
> +            switch (sync) {
> +            case MIRROR_SYNC_MODE_FULL:
> +                backing_bs = NULL;
> +                break;
> +
> +            case MIRROR_SYNC_MODE_TOP:
> +                backing_bs = bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs));

why not  bdrv_backing_chain_next(bs) like in mirror_start?

> +                break;
> +
> +            case MIRROR_SYNC_MODE_NONE:
> +                backing_bs = bs;
> +                break;
> +
> +            default:
> +                abort();
> +            }
> +        } else {
> +            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
> +            backing_bs = bdrv_filtered_cow_bs(bdrv_skip_rw_filters(target));
> +        }
> +
> +        to_replace_bs = check_to_replace_node(bs, backing_bs, replaces, 
> errp);
>           if (!to_replace_bs) {
>               return;
>           }
> 


-- 
Best regards,
Vladimir

reply via email to

[Prev in Thread] Current Thread [Next in Thread]