qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v6 35/42] block: Fix check_to_replace_node()


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH v6 35/42] block: Fix check_to_replace_node()
Date: Fri, 16 Aug 2019 11:01:04 +0000

15.08.2019 20:01, Max Reitz wrote:
> On 15.08.19 17:21, Vladimir Sementsov-Ogievskiy wrote:
>> 09.08.2019 19:14, Max Reitz wrote:
>>> Currently, check_to_replace_node() only allows mirror to replace a node
>>> in the chain of the source node, and only if it is the first non-filter
>>> node below the source.  Well, technically, the idea is that you can
>>> exactly replace a quorum child by mirroring from quorum.
>>>
>>> This has (probably) two reasons:
>>> (1) We do not want to create loops.
>>> (2) @replaces and @device should have exactly the same content so
>>>       replacing them does not cause visible data to change.
>>>
>>> This has two issues:
>>> (1) It is overly restrictive.  It is completely fine for @replaces to be
>>>       a filter.
>>> (2) It is not restrictive enough.  You can create loops with this as
>>>       follows:
>>>
>>> $ qemu-img create -f qcow2 /tmp/source.qcow2 64M
>>> $ qemu-system-x86_64 -qmp stdio
>>> {"execute": "qmp_capabilities"}
>>> {"execute": "object-add",
>>>    "arguments": {"qom-type": "throttle-group", "id": "tg0"}}
>>> {"execute": "blockdev-add",
>>>    "arguments": {
>>>        "node-name": "source",
>>>        "driver": "throttle",
>>>        "throttle-group": "tg0",
>>>        "file": {
>>>            "node-name": "filtered",
>>>            "driver": "qcow2",
>>>            "file": {
>>>                "driver": "file",
>>>                "filename": "/tmp/source.qcow2"
>>>            } } } }
>>> {"execute": "drive-mirror",
>>>    "arguments": {
>>>        "job-id": "mirror",
>>>        "device": "source",
>>>        "target": "/tmp/target.qcow2",
>>>        "format": "qcow2",
>>>        "node-name": "target",
>>>        "sync" :"none",
>>>        "replaces": "filtered"
>>>    } }
>>> {"execute": "block-job-complete", "arguments": {"device": "mirror"}}
>>>
>>> And qemu crashes because of a stack overflow due to the loop being
>>> created (target's backing file is source, so when it replaces filtered,
>>> it points to itself through source).
>>>
>>> (blockdev-mirror can be broken similarly.)
>>>
>>> So let us make the checks for the two conditions above explicit, which
>>> makes the whole function exactly as restrictive as it needs to be.
>>>
>>> Signed-off-by: Max Reitz <address@hidden>
>>> ---
>>>    include/block/block.h |  1 +
>>>    block.c               | 83 +++++++++++++++++++++++++++++++++++++++----
>>>    blockdev.c            | 34 ++++++++++++++++--
>>>    3 files changed, 110 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/include/block/block.h b/include/block/block.h
>>> index 6ba853fb90..8da706cd89 100644
>>> --- a/include/block/block.h
>>> +++ b/include/block/block.h
>>> @@ -404,6 +404,7 @@ bool bdrv_is_first_non_filter(BlockDriverState 
>>> *candidate);
>>>    
>>>    /* check if a named node can be replaced when doing drive-mirror */
>>>    BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
>>> +                                        BlockDriverState *backing_bs,
>>>                                            const char *node_name, Error 
>>> **errp);
>>>    
>>>    /* async block I/O */
>>> diff --git a/block.c b/block.c
>>> index 915b80153c..4858d3e718 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -6290,7 +6290,59 @@ bool bdrv_is_first_non_filter(BlockDriverState 
>>> *candidate)
>>>        return false;
>>>    }
>>>    
>>> +static bool is_child_of(BlockDriverState *child, BlockDriverState *parent)
>>> +{
>>> +    BdrvChild *c;
>>> +
>>> +    if (!parent) {
>>> +        return false;
>>> +    }
>>> +
>>> +    QLIST_FOREACH(c, &parent->children, next) {
>>> +        if (c->bs == child || is_child_of(child, c->bs)) {
>>> +            return true;
>>> +        }
>>> +    }
>>> +
>>> +    return false;
>>> +}
>>> +
>>> +/*
>>> + * Return true if there are only filters in [@top, @base).  Note that
>>> + * this may include quorum (which bdrv_chain_contains() cannot
>>> + * handle).
>>
>> More presizely: return true if exists chain of filters from top to base or if
>> top == base.
>>
>> I keep in mind backup-top filter:
>>
>> [backup-top]
>> |          \target
> 
> backup-top can’t be a filter if it has two children with different
> contents, though.

Why? target is special child, unrelated to what is read/written over backup-top.
It's an own business of backup-top.

> 
> (commit-top and mirror-top aren’t filters either.)

Ahm, I missed something. They have is_filter = true and their children 
considered
to be filtered-rw children in your series? And than, who they are? Format nodes?
And how they appears in backing chains than?

> 
> That’s why there must be a unique chain [@top, @base).
> 
> I should probably not that it will return true if top == base, though, yes.
> 
>> |backing    -------->[target]
>> V                    /
>> [source]  <---------/backing
>>
>>> + */
>>> +static bool is_filtered_child(BlockDriverState *top, BlockDriverState 
>>> *base)
>>> +{
>>> +    BdrvChild *c;
>>> +
>>> +    if (!top) {
>>> +        return false;
>>> +    }
>>> +
>>> +    if (top == base) {
>>> +        return true;
>>> +    }
>>> +
>>> +    if (!top->drv->is_filter) {
>>> +        return false;
>>> +    }
>>> +
>>> +    QLIST_FOREACH(c, &top->children, next) {
>>> +        if (is_filtered_child(c->bs, base)) {
>>> +            return true;
>>> +        }
>>> +    }
>>
>> interesting, how much is it better to somehow reuse DFS search written in 
>> should_update_child()..
>> [just note, don't do it in these series please]
>>
>>> +
>>> +    return false;
>>> +}
>>> +
>>> +/*
>>> + * @parent_bs is mirror's source BDS, @backing_bs is the BDS which
>>> + * will be attached to the target when mirror completes.
>>> + */
>>>    BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
>>> +                                        BlockDriverState *backing_bs,
>>>                                            const char *node_name, Error 
>>> **errp)
>>>    {
>>>        BlockDriverState *to_replace_bs = bdrv_find_node(node_name);
>>> @@ -6309,13 +6361,32 @@ BlockDriverState 
>>> *check_to_replace_node(BlockDriverState *parent_bs,
>>>            goto out;
>>>        }
>>>    
>>> -    /* We don't want arbitrary node of the BDS chain to be replaced only 
>>> the top
>>> -     * most non filter in order to prevent data corruption.
>>> -     * Another benefit is that this tests exclude backing files which are
>>> -     * blocked by the backing blockers.
>>> +    /*
>>> +     * If to_replace_bs is (recursively) a child of backing_bs,
>>> +     * replacing it may create a loop.  We cannot allow that.
>>>         */
>>> -    if (!bdrv_recurse_is_first_non_filter(parent_bs, to_replace_bs)) {
>>> -        error_setg(errp, "Only top most non filter can be replaced");
>>> +    if (to_replace_bs == backing_bs || is_child_of(to_replace_bs, 
>>> backing_bs)) {
>>
>> first condition is covered by second, so first may be omitted.
> 
> It is not.  is_child_of() does not return true if child == parent.
> 
>>> +        error_setg(errp, "Replacing this node would result in a loop");
>>> +        to_replace_bs = NULL;
>>> +        goto out;
>>> +    }
>>> +
>>> +    /*
>>> +     * Mirror is designed in such a way that when it completes, the
>>> +     * source BDS is seamlessly replaced.
>>
>> Not source but to_replace_bs is replaced?
> 
> It has originally been designed to replace the source.  If it could
> replace any arbitrary BDS, all of this would be moot.

quorum child, you saying about in commit message?

> 
>>> It is therefore not allowed
>>> +     * to replace a BDS where this condition would be violated, as that
>>> +     * would defeat the purpose of mirror and could lead to data
>>> +     * corruption.
>>> +     * Therefore, between parent_bs and to_replace_bs there may be
>>> +     * only filters (and the one on top must be a filter, too), so
>>> +     * their data always stays in sync and mirror can complete and
>>> +     * replace to_replace_bs without any possible corruptions.
>>> +     */
>>> +    if (!is_filtered_child(parent_bs, to_replace_bs) &&
>>> +        !is_filtered_child(to_replace_bs, parent_bs))
>>> +    {
>>> +        error_setg(errp, "The node to be replaced must be connected to the 
>>> "
>>> +                   "source through filter nodes only");
>>
>> "and the one on top must be a filter, too" not mentioned in the error..
> 
> Well, unless the source node is the node to be replaced.  Hm...  This
> gets very hard to express.  I think I’d prefer to keep this as it is,
> even though it is not quite correct, unless you have a better suggestion
> of what to report. :-/

I can't imaging something better than just add "(and the one on top must be a 
filter, too)"

> 
>>>            to_replace_bs = NULL;
>>>            goto out;
>>>        }
>>> diff --git a/blockdev.c b/blockdev.c
>>> index 4e72f6f701..758e0b5431 100644
>>> --- a/blockdev.c
>>> +++ b/blockdev.c
>>> @@ -3887,7 +3887,7 @@ static void blockdev_mirror_common(const char 
>>> *job_id, BlockDriverState *bs,
>>>        }
>>>    
>>>        if (has_replaces) {
>>> -        BlockDriverState *to_replace_bs;
>>> +        BlockDriverState *to_replace_bs, *backing_bs;
>>>            AioContext *replace_aio_context;
>>>            int64_t bs_size, replace_size;
>>>    
>>> @@ -3897,7 +3897,37 @@ static void blockdev_mirror_common(const char 
>>> *job_id, BlockDriverState *bs,
>>>                return;
>>>            }
>>>    
>>> -        to_replace_bs = check_to_replace_node(bs, replaces, errp);
>>> +        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
>>> +            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
>>> +        {
>>> +            /*
>>> +             * While we do not quite know what OPEN_BACKING_CHAIN
>>> +             * (used for mode=existing) will yield, it is probably
>>> +             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
>>> +             * because that is our best guess.
>>> +             */
>>> +            switch (sync) {
>>> +            case MIRROR_SYNC_MODE_FULL:
>>> +                backing_bs = NULL;
>>> +                break;
>>> +
>>> +            case MIRROR_SYNC_MODE_TOP:
>>> +                backing_bs = 
>>> bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs));
>>
>> why not  bdrv_backing_chain_next(bs) like in mirror_start?
> 
> Good question.  I suppose it should be
> bdrv_filtered_cow_bs(bdrv_backing_chain_next(bs)) in mirror_start()?

You mean bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)), I hope)

> Because with sync=top, we just want to remove the topmost COW node (and
> filters on top), but keep filters behind it.
> 

Agreed.

> 
>>> +                break;
>>> +
>>> +            case MIRROR_SYNC_MODE_NONE:
>>> +                backing_bs = bs;
>>> +                break;
>>> +
>>> +            default:
>>> +                abort();
>>> +            }
>>> +        } else {
>>> +            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
>>> +            backing_bs = 
>>> bdrv_filtered_cow_bs(bdrv_skip_rw_filters(target));
>>> +        }
>>> +
>>> +        to_replace_bs = check_to_replace_node(bs, backing_bs, replaces, 
>>> errp);
>>>            if (!to_replace_bs) {
>>>                return;
>>>            }
>>>
>>
>>
> 
> 


-- 
Best regards,
Vladimir

reply via email to

[Prev in Thread] Current Thread [Next in Thread]