qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v6 35/42] block: Fix check_to_replace_node()


From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH v6 35/42] block: Fix check_to_replace_node()
Date: Thu, 15 Aug 2019 19:01:48 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

On 15.08.19 17:21, Vladimir Sementsov-Ogievskiy wrote:
> 09.08.2019 19:14, Max Reitz wrote:
>> Currently, check_to_replace_node() only allows mirror to replace a node
>> in the chain of the source node, and only if it is the first non-filter
>> node below the source.  Well, technically, the idea is that you can
>> exactly replace a quorum child by mirroring from quorum.
>>
>> This has (probably) two reasons:
>> (1) We do not want to create loops.
>> (2) @replaces and @device should have exactly the same content so
>>      replacing them does not cause visible data to change.
>>
>> This has two issues:
>> (1) It is overly restrictive.  It is completely fine for @replaces to be
>>      a filter.
>> (2) It is not restrictive enough.  You can create loops with this as
>>      follows:
>>
>> $ qemu-img create -f qcow2 /tmp/source.qcow2 64M
>> $ qemu-system-x86_64 -qmp stdio
>> {"execute": "qmp_capabilities"}
>> {"execute": "object-add",
>>   "arguments": {"qom-type": "throttle-group", "id": "tg0"}}
>> {"execute": "blockdev-add",
>>   "arguments": {
>>       "node-name": "source",
>>       "driver": "throttle",
>>       "throttle-group": "tg0",
>>       "file": {
>>           "node-name": "filtered",
>>           "driver": "qcow2",
>>           "file": {
>>               "driver": "file",
>>               "filename": "/tmp/source.qcow2"
>>           } } } }
>> {"execute": "drive-mirror",
>>   "arguments": {
>>       "job-id": "mirror",
>>       "device": "source",
>>       "target": "/tmp/target.qcow2",
>>       "format": "qcow2",
>>       "node-name": "target",
>>       "sync" :"none",
>>       "replaces": "filtered"
>>   } }
>> {"execute": "block-job-complete", "arguments": {"device": "mirror"}}
>>
>> And qemu crashes because of a stack overflow due to the loop being
>> created (target's backing file is source, so when it replaces filtered,
>> it points to itself through source).
>>
>> (blockdev-mirror can be broken similarly.)
>>
>> So let us make the checks for the two conditions above explicit, which
>> makes the whole function exactly as restrictive as it needs to be.
>>
>> Signed-off-by: Max Reitz <address@hidden>
>> ---
>>   include/block/block.h |  1 +
>>   block.c               | 83 +++++++++++++++++++++++++++++++++++++++----
>>   blockdev.c            | 34 ++++++++++++++++--
>>   3 files changed, 110 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/block/block.h b/include/block/block.h
>> index 6ba853fb90..8da706cd89 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -404,6 +404,7 @@ bool bdrv_is_first_non_filter(BlockDriverState 
>> *candidate);
>>   
>>   /* check if a named node can be replaced when doing drive-mirror */
>>   BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
>> +                                        BlockDriverState *backing_bs,
>>                                           const char *node_name, Error 
>> **errp);
>>   
>>   /* async block I/O */
>> diff --git a/block.c b/block.c
>> index 915b80153c..4858d3e718 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -6290,7 +6290,59 @@ bool bdrv_is_first_non_filter(BlockDriverState 
>> *candidate)
>>       return false;
>>   }
>>   
>> +static bool is_child_of(BlockDriverState *child, BlockDriverState *parent)
>> +{
>> +    BdrvChild *c;
>> +
>> +    if (!parent) {
>> +        return false;
>> +    }
>> +
>> +    QLIST_FOREACH(c, &parent->children, next) {
>> +        if (c->bs == child || is_child_of(child, c->bs)) {
>> +            return true;
>> +        }
>> +    }
>> +
>> +    return false;
>> +}
>> +
>> +/*
>> + * Return true if there are only filters in [@top, @base).  Note that
>> + * this may include quorum (which bdrv_chain_contains() cannot
>> + * handle).
> 
> More presizely: return true if exists chain of filters from top to base or if
> top == base.
> 
> I keep in mind backup-top filter:
> 
> [backup-top]
> |          \target

backup-top can’t be a filter if it has two children with different
contents, though.

(commit-top and mirror-top aren’t filters either.)

That’s why there must be a unique chain [@top, @base).

I should probably not that it will return true if top == base, though, yes.

> |backing    -------->[target]
> V                    /
> [source]  <---------/backing
> 
>> + */
>> +static bool is_filtered_child(BlockDriverState *top, BlockDriverState *base)
>> +{
>> +    BdrvChild *c;
>> +
>> +    if (!top) {
>> +        return false;
>> +    }
>> +
>> +    if (top == base) {
>> +        return true;
>> +    }
>> +
>> +    if (!top->drv->is_filter) {
>> +        return false;
>> +    }
>> +
>> +    QLIST_FOREACH(c, &top->children, next) {
>> +        if (is_filtered_child(c->bs, base)) {
>> +            return true;
>> +        }
>> +    }
> 
> interesting, how much is it better to somehow reuse DFS search written in 
> should_update_child()..
> [just note, don't do it in these series please]
> 
>> +
>> +    return false;
>> +}
>> +
>> +/*
>> + * @parent_bs is mirror's source BDS, @backing_bs is the BDS which
>> + * will be attached to the target when mirror completes.
>> + */
>>   BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
>> +                                        BlockDriverState *backing_bs,
>>                                           const char *node_name, Error 
>> **errp)
>>   {
>>       BlockDriverState *to_replace_bs = bdrv_find_node(node_name);
>> @@ -6309,13 +6361,32 @@ BlockDriverState 
>> *check_to_replace_node(BlockDriverState *parent_bs,
>>           goto out;
>>       }
>>   
>> -    /* We don't want arbitrary node of the BDS chain to be replaced only 
>> the top
>> -     * most non filter in order to prevent data corruption.
>> -     * Another benefit is that this tests exclude backing files which are
>> -     * blocked by the backing blockers.
>> +    /*
>> +     * If to_replace_bs is (recursively) a child of backing_bs,
>> +     * replacing it may create a loop.  We cannot allow that.
>>        */
>> -    if (!bdrv_recurse_is_first_non_filter(parent_bs, to_replace_bs)) {
>> -        error_setg(errp, "Only top most non filter can be replaced");
>> +    if (to_replace_bs == backing_bs || is_child_of(to_replace_bs, 
>> backing_bs)) {
> 
> first condition is covered by second, so first may be omitted.

It is not.  is_child_of() does not return true if child == parent.

>> +        error_setg(errp, "Replacing this node would result in a loop");
>> +        to_replace_bs = NULL;
>> +        goto out;
>> +    }
>> +
>> +    /*
>> +     * Mirror is designed in such a way that when it completes, the
>> +     * source BDS is seamlessly replaced.  
> 
> Not source but to_replace_bs is replaced?

It has originally been designed to replace the source.  If it could
replace any arbitrary BDS, all of this would be moot.

>> It is therefore not allowed
>> +     * to replace a BDS where this condition would be violated, as that
>> +     * would defeat the purpose of mirror and could lead to data
>> +     * corruption.
>> +     * Therefore, between parent_bs and to_replace_bs there may be
>> +     * only filters (and the one on top must be a filter, too), so
>> +     * their data always stays in sync and mirror can complete and
>> +     * replace to_replace_bs without any possible corruptions.
>> +     */
>> +    if (!is_filtered_child(parent_bs, to_replace_bs) &&
>> +        !is_filtered_child(to_replace_bs, parent_bs))
>> +    {
>> +        error_setg(errp, "The node to be replaced must be connected to the "
>> +                   "source through filter nodes only");
> 
> "and the one on top must be a filter, too" not mentioned in the error..

Well, unless the source node is the node to be replaced.  Hm...  This
gets very hard to express.  I think I’d prefer to keep this as it is,
even though it is not quite correct, unless you have a better suggestion
of what to report. :-/

>>           to_replace_bs = NULL;
>>           goto out;
>>       }
>> diff --git a/blockdev.c b/blockdev.c
>> index 4e72f6f701..758e0b5431 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -3887,7 +3887,7 @@ static void blockdev_mirror_common(const char *job_id, 
>> BlockDriverState *bs,
>>       }
>>   
>>       if (has_replaces) {
>> -        BlockDriverState *to_replace_bs;
>> +        BlockDriverState *to_replace_bs, *backing_bs;
>>           AioContext *replace_aio_context;
>>           int64_t bs_size, replace_size;
>>   
>> @@ -3897,7 +3897,37 @@ static void blockdev_mirror_common(const char 
>> *job_id, BlockDriverState *bs,
>>               return;
>>           }
>>   
>> -        to_replace_bs = check_to_replace_node(bs, replaces, errp);
>> +        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
>> +            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
>> +        {
>> +            /*
>> +             * While we do not quite know what OPEN_BACKING_CHAIN
>> +             * (used for mode=existing) will yield, it is probably
>> +             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
>> +             * because that is our best guess.
>> +             */
>> +            switch (sync) {
>> +            case MIRROR_SYNC_MODE_FULL:
>> +                backing_bs = NULL;
>> +                break;
>> +
>> +            case MIRROR_SYNC_MODE_TOP:
>> +                backing_bs = bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs));
> 
> why not  bdrv_backing_chain_next(bs) like in mirror_start?

Good question.  I suppose it should be
bdrv_filtered_cow_bs(bdrv_backing_chain_next(bs)) in mirror_start()?
Because with sync=top, we just want to remove the topmost COW node (and
filters on top), but keep filters behind it.

Max

>> +                break;
>> +
>> +            case MIRROR_SYNC_MODE_NONE:
>> +                backing_bs = bs;
>> +                break;
>> +
>> +            default:
>> +                abort();
>> +            }
>> +        } else {
>> +            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
>> +            backing_bs = bdrv_filtered_cow_bs(bdrv_skip_rw_filters(target));
>> +        }
>> +
>> +        to_replace_bs = check_to_replace_node(bs, backing_bs, replaces, 
>> errp);
>>           if (!to_replace_bs) {
>>               return;
>>           }
>>
> 
> 


Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]