qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 1/9] block: add .bdrv_need_rw_file_child_duri


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH v3 1/9] block: add .bdrv_need_rw_file_child_during_reopen_rw handler
Date: Fri, 2 Aug 2019 16:23:13 +0000

02.08.2019 18:42, Kevin Wolf wrote:
> Am 31.07.2019 um 14:09 hat Max Reitz geschrieben:
>> On 25.07.19 11:18, Vladimir Sementsov-Ogievskiy wrote:
>>> On reopen to rw parent may need rw access to child in .prepare, for
>>> example qcow2 needs to write IN_USE flags into stored bitmaps
>>> (currently it is done in a hacky way after commit and don't work).
>>> So, let's introduce such logic.
>>>
>>> The drawback is that in worst case bdrv_reopen_set_read_only may finish
>>> with error and in some intermediate state: some nodes reopened RW and
>>> some are not. But this is a way to fix bug around reopening qcow2
>>> bitmaps in the following commits.
>>
>> This commit message doesn’t really explain what this patch does.
>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
>>> ---
>>>   include/block/block_int.h |   2 +
>>>   block.c                   | 144 ++++++++++++++++++++++++++++++++++----
>>>   2 files changed, 133 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>> index 3aa1e832a8..7bd6fd68dd 100644
>>> --- a/include/block/block_int.h
>>> +++ b/include/block/block_int.h
>>> @@ -531,6 +531,8 @@ struct BlockDriver {
>>>                                uint64_t parent_perm, uint64_t parent_shared,
>>>                                uint64_t *nperm, uint64_t *nshared);
>>>   
>>> +     bool (*bdrv_need_rw_file_child_during_reopen_rw)(BlockDriverState 
>>> *bs);
>>> +
>>>       /**
>>>        * Bitmaps should be marked as 'IN_USE' in the image on reopening 
>>> image
>>>        * as rw. This handler should realize it. It also should unset 
>>> readonly
>>> diff --git a/block.c b/block.c
>>> index cbd8da5f3b..3c8e1c59b4 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -1715,10 +1715,12 @@ static void 
>>> bdrv_get_cumulative_perm(BlockDriverState *bs, uint64_t *perm,
>>>                                        uint64_t *shared_perm);
>>>   
>>>   typedef struct BlockReopenQueueEntry {
>>> -     bool prepared;
>>> -     bool perms_checked;
>>> -     BDRVReopenState state;
>>> -     QSIMPLEQ_ENTRY(BlockReopenQueueEntry) entry;
>>> +    bool reopened_file_child_rw;
>>> +    bool changed_file_child_perm_rw;
>>> +    bool prepared;
>>> +    bool perms_checked;
>>> +    BDRVReopenState state;
>>> +    QSIMPLEQ_ENTRY(BlockReopenQueueEntry) entry;
>>>   } BlockReopenQueueEntry;
>>>   
>>>   /*
>>> @@ -3421,6 +3423,105 @@ BlockReopenQueue 
>>> *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
>>>                                      keep_old_opts);
>>>   }
>>>   
>>> +static int bdrv_reopen_set_read_only_drained(BlockDriverState *bs,
>>> +                                             bool read_only,
>>> +                                             Error **errp)
>>> +{
>>> +    BlockReopenQueue *queue;
>>> +    QDict *opts = qdict_new();
>>> +
>>> +    qdict_put_bool(opts, BDRV_OPT_READ_ONLY, read_only);
>>> +
>>> +    queue = bdrv_reopen_queue(NULL, bs, opts, true);
>>> +
>>> +    return bdrv_reopen_multiple(queue, errp);
>>> +}
>>> +
>>> +/*
>>> + * handle_recursive_reopens
>>> + *
>>> + * On fail it needs rollback_recursive_reopens to be called.
>>
>> It would be nice if this description actually said anything about what
>> the function is supposed to do.
>>
>>> + */
>>> +static int handle_recursive_reopens(BlockReopenQueueEntry *current,
>>> +                                    Error **errp)
>>> +{
>>> +    int ret;
>>> +    BlockDriverState *bs = current->state.bs;
>>> +
>>> +    /*
>>> +     * We use the fact that in reopen-queue children are always following
>>> +     * parents.
>>> +     * TODO: Switch BlockReopenQueue to be QTAILQ and use
>>> +     *       QTAILQ_FOREACH_REVERSE.
>>
>> Why don’t you do that first?  It would make the code more obvious at
>> least to me.
>>
>>> +     */
>>> +    if (QSIMPLEQ_NEXT(current, entry)) {
>>> +        ret = handle_recursive_reopens(QSIMPLEQ_NEXT(current, entry), 
>>> errp);
>>> +        if (ret < 0) {
>>> +            return ret;
>>> +        }
>>> +    }
>>> +
>>> +    if ((current->state.flags & BDRV_O_RDWR) && bs->file && bs->drv &&
>>> +        bs->drv->bdrv_need_rw_file_child_during_reopen_rw &&
>>> +        bs->drv->bdrv_need_rw_file_child_during_reopen_rw(bs))
>>> +    {
>>> +        if (!bdrv_is_writable(bs->file->bs)) {
>>> +            ret = bdrv_reopen_set_read_only_drained(bs->file->bs, false, 
>>> errp);
>>
>> Hm.  Sorry, I find this all a bit hard to understand.  (No comments and
>> all.)
>>
>> I understand that this is for an RO -> RW transition?  Everything is
>> still RO, but the parent will need an RW child before it transitions to
>> RW itself.
>>
>>
>> I’m going to be honest up front, I don’t like this very much.  But I
>> think it may be a reasonable solution for now.
>>
>> As I remember, the problem was that when reopening a qcow2 node from RO
>> to RW, we need to write something in .prepare() (because it can fail),
>> but naturally no .prepare() is called after any .commit(), so no matter
>> the order of nodes in the ReopenQueue, the child node will never be RW
>> by this point.
>>
>> Hm.  To me that mostly means that making the whole reopen process a
>> transaction was just a dream that turns out not to work.
> 
> This patch already looks somewhat complicated to me, and what you're
> proposing below sounds another order of magnitude more complex.

Agree :) However, at this point I've almost implemented it (it's not a reason
to chose more complex way, of course).

> 
> But I think the important point is your last sentence above. Once we
> acknowledge that we can't possibly maintain full transaction semantics,
> I'll ask this naive question: What prevents us from just keeping the
> additional write in .commit?

Hmm, it's what we've started with. The only thing to do is to reverse order
of commits, to commit file child before parent (and this way it works in
Virtuozzo now).
And I proposed it long ago:
https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg04528.html


> 
> It is expected to work in the common case, so we're only talking about
> the behaviour in error cases anyway. If something fails and we can't do
> things in a transactionable way, we need to decide what the surprise
> result will look like, because we can neither guarantee a rollback nor
> successful completion.
> 
> If the write fails unexpectedly, and we end up with a qcow2 image that
> is opened r/w, but has read-only bitmaps - wouldn't that be a reasonable
> result? It seems much easier to explain than some dependency subchain
> already being committed and the rest rolled back.

And this is how it works now (except it doesn't work because of commit order).
In our long ago conversation (link above) you pointed that the problem here
is that we don't return an error from actually failed commit and it is
ignored..

> 
> So, I guess my question is, what is the specifc scenario you're trying
> to fix with this series (why isn't the final patch a test case that
> would answer this question?), and are we sure that the cure isn't worse
> than the disease?
> 

Test appears at 03 and tests what already works, and lacks test-cases which
are broken, and they are added in 08 when all is fixed.

And here are two things to fix:
First is that we lose bitmaps on reopening to RO and it is described and
fixed in 06.
Second is that we cant set IN_USE flag when reopening to RW and it is fixed
finally in 08.


=====

So, if we decide to keep things simple, what to do? Just reorder commits to
satisfy dependencies, if any?

Should we add return code to commit, which should always be 0 except very rare
case?


-- 
Best regards,
Vladimir

reply via email to

[Prev in Thread] Current Thread [Next in Thread]