[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] [PATCH v9 13/13] block/backup: use backup-top instead o
From: |
Vladimir Sementsov-Ogievskiy |
Subject: |
Re: [Qemu-block] [PATCH v9 13/13] block/backup: use backup-top instead of write notifiers |
Date: |
Thu, 29 Aug 2019 14:55:06 +0000 |
28.08.2019 22:50, Max Reitz wrote:
> On 26.08.19 18:13, Vladimir Sementsov-Ogievskiy wrote:
>> Drop write notifiers and use filter node instead.
>>
>> = Changes =
>>
>> 1. add filter-node-name argument for backup qmp api. We have to do it
>> in this commit, as 257 needs to be fixed.
>
> I feel a bit bad about it not being an implicit node. But I know you
> don’t like them, they’re probably just broken altogether and because
> libvirt doesn’t use backup (yet), probably nobody cares.
>
>> 2. there no move write notifiers here, so is_write_notifier parameter
>
> s/there/there are/, I suppose?
>
>> is dropped from block-copy paths.
>>
>> 3. Intersecting requests handling changed, now synchronization between
>> backup-top, backup and guest writes are all done in block/block-copy.c
>> and works as follows:
>>
>> On copy operation, we work only with dirty areas. If bits are dirty it
>> means that there are no requests intersecting with this area. We clear
>> dirty bits and take bdrv range lock (bdrv_co_try_lock) on this area to
>> prevent further operations from interaction with guest (only with
>> guest, as neither backup nor backup-top will touch non-dirty area). If
>> copy-operation failed we set dirty bits back together with releasing
>> the lock.
>>
>> The actual difference with old scheme is that on guest writes we
>> don't lock the whole region but only dirty-parts, and to be more
>> precise: only dirty-part we are currently operate on. In old scheme
>> guest write to non-dirty area (which may be safely ignored by backup)
>> may wait for intersecting request, touching some other area which is
>> dirty.
>>
>> 4. To sync with in-flight requests at job finish we now have drained
>> removing of the filter, we don't need rw-lock.
>>
>> = Notes =
>>
>> Note the consequence of three objects appearing: backup-top, backup job
>> and block-copy-state:
>>
>> 1. We want to insert backup-top before job creation, to behave similar
>> with mirror and commit, where job is started upon filter.
>>
>> 2. We also have to create block-copy-state after filter injection, as
>> we don't want it's source child be replaced by fitler. Instead we want
>
> s/it's/its/, s/filter/filter/ (although “fitler” does have an amusing
> ring to it)
>
>> to keep BCS.source to be real source node, as we want to use
>> bdrv_co_try_lock in CBW operations and it can't be used on filter, as
>> on filter we already have in-flight (write) request from upper layer.
>
> Reasonable, even more so as I suppose BCS.source should eventually be a
> BdrvChild of backup-top.
>
> What looks wrong is that the sync_bitmap is created on the source (“bs”
> in backup_job_create()), but backup_cleanup_sync_bitmap() still assumes
> it is on blk_bs(job->common.blk) (which is no longer true).
>
>> So, we firstly create inject backup-top, then create job and BCS. BCS
>> is the latest just to not create extra variable for it. Finally we set
>> bcs for backup-top filter.
>>
>> = Iotest changes =
>>
>> 56: op-blocker doesn't shot now, as we set it on source, but then check
>
> s/shot/show/?
>
>> on filter, when trying to start second backup, so error caught in
>> test_dismiss_collision is changed. It's OK anyway, as this test-case
>> seems to test that after some collision we can dismiss first job and
>> successfully start the second one. So, the change is that collision is
>> changed from op-blocker to file-posix locks.
>
> Well, but the op blocker belongs to the source, which should be covered
> by internal permissions. The fact that it now shows up as a file-posix
> error thus shows that the conflict also moves from the source to the
> target. It’s OK because we actually don’t have a conflict on the source.
>
> But I wonder whether it would be better for test_dismiss_collision() to
> do a blockdev-backup instead where we can see the collision on the target.
>
> Hm. On second thought, why do we even get a conflict on the target?
> block-copy does share the WRITE permission for it...
Not sure, but assume that this is because in file-posix.c in raw_co_create
we do want RESIZE perm.
I can instead move this test to use specified job-id, to move the collision
to "job-id already exists" error. Is it better?
I'm afraid that posix locks will not work if disable them in config.
>
>> However, it's obvious now that we'd better drop this op-blocker at all
>> and add a test-case for two backups from one node (to different
>> destinations) actually works. But not in these series.
>>
>> 257: The test wants to emulate guest write during backup. They should
>> go to filter node, not to original source node, of course. Therefore we
>> need to specify filter node name and use it.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
>> ---
>> qapi/block-core.json | 8 +-
>> include/block/block-copy.h | 2 +-
>> include/block/block_int.h | 1 +
>> block/backup-top.c | 14 +-
>> block/backup.c | 113 +++-----------
>> block/block-copy.c | 55 ++++---
>> block/replication.c | 2 +-
>> blockdev.c | 1 +
>> tests/qemu-iotests/056 | 2 +-
>> tests/qemu-iotests/257 | 7 +-
>> tests/qemu-iotests/257.out | 306 ++++++++++++++++++-------------------
>> 11 files changed, 230 insertions(+), 281 deletions(-)
>
> Nice.
>
> I don’t have much to object (for some reason, I feel a bit uneasy about
> dropping all the request waiting code, but I suppose that is indeed
> taken care of with the range locks now).
>
> [...]
>
>> diff --git a/block/backup.c b/block/backup.c
>> index d927c63e5a..259a165405 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>
> [...]
>
>> @@ -552,6 +480,9 @@ BlockJob *backup_job_create(const char *job_id,
>> BlockDriverState *bs,
>> backup_clean(&job->common.job);
>> job_early_fail(&job->common.job);
>> }
actually, here should be "else if"
>> + if (backup_top) {
>> + bdrv_backup_top_drop(backup_top);
>> + }
>
> This order is weird because backup-top still has a pointer to the BCS,
> but we have already freed it at this point.
>
>>
>> return NULL;
>> }
>> diff --git a/block/block-copy.c b/block/block-copy.c
>> index 6828c46ba0..f3102ec3ff 100644
>> --- a/block/block-copy.c
>> +++ b/block/block-copy.c
>> @@ -98,27 +98,32 @@ fail:
>> * error occurred, return a negative error number
>> */
>> static int coroutine_fn block_copy_with_bounce_buffer(
>> - BlockCopyState *s, int64_t start, int64_t end, bool
>> is_write_notifier,
>> + BlockCopyState *s, int64_t start, int64_t end,
>> bool *error_is_read, void **bounce_buffer)
>> {
>> int ret;
>> - int nbytes;
>> - int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>> + int nbytes = MIN(s->cluster_size, s->len - start);
>> + void *lock = bdrv_co_try_lock(blk_bs(s->source), start, nbytes);
>> +
>> + /*
>> + * Function must be called only on full-dirty region, so nobody
>> touching or
>> + * touched these bytes. Therefore, we must successfully get lock.
>
> Which is OK because backup-top does not share the WRITE permission. But
> it is organized a bit weirdly, because it’s actually block-copy here
> that relies on the WRITE permission not to be shared; which it itself
> cannot do because backup-top needs it.
>
> This of course results from the fact that backup-top and block-copy
> should really use the same object to access the source, namely the
> backing BdrvChild of backup-top.
>
> Maybe there should be a comment somewhere that points this out?
But I wanted block-copy not to be directly connected to backup-top.
I think actually we rely on the fact that user of block-copy() guarantees that
nobody is touching dirty areas (except block_copy). And with backup, this is
done by
backup-top filter. I'll add a comment on this.
>
>> + */
>> + assert(lock);
>
> [...]
>
>> @@ -128,13 +133,16 @@ static int coroutine_fn block_copy_with_bounce_buffer(
>> if (error_is_read) {
>> *error_is_read = false;
>> }
>> - goto fail;
>> + goto out;
>> }
>>
>> - return nbytes;
>> -fail:
>> - bdrv_set_dirty_bitmap(s->copy_bitmap, start, s->cluster_size);
>> - return ret;
>> +out:
>> + if (ret < 0) {
>> + bdrv_set_dirty_bitmap(s->copy_bitmap, start, s->cluster_size);
>> + }
>> + bdrv_co_unlock(lock);
>> +
>> + return ret < 0 ? ret : nbytes;
>>
>
> I feel like it would still be simpler to keep the separate fail path and
> just duplicate the bdrv_co_unlock(lock) call.
>
> [...]
>
>> diff --git a/blockdev.c b/blockdev.c
>> index fbef6845c8..f89e48fc79 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -3601,6 +3601,7 @@ static BlockJob *do_backup_common(BackupCommon *backup,
>> job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
>> backup->sync, bmap, backup->bitmap_mode,
>> backup->compress,
>> + backup->filter_node_name,
>
> Is this guaranteed to be NULL when not specified by the user?
>
As far as I know - yes, it is.
--
Best regards,
Vladimir
- [Qemu-block] [PATCH v9 09/13] iotests: 257: drop device_add, (continued)
- [Qemu-block] [PATCH v9 09/13] iotests: 257: drop device_add, Vladimir Sementsov-Ogievskiy, 2019/08/26
- [Qemu-block] [PATCH v9 03/13] block/backup: introduce BlockCopyState, Vladimir Sementsov-Ogievskiy, 2019/08/26
- [Qemu-block] [PATCH v9 05/13] block: move block_copy from block/backup.c to separate file, Vladimir Sementsov-Ogievskiy, 2019/08/26
- [Qemu-block] [PATCH v9 13/13] block/backup: use backup-top instead of write notifiers, Vladimir Sementsov-Ogievskiy, 2019/08/26
- [Qemu-block] [PATCH v9 08/13] iotests: 257: drop unused Drive.device field, Vladimir Sementsov-Ogievskiy, 2019/08/26
- [Qemu-block] [PATCH v9 06/13] block: teach bdrv_debug_breakpoint skip filters with backing, Vladimir Sementsov-Ogievskiy, 2019/08/26
- [Qemu-block] [PATCH v9 07/13] iotests: prepare 124 and 257 bitmap querying for backup-top filter, Vladimir Sementsov-Ogievskiy, 2019/08/26