[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 5/8] qemu-img: rebase: avoid unnecessary COW operations
From: |
Andrey Drobyshev |
Subject: |
Re: [PATCH v2 5/8] qemu-img: rebase: avoid unnecessary COW operations |
Date: |
Tue, 19 Sep 2023 19:21:32 +0300 |
User-agent: |
Mozilla Thunderbird |
On 9/19/23 13:46, Hanna Czenczek wrote:
> On 15.09.23 18:20, Andrey Drobyshev wrote:
>> When rebasing an image from one backing file to another, we need to
>> compare data from old and new backings. If the diff between that data
>> happens to be unaligned to the target cluster size, we might end up
>> doing partial writes, which would lead to copy-on-write and additional
>> IO.
>>
>> Consider the following simple case (virtual_size == cluster_size == 64K):
>>
>> base <-- inc1 <-- inc2
>>
>> qemu-io -c "write -P 0xaa 0 32K" base.qcow2
>> qemu-io -c "write -P 0xcc 32K 32K" base.qcow2
>> qemu-io -c "write -P 0xbb 0 32K" inc1.qcow2
>> qemu-io -c "write -P 0xcc 32K 32K" inc1.qcow2
>> qemu-img rebase -f qcow2 -b base.qcow2 -F qcow2 inc2.qcow2
>>
>> While doing rebase, we'll write a half of the cluster to inc2, and block
>> layer will have to read the 2nd half of the same cluster from the base
>> image
>> inc1 while doing this write operation, although the whole cluster is
>> already
>> read earlier to perform data comparison.
>>
>> In order to avoid these unnecessary IO cycles, let's make sure every
>> write request is aligned to the overlay subcluster boundaries. Using
>> subcluster size is universal as for the images which don't have them
>> this size equals to the cluster size, so in any case we end up aligning
>> to the smallest unit of allocation.
>>
>> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>> ---
>> qemu-img.c | 76 ++++++++++++++++++++++++++++++++++++++++--------------
>> 1 file changed, 56 insertions(+), 20 deletions(-)
>
> Looks good, I like the changes from v1! Two minor things:
>
>> diff --git a/qemu-img.c b/qemu-img.c
>> index fcd31d7b5b..83950af42b 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>
> [...]
>
>> @@ -3844,33 +3861,48 @@ static int img_rebase(int argc, char **argv)
>> }
>> }
>> + /*
>> + * At this point we know that the region [offset; offset
>> + n)
>> + * is unallocated within the target image. This region
>> might be
>> + * unaligned to the target image's (sub)cluster
>> boundaries, as
>> + * old backing may have smaller clusters (or have
>> subclusters).
>> + * We extend it to the aligned boundaries to avoid CoW on
>> + * partial writes in blk_pwrite(),
>> + */
>> + n += offset - QEMU_ALIGN_DOWN(offset, write_align);
>> + offset = QEMU_ALIGN_DOWN(offset, write_align);
>> + n += QEMU_ALIGN_UP(offset + n, write_align) - (offset + n);
>> + n = MIN(n, size - offset);
>> + assert(!bdrv_is_allocated(unfiltered_bs, offset, n,
>> &n_alloc) &&
>> + n_alloc == n);
>> +
>> + /*
>> + * Much like the with the target image, we'll try to read
>> as much
>
> s/the with the/with the/
>
Noted.
>> + * of the old and new backings as we can.
>> + */
>> + n_old = MIN(n, MAX(0, old_backing_size - (int64_t) offset));
>> + if (blk_new_backing) {
>> + n_new = MIN(n, MAX(0, new_backing_size - (int64_t)
>> offset));
>> + }
>
> If we don’t have a check for blk_old_backing (old_backing_size is 0 if
> blk_old_backing is NULL), why do we have a check for blk_new_backing
> (new_backing_size is 0 if blk_new_backing is NULL)?
>
> (Perhaps because the previous check was `offset >= new_backing_size ||
> !blk_new_backing`, i.e. included exactly such a check – but I don’t
> think it’s necessary, new_backing_size will be 0 if blk_new_backing is
> NULL.)
>
>> +
>> /*
>> * Read old and new backing file and take into
>> consideration that
>> * backing files may be smaller than the COW image.
>> */
>> - if (offset >= old_backing_size) {
>> - memset(buf_old, 0, n);
>> - buf_old_is_zero = true;
>> + memset(buf_old + n_old, 0, n - n_old);
>> + if (!n_old) {
>> + old_backing_eof = true;
>> } else {
>> - if (offset + n > old_backing_size) {
>> - n = old_backing_size - offset;
>> - }
>> -
>> - ret = blk_pread(blk_old_backing, offset, n, buf_old, 0);
>> + ret = blk_pread(blk_old_backing, offset, n_old,
>> buf_old, 0);
>> if (ret < 0) {
>> error_report("error while reading from old
>> backing file");
>> goto out;
>> }
>> }
>> - if (offset >= new_backing_size || !blk_new_backing) {
>> - memset(buf_new, 0, n);
>> - } else {
>> - if (offset + n > new_backing_size) {
>> - n = new_backing_size - offset;
>> - }
>> -
>> - ret = blk_pread(blk_new_backing, offset, n, buf_new, 0);
>> + memset(buf_new + n_new, 0, n - n_new);
>> + if (blk_new_backing && n_new) {
>
> Same as above, I think we can drop the blk_new_backing check, just so
> that both blocks (for old and new) look the same.
>
> (Also, the memset() already has to trust that n_new is 0 if
> blk_new_backing is NULL, so the check doesn’t make much sense from that
> perspective either, and makes the memset() look wrong.)
>
Good point, thank you. Looks like we indeed can safely remove these
extra checks.
>> + ret = blk_pread(blk_new_backing, offset, n_new,
>> buf_new, 0);
>> if (ret < 0) {
>> error_report("error while reading from new
>> backing file");
>> goto out;
>
- [PATCH v2 1/8] qemu-img: rebase: stop when reaching EOF of old backing file, (continued)
- [PATCH v2 1/8] qemu-img: rebase: stop when reaching EOF of old backing file, Andrey Drobyshev, 2023/09/15
- [PATCH v2 8/8] iotests: add tests for "qemu-img rebase" with compression, Andrey Drobyshev, 2023/09/15
- [PATCH v2 4/8] qemu-img: add chunk size parameter to compare_buffers(), Andrey Drobyshev, 2023/09/15
- [PATCH v2 5/8] qemu-img: rebase: avoid unnecessary COW operations, Andrey Drobyshev, 2023/09/15
- [PATCH v2 6/8] iotests/{024, 271}: add testcases for qemu-img rebase, Andrey Drobyshev, 2023/09/15
- [PATCH v2 3/8] qemu-img: rebase: use backing files' BlockBackend for buffer alignment, Andrey Drobyshev, 2023/09/15