qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 5/8] qemu-img: rebase: avoid unnecessary COW operations


From: Andrey Drobyshev
Subject: Re: [PATCH v2 5/8] qemu-img: rebase: avoid unnecessary COW operations
Date: Tue, 19 Sep 2023 19:21:32 +0300
User-agent: Mozilla Thunderbird

On 9/19/23 13:46, Hanna Czenczek wrote:
> On 15.09.23 18:20, Andrey Drobyshev wrote:
>> When rebasing an image from one backing file to another, we need to
>> compare data from old and new backings.  If the diff between that data
>> happens to be unaligned to the target cluster size, we might end up
>> doing partial writes, which would lead to copy-on-write and additional
>> IO.
>>
>> Consider the following simple case (virtual_size == cluster_size == 64K):
>>
>> base <-- inc1 <-- inc2
>>
>> qemu-io -c "write -P 0xaa 0 32K" base.qcow2
>> qemu-io -c "write -P 0xcc 32K 32K" base.qcow2
>> qemu-io -c "write -P 0xbb 0 32K" inc1.qcow2
>> qemu-io -c "write -P 0xcc 32K 32K" inc1.qcow2
>> qemu-img rebase -f qcow2 -b base.qcow2 -F qcow2 inc2.qcow2
>>
>> While doing rebase, we'll write a half of the cluster to inc2, and block
>> layer will have to read the 2nd half of the same cluster from the base
>> image
>> inc1 while doing this write operation, although the whole cluster is
>> already
>> read earlier to perform data comparison.
>>
>> In order to avoid these unnecessary IO cycles, let's make sure every
>> write request is aligned to the overlay subcluster boundaries.  Using
>> subcluster size is universal as for the images which don't have them
>> this size equals to the cluster size, so in any case we end up aligning
>> to the smallest unit of allocation.
>>
>> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>> ---
>>   qemu-img.c | 76 ++++++++++++++++++++++++++++++++++++++++--------------
>>   1 file changed, 56 insertions(+), 20 deletions(-)
> 
> Looks good, I like the changes from v1!  Two minor things:
> 
>> diff --git a/qemu-img.c b/qemu-img.c
>> index fcd31d7b5b..83950af42b 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
> 
> [...]
> 
>> @@ -3844,33 +3861,48 @@ static int img_rebase(int argc, char **argv)
>>                   }
>>               }
>>   +            /*
>> +             * At this point we know that the region [offset; offset
>> + n)
>> +             * is unallocated within the target image.  This region
>> might be
>> +             * unaligned to the target image's (sub)cluster
>> boundaries, as
>> +             * old backing may have smaller clusters (or have
>> subclusters).
>> +             * We extend it to the aligned boundaries to avoid CoW on
>> +             * partial writes in blk_pwrite(),
>> +             */
>> +            n += offset - QEMU_ALIGN_DOWN(offset, write_align);
>> +            offset = QEMU_ALIGN_DOWN(offset, write_align);
>> +            n += QEMU_ALIGN_UP(offset + n, write_align) - (offset + n);
>> +            n = MIN(n, size - offset);
>> +            assert(!bdrv_is_allocated(unfiltered_bs, offset, n,
>> &n_alloc) &&
>> +                   n_alloc == n);
>> +
>> +            /*
>> +             * Much like the with the target image, we'll try to read
>> as much
> 
> s/the with the/with the/
>

Noted.

>> +             * of the old and new backings as we can.
>> +             */
>> +            n_old = MIN(n, MAX(0, old_backing_size - (int64_t) offset));
>> +            if (blk_new_backing) {
>> +                n_new = MIN(n, MAX(0, new_backing_size - (int64_t)
>> offset));
>> +            }
> 
> If we don’t have a check for blk_old_backing (old_backing_size is 0 if
> blk_old_backing is NULL), why do we have a check for blk_new_backing
> (new_backing_size is 0 if blk_new_backing is NULL)?
> 
> (Perhaps because the previous check was `offset >= new_backing_size ||
> !blk_new_backing`, i.e. included exactly such a check – but I don’t
> think it’s necessary, new_backing_size will be 0 if blk_new_backing is
> NULL.)
> 
>> +
>>               /*
>>                * Read old and new backing file and take into
>> consideration that
>>                * backing files may be smaller than the COW image.
>>                */
>> -            if (offset >= old_backing_size) {
>> -                memset(buf_old, 0, n);
>> -                buf_old_is_zero = true;
>> +            memset(buf_old + n_old, 0, n - n_old);
>> +            if (!n_old) {
>> +                old_backing_eof = true;
>>               } else {
>> -                if (offset + n > old_backing_size) {
>> -                    n = old_backing_size - offset;
>> -                }
>> -
>> -                ret = blk_pread(blk_old_backing, offset, n, buf_old, 0);
>> +                ret = blk_pread(blk_old_backing, offset, n_old,
>> buf_old, 0);
>>                   if (ret < 0) {
>>                       error_report("error while reading from old
>> backing file");
>>                       goto out;
>>                   }
>>               }
>>   -            if (offset >= new_backing_size || !blk_new_backing) {
>> -                memset(buf_new, 0, n);
>> -            } else {
>> -                if (offset + n > new_backing_size) {
>> -                    n = new_backing_size - offset;
>> -                }
>> -
>> -                ret = blk_pread(blk_new_backing, offset, n, buf_new, 0);
>> +            memset(buf_new + n_new, 0, n - n_new);
>> +            if (blk_new_backing && n_new) {
> 
> Same as above, I think we can drop the blk_new_backing check, just so
> that both blocks (for old and new) look the same.
> 
> (Also, the memset() already has to trust that n_new is 0 if
> blk_new_backing is NULL, so the check doesn’t make much sense from that
> perspective either, and makes the memset() look wrong.)
>

Good point, thank you.  Looks like we indeed can safely remove these
extra checks.

>> +                ret = blk_pread(blk_new_backing, offset, n_new,
>> buf_new, 0);
>>                   if (ret < 0) {
>>                       error_report("error while reading from new
>> backing file");
>>                       goto out;
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]