qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_boun


From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
Date: Tue, 13 Aug 2019 17:02:19 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

On 13.08.19 17:00, Vladimir Sementsov-Ogievskiy wrote:
> 13.08.2019 17:57, Max Reitz wrote:
>> On 13.08.19 16:39, Vladimir Sementsov-Ogievskiy wrote:
>>> 13.08.2019 17:23, Max Reitz wrote:
>>>> On 13.08.19 16:14, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 12.08.2019 19:37, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> 12.08.2019 19:11, Max Reitz wrote:
>>>>>>> On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>> 12.08.2019 18:10, Max Reitz wrote:
>>>>>>>>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>>>> backup_cow_with_offload can transfer more than one cluster. Let
>>>>>>>>>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>>>>>>>>>> of IO requests, since there is no need to copy cluster by cluster.
>>>>>>>>>>
>>>>>>>>>> Logic around bounce_buffer allocation changed: we can't just allocate
>>>>>>>>>> one-cluster-sized buffer to share for all iterations. We can't also
>>>>>>>>>> allocate buffer of full-request length it may be too large, so
>>>>>>>>>> BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
>>>>>>>>>> is to allocate a buffer sufficient to handle all remaining iterations
>>>>>>>>>> at the point where we need the buffer for the first time.
>>>>>>>>>>
>>>>>>>>>> Bonus: get rid of pointer-to-pointer.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
>>>>>>>>>> ---
>>>>>>>>>>      block/backup.c | 65 
>>>>>>>>>> +++++++++++++++++++++++++++++++-------------------
>>>>>>>>>>      1 file changed, 41 insertions(+), 24 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>>>>>>> index d482d93458..65f7212c85 100644
>>>>>>>>>> --- a/block/backup.c
>>>>>>>>>> +++ b/block/backup.c
>>>>>>>>>> @@ -27,6 +27,7 @@
>>>>>>>>>>      #include "qemu/error-report.h"
>>>>>>>>>>      #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>>>>>>>>>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>>>>>>>>>      typedef struct CowRequest {
>>>>>>>>>>          int64_t start_byte;
>>>>>>>>>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>>>>>>>>>          qemu_co_queue_restart_all(&req->wait_queue);
>>>>>>>>>>      }
>>>>>>>>>> -/* Copy range to target with a bounce buffer and return the bytes 
>>>>>>>>>> copied. If
>>>>>>>>>> - * error occurred, return a negative error number */
>>>>>>>>>> +/*
>>>>>>>>>> + * Copy range to target with a bounce buffer and return the bytes 
>>>>>>>>>> copied. If
>>>>>>>>>> + * error occurred, return a negative error number
>>>>>>>>>> + *
>>>>>>>>>> + * @bounce_buffer is assumed to enough to store
>>>>>>>>>
>>>>>>>>> s/to/to be/
>>>>>>>>>
>>>>>>>>>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>>>>>>>>>> + */
>>>>>>>>>>      static int coroutine_fn 
>>>>>>>>>> backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>>>>>>>>>                                                            int64_t 
>>>>>>>>>> start,
>>>>>>>>>>                                                            int64_t 
>>>>>>>>>> end,
>>>>>>>>>>                                                            bool 
>>>>>>>>>> is_write_notifier,
>>>>>>>>>>                                                            bool 
>>>>>>>>>> *error_is_read,
>>>>>>>>>> -                                                      void 
>>>>>>>>>> **bounce_buffer)
>>>>>>>>>> +                                                      void 
>>>>>>>>>> *bounce_buffer)
>>>>>>>>>>      {
>>>>>>>>>>          int ret;
>>>>>>>>>>          BlockBackend *blk = job->common.blk;
>>>>>>>>>> -    int nbytes;
>>>>>>>>>> +    int nbytes, remaining_bytes;
>>>>>>>>>>          int read_flags = is_write_notifier ? 
>>>>>>>>>> BDRV_REQ_NO_SERIALISING : 0;
>>>>>>>>>>          assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>>>>>>>>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, 
>>>>>>>>>> job->cluster_size);
>>>>>>>>>> -    nbytes = MIN(job->cluster_size, job->len - start);
>>>>>>>>>> -    if (!*bounce_buffer) {
>>>>>>>>>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
>>>>>>>>>> -    }
>>>>>>>>>> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>>>>>>>>>> +    nbytes = MIN(end - start, job->len - start);
>>>>>>>>>> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, 
>>>>>>>>>> read_flags);
>>>>>>>>>> -    if (ret < 0) {
>>>>>>>>>> -        trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>>>>> -        if (error_is_read) {
>>>>>>>>>> -            *error_is_read = true;
>>>>>>>>>> +
>>>>>>>>>> +    remaining_bytes = nbytes;
>>>>>>>>>> +    while (remaining_bytes) {
>>>>>>>>>> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>>>>>>>>>> +
>>>>>>>>>> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, 
>>>>>>>>>> read_flags);
>>>>>>>>>> +        if (ret < 0) {
>>>>>>>>>> +            trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>>>>> +            if (error_is_read) {
>>>>>>>>>> +                *error_is_read = true;
>>>>>>>>>> +            }
>>>>>>>>>> +            goto fail;
>>>>>>>>>>              }
>>>>>>>>>> -        goto fail;
>>>>>>>>>> -    }
>>>>>>>>>> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>>>>>>>>>> -                        job->write_flags);
>>>>>>>>>> -    if (ret < 0) {
>>>>>>>>>> -        trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>>>>> -        if (error_is_read) {
>>>>>>>>>> -            *error_is_read = false;
>>>>>>>>>> +        ret = blk_co_pwrite(job->target, start, chunk, 
>>>>>>>>>> bounce_buffer,
>>>>>>>>>> +                            job->write_flags);
>>>>>>>>>> +        if (ret < 0) {
>>>>>>>>>> +            trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>>>>> +            if (error_is_read) {
>>>>>>>>>> +                *error_is_read = false;
>>>>>>>>>> +            }
>>>>>>>>>> +            goto fail;
>>>>>>>>>>              }
>>>>>>>>>> -        goto fail;
>>>>>>>>>> +
>>>>>>>>>> +        start += chunk;
>>>>>>>>>> +        remaining_bytes -= chunk;
>>>>>>>>>>          }
>>>>>>>>>>          return nbytes;
>>>>>>>>>> @@ -301,9 +313,14 @@ static int coroutine_fn 
>>>>>>>>>> backup_do_cow(BackupBlockJob *job,
>>>>>>>>>>                  }
>>>>>>>>>>              }
>>>>>>>>>>              if (!job->use_copy_range) {
>>>>>>>>>> +            if (!bounce_buffer) {
>>>>>>>>>> +                size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>>>>>>>>>> +                                 MAX(dirty_end - start, end - 
>>>>>>>>>> dirty_end));
>>>>>>>>>> +                bounce_buffer = blk_try_blockalign(job->common.blk, 
>>>>>>>>>> len);
>>>>>>>>>> +            }
>>>>>>>>>
>>>>>>>>> If you use _try_, you should probably also check whether it succeeded.
>>>>>>>>
>>>>>>>> Oops, you are right, of course.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Anyway, I wonder whether it’d be better to just allocate this buffer
>>>>>>>>> once per job (the first time we get here, probably) to be of size
>>>>>>>>> BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob.  (And maybe 
>>>>>>>>> add
>>>>>>>>> a buf-size parameter similar to what the mirror jobs have.)
>>>>>>>>>
>>>>>>>>
>>>>>>>> Once per job will not work, as we may have several guest writes in 
>>>>>>>> parallel and therefore
>>>>>>>> several parallel copy-before-write operations.
>>>>>>>
>>>>>>> Hm.  I’m not quite happy with that because if the guest just issues many
>>>>>>> large discards in parallel, this means that qemu will allocate a large
>>>>>>> amount of memory.
>>>>>>>
>>>>>>> It would be nice if there was a simple way to keep track of the total
>>>>>>> memory usage and let requests yield if they would exceed it.
>>>>>>
>>>>>> Agree, it should be fixed anyway.
>>>>>>
>>>>>
>>>>>
>>>>> But still..
>>>>>
>>>>> Synchronous mirror allocates full-request buffers on guest write. Is it 
>>>>> correct?
>>>>>
>>>>> If we assume that it is correct to double memory usage of guest 
>>>>> operations, than for backup
>>>>> the problem is only in write_zero and discard where guest-assumed memory 
>>>>> usage should be zero.
>>>>
>>>> Well, but that is the problem.  I didn’t say anything in v2, because I
>>>> only thought of normal writes and I found it fine to double the memory
>>>> usage there (a guest won’t issue huge write requests in parallel).  But
>>>> discard/write-zeroes are a different matter.
>>>>
>>>>> And if we should distinguish writes from write_zeroes and discard, it's 
>>>>> better to postpone this
>>>>> improvement to be after backup-top filter merged.
>>>>
>>>> But do you need to distinguish it?  Why not just keep track of memory
>>>> usage and put the current I/O coroutine to sleep in a CoQueue or
>>>> something, and wake that up at the end of backup_do_cow()?
>>>>
>>>
>>> 1. Because if we _can_ allow doubling of memory, it's more effective to not 
>>> restrict allocations on
>>> guest writes. It's just seems to be more effective technique.
>>
>> But the problem with backup and zero writes/discards is that the memory
>> is not doubled.  The request doesn’t need any memory, but the CBW
>> operation does, and maybe lots of it.
>>
>> So the guest may issue many zero writes/discards in parallel and thus
>> exhaust memory on the host.
> 
> So this is the reason to separate writes from write-zeros/discrads. So at 
> least write will be happy. And I
> think that write is more often request than write-zero/discard

But that makes it complicated for no practical gain whatsoever.

>>
>>> 2. Anyway, I'd allow some always-available size to allocate - let it be one 
>>> cluster, which will correspond
>>> to current behavior and prevent guest io hang in worst case.
>>
>> The guest would only hang if it we have to copy more than e.g. 64 MB at
>> a time.  At which point I think it’s not unreasonable to sequentialize
>> requests.

Because of this.  How is it bad to start sequentializing writes when the
data exceeds 64 MB?

Max

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]