Re: backup_calculate_cluster_size does not consider source
From: Max Reitz
Subject: Re: backup_calculate_cluster_size does not consider source
Date: Wed, 6 Nov 2019 11:42:16 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.1.1

On 06.11.19 11:34, Wolfgang Bumiller wrote:
> On Wed, Nov 06, 2019 at 10:37:04AM +0100, Max Reitz wrote:
>> On 06.11.19 09:32, Stefan Hajnoczi wrote:
>>> On Tue, Nov 05, 2019 at 11:02:44AM +0100, Dietmar Maurer wrote:
>>>> Example: Backup from ceph disk (rbd_cache=false) to local disk:
>>>>
>>>> backup_calculate_cluster_size returns 64K (correct for my local .raw image)
>>>>
>>>> Then the backup job starts to read 64K blocks from ceph.
>>>>
>>>> But ceph always reads 4M blocks, so this is incredibly slow and produces
>>>> way too much network traffic.
>>>>
>>>> Why does backup_calculate_cluster_size not consider the block size of
>>>> the source disk?
>>>>
>>>> cluster_size = MAX(block_size_source, block_size_target)
>>
>> So Ceph always transmits 4 MB over the network, no matter what is
>> actually needed? That sounds, well, interesting.
>
> Or at least it generates that much I/O - in the end, it can slow down
> the backup by up to a multi-digit factor...
Oh, so I understand ceph internally resolves the 4 MB block and then
transmits the subcluster range. That makes sense.
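
To put a rough number on that, here is a minimal sketch of the arithmetic. It
is not QEMU or Ceph code: backend_bytes() and amplification() are made-up
helpers, and the model (every request touches one whole object on the backend)
is only an assumption based on the description above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of QEMU or Ceph.
 * Bytes of backend I/O generated when `total` bytes are read in
 * `request_size` chunks from a store that is assumed to touch a whole
 * `object_size` object for every request smaller than an object.
 * Assumes `total` is a multiple of `request_size`.
 */
static uint64_t backend_bytes(uint64_t total, uint64_t request_size,
                              uint64_t object_size)
{
    assert(request_size > 0 && object_size > 0);

    uint64_t requests = total / request_size;
    /* A request never costs less than one full object in this model. */
    uint64_t per_request =
        request_size > object_size ? request_size : object_size;

    return requests * per_request;
}

/* Backend I/O divided by the amount of data the backup actually wanted. */
static uint64_t amplification(uint64_t total, uint64_t request_size,
                              uint64_t object_size)
{
    assert(total > 0);
    return backend_bytes(total, request_size, object_size) / total;
}
```

Under that assumption, copying one 4 MB object in 64K requests causes 64
times the backend I/O of a single 4 MB request, which would fit the
multi-digit slowdown mentioned above.
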
>> backup_calculate_cluster_size() doesn’t consider the source size because
>> to my knowledge there is no other medium that behaves this way. So I
>> suppose the assumption was always that the block size of the source
>> doesn’t matter, because a partial read is always possible (without
>> having to read everything).
>
> Unless you enable qemu-side caching this only works until the
> block/cluster size of the source exceeds the one of the target.
>
>> What would make sense to me is to increase the buffer size in general.
>> I don’t think we need to copy clusters at a time, and
>> 0e2402452f1f2042923a5 has indeed increased the copy size to 1 MB for
>> backup writes that are triggered by guest writes. We haven’t yet
>> increased the copy size for background writes, though. We can do that,
>> of course. (And probably should.)
>>
>> The thing is, it just seems unnecessary to me to take the source cluster
>> size into account in general. It seems weird that a medium only allows
>> 4 MB reads, because, well, guests aren’t going to take that into account.
>
> But guests usually have a page cache, which is why in many setups qemu
> (and thereby the backup process) often doesn't.
But this still doesn’t make sense to me. Linux doesn’t issue 4 MB
requests to pre-fill the page cache, does it?

And if it issues a smaller request, there is no way for a guest device
to tell it “OK, here’s your data, but note we have a whole 4 MB chunk
around it, maybe you’d like to take that as well...?”

I understand wanting to increase the backup buffer size, but I don’t
quite understand why we’d want it to increase to the source cluster size
when the guest also has no idea what the source cluster size is.
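
For comparison, the two options discussed in this thread can be written down
roughly as follows. This is only a sketch under stated assumptions, not the
actual backup_calculate_cluster_size(): cluster_size_max() and
copy_chunk_size() are hypothetical names, and rounding the buffer up to whole
target clusters is my assumption about how a larger copy size would be aligned.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Option 1 (the proposal quoted above): make the backup cluster size the
 * larger of the source and target block sizes.
 * Hypothetical helper, not the real QEMU function.
 */
static int64_t cluster_size_max(int64_t source_block, int64_t target_block)
{
    assert(source_block > 0 && target_block > 0);
    return source_block > target_block ? source_block : target_block;
}

/*
 * Option 2: keep the cluster size derived from the target, but copy several
 * clusters per request. Returns the smallest multiple of `target_cluster`
 * that is at least `min_buffer` bytes.
 * Hypothetical helper; the rounding rule is an assumption.
 */
static int64_t copy_chunk_size(int64_t target_cluster, int64_t min_buffer)
{
    assert(target_cluster > 0 && min_buffer > 0);

    int64_t clusters = (min_buffer + target_cluster - 1) / target_cluster;
    return clusters * target_cluster;
}
```

With a 64K target cluster, option 1 yields a 4 MB cluster size for a 4 MB
source, while option 2 with a 1 MB buffer keeps 64K clusters and issues 1 MB
copies.
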
Max