qemu-devel

Re: [RFC PATCH 00/23] Add subcluster allocation to qcow2


From: Eric Blake
Subject: Re: [RFC PATCH 00/23] Add subcluster allocation to qcow2
Date: Tue, 15 Oct 2019 11:05:23 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.1.0

On 10/15/19 10:23 AM, Alberto Garcia wrote:
Hi,

this series adds a new feature to the qcow2 on-disk format called
"Extended L2 Entries", which allows us to do subcluster allocation.

This cover letter explains the reasons behind this proposal, the
changes to the on-disk format, test results and pending work. If you
are curious you can also have a look at previous discussions about
this feature:


=== Changes to the on-disk format ===

An L2 entry is 64 bits wide, with this format (for uncompressed
clusters):

63    56 55    48 47    40 39    32 31    24 23    16 15     8 7      0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
**<----> <--------------------------------------------------><------->*
  Reserved             host cluster offset of data             Reserved
   (6 bits)                (47 bits)                           (8 bits)

     bit 63: refcount == 1   (QCOW_OFLAG_COPIED)
     bit 62: compressed = 1  (QCOW_OFLAG_COMPRESSED)
     bit  0: all zeros       (QCOW_OFLAG_ZERO)
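As a rough illustration of the layout above, here is a minimal sketch of how an uncompressed standard L2 entry could be picked apart. The three flag names come from the diagram; the struct, the function and the EXAMPLE_L2E_OFFSET_MASK name are hypothetical, and the sketch assumes the entry has already been converted to host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits as named in the layout above. */
#define QCOW_OFLAG_COPIED     (1ULL << 63)
#define QCOW_OFLAG_COMPRESSED (1ULL << 62)
#define QCOW_OFLAG_ZERO       (1ULL << 0)

/* Hypothetical name: mask covering the 47-bit offset field (bits 9..55). */
#define EXAMPLE_L2E_OFFSET_MASK 0x00fffffffffffe00ULL

/* Hypothetical container for the decoded fields. */
struct example_l2_entry {
    bool copied;          /* bit 63: refcount == 1 */
    bool compressed;      /* bit 62: compressed cluster */
    bool zero;            /* bit 0: cluster reads as zeros */
    uint64_t host_offset; /* host cluster offset of data */
};

/* Decode a standard (non-extended) L2 entry for an uncompressed cluster. */
static struct example_l2_entry example_decode_l2(uint64_t entry)
{
    struct example_l2_entry e = {
        .copied      = (entry & QCOW_OFLAG_COPIED) != 0,
        .compressed  = (entry & QCOW_OFLAG_COMPRESSED) != 0,
        .zero        = (entry & QCOW_OFLAG_ZERO) != 0,
        .host_offset = entry & EXAMPLE_L2E_OFFSET_MASK,
    };
    return e;
}
```

The offset field is only meaningful in this form when the compressed flag is clear; compressed entries are laid out differently and are not covered by this sketch.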

If Extended L2 Entries are enabled, bit 0 becomes reserved and must be
unset, and this 64-bit bitmap follows the entry:

63    56 55    48 47    40 39    32 31    24 23    16 15     8 7      0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<---------------------------------> <--------------------------------->
      subcluster reads as zeros            subcluster is allocated
              (32 bits)                           (32 bits)

I like the grouping - you can then do a 4-byte read and comparison to 0 to see if the entire cluster reads as zeroes or is unallocated.
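To make the point about the grouping concrete, here is a small sketch of the whole-cluster checks that become single 32-bit comparisons. All helper names are made up for illustration, and the bitmap is assumed to be in host byte order already.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper 32 bits: "subcluster reads as zeros". */
static uint32_t example_zero_bits(uint64_t bitmap)
{
    return (uint32_t)(bitmap >> 32);
}

/* Lower 32 bits: "subcluster is allocated". */
static uint32_t example_alloc_bits(uint64_t bitmap)
{
    return (uint32_t)bitmap;
}

/* True if no subcluster in the cluster is allocated. */
static bool example_no_subcluster_allocated(uint64_t bitmap)
{
    return example_alloc_bits(bitmap) == 0;
}

/* True if no subcluster in the cluster is marked as reading zeros. */
static bool example_no_subcluster_zero(uint64_t bitmap)
{
    return example_zero_bits(bitmap) == 0;
}

/* True if every subcluster in the cluster is marked as reading zeros. */
static bool example_every_subcluster_zero(uint64_t bitmap)
{
    return example_zero_bits(bitmap) == UINT32_MAX;
}
```

Had the two kinds of bits been interleaved per subcluster instead, each of these checks would need a mask over the full 64 bits rather than one comparison on a half.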

With 32k clusters, this results in 1k subclusters. In cluster 1 (offset 32k), which bits map where? (The obvious choices are that the subcluster at offset 32k maps to bit 0, the one at 33k maps to bit 1, ...; or that the subcluster at 32k maps to bit 31, the one at 33k maps to bit 30, ...)

/me reads ahead

okay, in patch 5, you said you map the most significant bit to the first subcluster. That feels backwards to me; I wonder if the math is any easier if you map subclusters starting from the least-significant bit, because then you get:

bit = (address >> subcluster_bits) & 31

rather than

bit = 31 - ((address >> subcluster_bits) & 31)

(where subcluster_bits is log2 of the subcluster size).
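Spelled out in C, the two numbering schemes look roughly like this. This is only a sketch: the function names are invented, subcluster_bits is assumed to be log2 of the subcluster size (10 for 1k subclusters), and the result is the bit position within one 32-bit half of the bitmap.

```c
#include <stdint.h>

#define EXAMPLE_SUBCLUSTERS 32u

/* Index of the subcluster containing 'address' within its cluster. */
static unsigned example_sc_index(uint64_t address, unsigned subcluster_bits)
{
    return (unsigned)((address >> subcluster_bits) & (EXAMPLE_SUBCLUSTERS - 1));
}

/* Least-significant bit maps to the first subcluster. */
static unsigned example_bit_lsb_first(uint64_t address, unsigned subcluster_bits)
{
    return example_sc_index(address, subcluster_bits);
}

/* Most-significant bit maps to the first subcluster. */
static unsigned example_bit_msb_first(uint64_t address, unsigned subcluster_bits)
{
    return (EXAMPLE_SUBCLUSTERS - 1) - example_sc_index(address, subcluster_bits);
}
```

With 32k clusters and 1k subclusters, address 32k is the first subcluster of cluster 1, so it lands on bit 0 in the first scheme and bit 31 in the second; address 33k lands on bit 1 and bit 30 respectively. The only difference is the extra subtraction.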


Some comments about the results:

- The smallest allowed cluster size for an image with subclusters is
   16 KB (in this case the subclusters size is 512 bytes), hence the
   missing values in the 4 KB and 8 KB rows.

Again reading ahead, I see that patch 5 requires a 16k minimum cluster for using extended L2. Could we still permit clusters smaller than that, but merely document that subclusters are always a minimum of 512 bytes and therefore for an 8k cluster we only use 16 bits (leaving the other 16 bits zero)? But I'm also fine with the simplicity of just stating that subclusters require at least 16k clusters.
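A quick sketch of what that alternative would mean in numbers, purely as an illustration of the idea and not what patch 5 does. The names are hypothetical, and cluster_size is assumed to be a power of two of at least 512 bytes.

```c
#include <stdint.h>

#define EXAMPLE_MIN_SUBCLUSTER_SIZE 512u
#define EXAMPLE_MAX_SUBCLUSTERS     32u

/*
 * Number of bitmap bits actually used per half if subclusters are never
 * allowed to be smaller than 512 bytes. Assumes cluster_size >= 512.
 */
static unsigned example_subclusters_in_use(uint32_t cluster_size)
{
    uint32_t n = cluster_size / EXAMPLE_MIN_SUBCLUSTER_SIZE;
    return n < EXAMPLE_MAX_SUBCLUSTERS ? n : EXAMPLE_MAX_SUBCLUSTERS;
}

/* Resulting subcluster size in bytes under the same assumption. */
static uint32_t example_subcluster_size(uint32_t cluster_size)
{
    return cluster_size / example_subclusters_in_use(cluster_size);
}
```

Under this scheme an 8k cluster would use 16 bits per half with 512-byte subclusters, a 16k cluster would use all 32 bits with 512-byte subclusters, and a 32k cluster would use all 32 bits with 1k subclusters.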


=== To do ===

A couple of things are missing from this series:

- The ability to efficiently zero individual subclusters using
   qcow2_co_pwrite_zeroes(). At the moment only full clusters can be
   zeroed with this method.

- Alternatively we could get rid of the individual "all zeroes" bits
   altogether and have 64 subclusters per cluster. We would still have
   the QCOW_OFLAG_ZERO bit in the standard cluster descriptor.

I think you've got more flexibility with the two bits per sub-cluster than you would with just 1 bit and 64 subclusters, so I don't think this direction is going to get us far.


- The number of subclusters per cluster is always 32. It would be
   trivial to allow configuring this, but I don't see any use case.

Agreed.


- Tests: I have a few written that I'll add in future revisions of
   this series.

- handle_alloc_space() works at the subclusters level. That is, if you
   have an unallocated 2MB cluster with 64KB subclusters, no backing
   image and you write 4KB of data, QEMU won't write zeroes to the
   affected subcluster(s) and will use handle_alloc_space() instead.
   The other subclusters won't be touched and will remain unallocated.
   This behavior is consistent with how subclusters work and saves disk
   space, but offers slightly lower performance (see test results
   above). Theoretically we could offer a setting to configure this,
   but I'm not convinced that this is very useful.
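For the 2MB cluster / 64KB subcluster example, the set of "affected subcluster(s)" can be worked out along these lines. This is a sketch only: the function is hypothetical, it numbers subclusters from the least significant bit (an assumption, not what the series does), and it expects a non-empty write that stays within one cluster.

```c
#include <stdint.h>

/*
 * Bitmask of the subclusters touched by a write of 'bytes' bytes starting
 * at 'offset_in_cluster'. Bit i stands for subcluster i (LSB-first
 * numbering, assumed for illustration). Requires bytes > 0.
 */
static uint32_t example_touched_mask(uint64_t offset_in_cluster,
                                     uint64_t bytes,
                                     uint64_t subcluster_size)
{
    uint64_t first = offset_in_cluster / subcluster_size;
    uint64_t last = (offset_in_cluster + bytes - 1) / subcluster_size;
    uint32_t mask = 0;

    for (uint64_t i = first; i <= last && i < 32; i++) {
        mask |= UINT32_C(1) << i;
    }
    return mask;
}
```

A 4KB write at the start of the cluster touches only subcluster 0, while a 4KB write starting at offset 62KB straddles the boundary and touches subclusters 0 and 1. Everything outside the mask is what would remain unallocated in the behavior described above.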

===========================

As usual, feedback is welcome,

Looks promising!

How do subclusters interact with external data files?

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


