From: Kevin Wolf
Subject: Re: [Qemu-devel] [RFC] Re-evaluating subcluster allocation for qcow2 images
Date: Thu, 11 Jul 2019 16:32:34 +0200
User-agent: Mutt/1.11.3 (2019-02-01)

On 11.07.2019 at 16:08, Alberto Garcia wrote:
> Some questions that are still open:
> 
> - It is possible to configure very easily the number of subclusters per
>   cluster. It is now hardcoded to 32 in qcow2_do_open() but any power of
>   2 would work (just change the number there if you want to test
>   it). Would an option for this be worth adding?

I think for testing we can just change the constant. Once the feature is
merged and used in production, I don't think there is any reason to
leave bits unused.
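
To make the "bits left unused" point concrete, here is a minimal sketch of
the arithmetic. It assumes (this is not spelled out in the thread) that the
subcluster bitmap occupies 64 bits of the 128-bit L2 entry and that each
subcluster needs two bits, one "allocated" and one "all zeroes". The names
are hypothetical and not taken from the QEMU sources.

```python
# Assumption: the subcluster bitmap is the extra 64 bits of a 128-bit L2 entry.
BITMAP_BITS = 64


def unused_bitmap_bits(subclusters: int, bits_per_subcluster: int = 2) -> int:
    """Return how many bitmap bits stay unused for a given subcluster count.

    bits_per_subcluster=2 models one "allocated" bit plus one "all zeroes"
    bit; bits_per_subcluster=1 models the variant without "all zeroes" bits.
    """
    used = subclusters * bits_per_subcluster
    if used > BITMAP_BITS:
        raise ValueError("bitmap does not fit into the assumed 64 bits")
    return BITMAP_BITS - used


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(f"{n:2d} subclusters: {unused_bitmap_bits(n)} bits unused")
```

Under these assumptions, 32 subclusters use the whole bitmap, while 8
subclusters would leave 48 of the 64 bits idle.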

> - We could also allow the user to choose 64 subclusters per cluster and
>   disable the "all zeroes" bits in that case. It is quite simple in
>   terms of lines of code but it would make the qcow2 spec a bit more
>   complicated.
> 
> - We would now have "all zeroes" bits at the cluster and subcluster
>   levels, so there's an ambiguity here that we need to solve. In
>   particular, what happens if we have a QCOW2_CLUSTER_ZERO_ALLOC cluster
>   but some bits from the bitmap are set? Do we ignore them completely?

The (super)cluster zero bit should probably always be clear if
subclusters are used. If it's set, we have a corrupted image.
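
A minimal sketch of that rule as a consistency check, assuming the
cluster-level "all zeroes" flag stays at bit 0 of the L2 entry, as it is
for standard clusters in the current qcow2 format. The function name is
hypothetical; this only illustrates the proposed rule and is not how QEMU
implements it.

```python
# Bit 0 of a standard-cluster L2 entry: "cluster reads as all zeroes".
QCOW_OFLAG_ZERO = 1 << 0


def l2_entry_is_consistent(l2_entry: int, subclusters_enabled: bool) -> bool:
    """Return False if the entry violates the proposed rule.

    With subclusters in use, the cluster-level zero bit must be clear;
    an entry that has it set would be treated as image corruption.
    """
    if subclusters_enabled and (l2_entry & QCOW_OFLAG_ZERO):
        return False
    return True


if __name__ == "__main__":
    entry = 0x50000 | QCOW_OFLAG_ZERO  # made-up host offset with zero bit set
    print(l2_entry_is_consistent(entry, subclusters_enabled=True))
    print(l2_entry_is_consistent(entry, subclusters_enabled=False))
```

With this rule, the question of what the bitmap means for a
QCOW2_CLUSTER_ZERO_ALLOC cluster does not arise, because that state cannot
occur in a valid image with subclusters.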

> I also ran some I/O tests using a similar scenario like last time (SSD
> drive, 40GB backing image). Here are the results, you can see the
> difference between the previous prototype (8 subclusters per cluster)
> and the new one (32):

Is the 8 subclusters test run with the old version (64 bit L2 entries)
or the new version (128 bit L2 entries) with bits left unused?

> |--------------+----------------+---------------+-----------------|
> | Cluster size | 32 subclusters | 8 subclusters | subclusters=off |
> |--------------+----------------+---------------+-----------------|
> |         4 KB |        80 IOPS |      101 IOPS |         92 IOPS |
> |         8 KB |       108 IOPS |      299 IOPS |        417 IOPS |
> |        16 KB |      3440 IOPS |     7555 IOPS |       3347 IOPS |
> |        32 KB |     10718 IOPS |    13038 IOPS |       2435 IOPS |
> |        64 KB |     12569 IOPS |    10613 IOPS |       1622 IOPS |
> |       128 KB |     11444 IOPS |     4907 IOPS |        866 IOPS |
> |       256 KB |      9335 IOPS |     2618 IOPS |        561 IOPS |
> |       512 KB |       185 IOPS |     1678 IOPS |        353 IOPS |
> |      1024 KB |      2477 IOPS |      863 IOPS |        212 IOPS |
> |      2048 KB |      1536 IOPS |      571 IOPS |        123 IOPS |
> |--------------+----------------+---------------+-----------------|
> 
> I'm surprised about the 256 KB cluster / 32 subclusters case (I would
> expect ~3300 IOPS), but I ran it a few times and the results are always
> the same. I still haven't investigated why that happens. The rest of the
> results seem more or less normal.

Shouldn't 256k/8k perform similarly to 64k/8k, or maybe a bit better?
Why did you expect ~3300 IOPS?
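
For reference, the "cluster size/subcluster size" shorthand used here
follows directly from dividing the cluster size by the number of
subclusters. A small sketch of that arithmetic (the helper name is made up
for illustration):

```python
def subcluster_size(cluster_kb: int, subclusters: int) -> int:
    """Return the subcluster size in bytes for a cluster size given in KB."""
    return cluster_kb * 1024 // subclusters


if __name__ == "__main__":
    # 256k/8k is the 256 KB row of the "32 subclusters" column,
    # 64k/8k is the 64 KB row of the "8 subclusters" column.
    for cluster_kb, n in ((256, 32), (64, 8), (64, 32), (128, 32)):
        size = subcluster_size(cluster_kb, n)
        print(f"{cluster_kb} KB cluster, {n} subclusters: {size} bytes")
```

Both 256 KB with 32 subclusters and 64 KB with 8 subclusters end up with
8 KB subclusters, which is why the two cases are compared.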

I found other results more surprising. In particular:

* Why does 64k/2k perform better than 128k/4k when the block size for
  your requests is 4k?

* Why is the maximum for 8 subclusters higher than for 32 subclusters?
  I guess this does make some sense if the 8 subclusters case actually
  used 64 bit L2 entries. If you did use 128 bit entries for both 32 and
  8 subclusters, I don't see why 8 subclusters should perform better in
  any case.

* What causes the minimum at 512k with 32 subclusters? The other two
  setups have a maximum and performance decreases monotonically to both
  sides. This one has a minimum at 512k and larger cluster sizes improve
  performance again.

  In fact, 512k performs really badly, even compared to subclusters=off.
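
The observations above can be checked mechanically against the quoted
table. The sketch below copies the numbers verbatim from the table and
looks for interior local minima in each column; the helper name is
hypothetical.

```python
CLUSTER_KB = [4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]

# IOPS per column, copied from the quoted table, in CLUSTER_KB order.
IOPS = {
    "32 subclusters": [80, 108, 3440, 10718, 12569, 11444, 9335, 185, 2477, 1536],
    "8 subclusters": [101, 299, 7555, 13038, 10613, 4907, 2618, 1678, 863, 571],
    "subclusters=off": [92, 417, 3347, 2435, 1622, 866, 561, 353, 212, 123],
}


def interior_minima(values):
    """Return indices that are strictly lower than both neighbours."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] < values[i - 1] and values[i] < values[i + 1]
    ]


if __name__ == "__main__":
    for name, values in IOPS.items():
        dips = [CLUSTER_KB[i] for i in interior_minima(values)]
        print(f"{name}: peak {max(values)} IOPS, dips at {dips} KB")
```

A column with a single maximum and monotonic decrease to both sides has no
interior minimum; only the 32 subclusters column has one, at 512 KB.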

Kevin


