From: Yoonho Park
Subject: Re: qcow2 overlay performance
Date: Wed, 26 Aug 2020 14:14:11 -0400

On 26.08.2020 at 02:46, Yoonho Park wrote:
> I have been measuring the performance of qcow2 overlays, and I am hoping to
> get some help in understanding the data I collected. In my experiments, I
> created a VM and attached a 16G qcow2 disk to it using "qemu-img create"
> and "virsh attach-disk". I use fio to fill it. I create some number of
> snapshots (overlays) using "virsh snapshot-create-as". To mimic user
> activity between taking snapshots, I use fio to randomly write to 10% of
> each overlay right after I create it. After creating the overlays, I use
> fio to measure random read performance and random write performance with 2
> different block sizes, 4K and 64K. 64K is the qcow2 cluster size used by
> the 16G qcow2 disk and the overlays (verified with "qemu-img info"). fio is
> using the attached disk as a block device to avoid as much file system
> overhead as possible. The VM, 16G disk, and snapshots (overlays) all reside
> on local disk. Below are the measurements I collected for up to 5 overlays.
>
>                      4K blocks                          64K blocks
> olays    rd bw  rd iops    wr bw  wr iops    rd bw  rd iops    wr bw  wr iops
> 0         4510     1127   438028   109507    67854     1060   521808     8153
> 1         4692     1173     2924      731    66801     1043   104297     1629
> 2         4524     1131     2781      695    66801     1043   104297     1629
> 3         4573     1143     3034      758    65500     1023    95627     1494
> 4         4556     1139     2971      742    67973     1062   108099     1689
> 5         4471     1117     2937      734    66615     1040    98472     1538
>
> Read performance is not affected by overlays. However, write performance
> drops even with a single overlay. My understanding is that writing 4K
> blocks requires a read-modify-write because you must fetch a complete
> cluster from deeper in the overlay chain before writing to the active
> overlay. However, this does not explain the drop in performance when
> writing 64K blocks. The performance drop is not as significant, but if the
> write block size matches the cluster size then it seems that there should
> not be any performance drop because the write can go directly to the active
> overlay.
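The read-modify-write reasoning in the question can be sketched with a small model. This is only an illustration, not how QEMU implements copy-on-write: the function name `cow_read_bytes` and the `allocated` set are hypothetical, and the model ignores the metadata updates (L2 tables, refcounts) that a real cluster allocation also needs. It counts how many bytes would have to be copied up from the backing chain for a single guest write, assuming 64K clusters.

```python
CLUSTER_SIZE = 64 * 1024  # 64K, the cluster size reported in the question


def cow_read_bytes(offset, length, allocated, cluster_size=CLUSTER_SIZE):
    """Bytes copied up from the backing chain for one guest write.

    `allocated` is a set of cluster indices already present in the
    active overlay (hypothetical bookkeeping for this sketch); it is
    updated in place as clusters get allocated.
    """
    first = offset // cluster_size
    last = (offset + length - 1) // cluster_size
    fetched = 0
    for idx in range(first, last + 1):
        if idx in allocated:
            continue  # cluster already lives in the overlay: write in place
        start = max(offset, idx * cluster_size)
        end = min(offset + length, (idx + 1) * cluster_size)
        # The part of the cluster not covered by the write is copied up.
        fetched += cluster_size - (end - start)
        allocated.add(idx)
    return fetched


allocated = set()
print(cow_read_bytes(0, 4096, allocated))       # 4K write, fresh cluster: 61440
print(cow_read_bytes(4096, 4096, allocated))    # same cluster again: 0
print(cow_read_bytes(65536, 65536, allocated))  # aligned 64K write: 0
```

In this model an aligned 64K write copies nothing from the backing file, which matches the expectation stated in the question. The cluster allocation itself (left out of the model) still costs something on the first write to each cluster.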
Can you share the QEMU command line you used?
As you say, layer 0 is expected to be a bit faster, but not to
this degree. My guess would be that you use the default cache mode
(which includes cache.direct=off), so your results are skewed because
the first requests will only write to memory (the host page cache) and
only later requests will actually hit the disk.
For benchmarking, you should always use cache.direct=on (or an alias
that contains it, such as cache=none).
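To make "an alias that contains it" concrete, here is a small lookup of the cache= aliases and the individual cache.* settings each one implies. The table is transcribed from the QEMU documentation for the -drive cache option as I recall it, so it is worth checking against the man page for your version. The helper name `direct_modes` is made up for this sketch.

```python
# cache= aliases and the settings they imply, per the QEMU -drive docs.
# Tuple order: (cache.writeback, cache.direct, cache.no-flush)
CACHE_MODES = {
    "writeback":    (True,  False, False),  # QEMU's documented default
    "none":         (True,  True,  False),
    "writethrough": (False, False, False),
    "directsync":   (False, True,  False),
    "unsafe":       (True,  False, True),
}


def direct_modes():
    """Aliases that set cache.direct=on, i.e. bypass the host page cache."""
    return sorted(name for name, (_, direct, _) in CACHE_MODES.items() if direct)


print(direct_modes())  # ['directsync', 'none']
```

Only these two aliases bypass the host page cache, so they are the ones that give meaningful benchmark numbers in the sense described above.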
> Another issue I hit is that I cannot set or change the cluster size of
> overlays. Is this possible with "virsh snapshot-create-as"?
That's a libvirt question. Peter, can you help?
> I am using qemu-system-x86_64 version 4.2.0 and virsh version 6.0.0.
>
>
> Thank you for any insights or advice you have.
Kevin