
From: kvaps
Subject: Open qcow2 on multiple hosts simultaneously.
Date: Mon, 19 Jun 2023 19:20:34 +0200

Hi Kevin and the community,

I am designing a CSI driver for Kubernetes that allows efficient
utilization of a SAN (Storage Area Network) and supports thin
provisioning, snapshots, and ReadWriteMany mode for block devices.

To implement this, I have explored several technologies such as
traditional LVM, LVMThin (which does not support shared mode), and
QCOW2 on top of block devices. This is the same approach that oVirt
uses for thin provisioning over a shared LUN:

https://github.com/oVirt/vdsm/blob/08a656c/doc/thin-provisioning.md

Based on benchmark results, I found that the performance degradation
of block-backed QCOW2 when creating snapshots is much lower than that
of LVM and LVMThin:

https://docs.google.com/spreadsheets/d/1mppSKhEevGl5ntBhZT3ccU5t07LwxXjQz1HM2uvBIuo/edit#gid=2020746352

Therefore, I have decided to use the same approach for Kubernetes.

But in Kubernetes, the storage system needs to be self-sufficient and
not dependent on the workload that uses it. Thus, unlike oVirt, we
have no option to use the libvirt interface of a running VM to invoke
live migration. Instead, we should provide a pure block device in
ReadWriteMany mode, where the block device is writable on multiple
hosts simultaneously.

To achieve this, I decided to use the qemu-storage-daemon with the
VDUSE backend.
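
For illustration, a minimal sketch of such an invocation could look
like this (the LV path and the export/device names below are just
placeholders):

  # Expose a qcow2 image stored directly on a shared LV as a VDUSE block device
  qemu-storage-daemon \
    --blockdev driver=host_device,node-name=lun0,filename=/dev/vg0/pvc-example,cache.direct=on \
    --blockdev driver=qcow2,node-name=disk0,file=lun0 \
    --export type=vduse-blk,id=export0,node-name=disk0,name=vduse-disk0,num-queues=4,writable=on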

Other technologies, such as NBD and UBLK, were also considered; their
benchmark results can be seen in the same document on a different
sheet:

https://docs.google.com/spreadsheets/d/1mppSKhEevGl5ntBhZT3ccU5t07LwxXjQz1HM2uvBIuo/edit#gid=416958126

Taking performance, stability, and versatility into account, I
concluded that VDUSE is the optimal choice. To connect the device in
Kubernetes, the virtio-vdpa interface would be used, and the entire
scheme could look like this:


+---------------------+  +---------------------+
| node1               |  | node2               |
|                     |  |                     |
|    +-----------+    |  |    +-----------+    |
|    | /dev/vda  |    |  |    | /dev/vda  |    |
|    +-----+-----+    |  |    +-----+-----+    |
|          |          |  |          |          |
|     virtio-vdpa     |  |     virtio-vdpa     |
|          |          |  |          |          |
|        vduse        |  |        vduse        |
|          |          |  |          |          |
| qemu-storage-daemon |  | qemu-storage-daemon |
|          |          |  |          |          |
| +------- | -------+ |  | +------- | -------+ |
| | LUN    |        | |  | | LUN    |        | |
| |  +-----+-----+  | |  | |  +-----+-----+  | |
| |  | LV (qcow2)|  | |  | |  | LV (qcow2)|  | |
| |  +-----------+  | |  | |  +-----------+  | |
| +--------+--------+ |  | +--------+--------+ |
|          |          |  |          |          |
|          |          |  |          |          |
+--------- | ---------+  +--------- | ---------+
           |                        |
           |         +-----+        |
           +---------| SAN |--------+
                     +-----+
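
On each node, the VDUSE device created by qemu-storage-daemon would
then be attached through the vdpa framework, roughly like this (the
device name matches the placeholder export above; module names depend
on the kernel configuration):

  # Load the VDUSE and virtio-vdpa modules if they are not built in
  modprobe vduse
  modprobe virtio_vdpa
  # Attach the VDUSE device to the vdpa bus so it shows up as a
  # virtio-blk disk (e.g. /dev/vda) on the node
  vdpa dev add name vduse-disk0 mgmtdev vduse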

Although two independent instances of qemu-storage-daemon run
successfully against the same qcow2 disk on different hosts, I have
concerns about their proper functioning. Similar to live migration, I
think they should share state with each other.

The question is: how can qemu-storage-daemon be made to share this
state between multiple nodes, or is the qcow2 format inherently
stateless so that this is not required?

-- 
Best Regards,
Andrei Kvapil


