From: Michael Roth
Subject: [PATCH 27/64] block/nvme: Fix VFIO_MAP_DMA failed: No space left on device
Date: Tue, 19 Oct 2021 09:09:07 -0500

From: Philippe Mathieu-Daudé <philmd@redhat.com>

When the NVMe block driver was introduced (see commit bdd6a90a9e5,
January 2018), the Linux VFIO_IOMMU_MAP_DMA ioctl only returned
-ENOMEM in case of error. The driver correctly handled this error
path by recycling its volatile IOVA mappings.

To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
DMA mappings per container", April 2019) added the -ENOSPC error to
signal that the user has exhausted the DMA mappings available for a
container.

The block driver started to misbehave:

  qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
  (qemu)
  (qemu) info status
  VM status: paused (io-error)
  (qemu) c
  VFIO_MAP_DMA failed: No space left on device
  (qemu) c
  VFIO_MAP_DMA failed: No space left on device

(The VM cannot be resumed from here, so it is stuck.)

Fix by handling the new -ENOSPC error (returned when DMA mappings are
exhausted) exactly like the existing -ENOMEM error, so the behavior
is unchanged on old kernels where the CVE-2019-3882 fix is not
present.

An easy way to reproduce this bug is to restrict the DMA mapping
limit (65535 by default) when loading the VFIO IOMMU module:

  # modprobe vfio_iommu_type1 dma_entry_limit=666

Cc: qemu-stable@nongnu.org
Cc: Fam Zheng <fam@euphon.net>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Reported-by: Michal Prívozník <mprivozn@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20210723195843.1032825-1-philmd@redhat.com
Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver")
Buglink: https://bugs.launchpad.net/qemu/+bug/1863333
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 15a730e7a3aaac180df72cd5730e0617bcf44a5a)
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
block/nvme.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/block/nvme.c b/block/nvme.c
index 2b5421e7aa..e8dbbc2317 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -1030,7 +1030,29 @@ try_map:
         r = qemu_vfio_dma_map(s->vfio,
                               qiov->iov[i].iov_base,
                               len, true, &iova);
+        if (r == -ENOSPC) {
+            /*
+             * In addition to the -ENOMEM error, the VFIO_IOMMU_MAP_DMA
+             * ioctl returns -ENOSPC to signal the user exhausted the DMA
+             * mappings available for a container since Linux kernel commit
+             * 492855939bdb ("vfio/type1: Limit DMA mappings per container",
+             * April 2019, see CVE-2019-3882).
+             *
+             * This block driver already handles this error path by checking
+             * for the -ENOMEM error, so we directly replace -ENOSPC by
+             * -ENOMEM. Beside, -ENOSPC has a specific meaning for blockdev
+             * coroutines: it triggers BLOCKDEV_ON_ERROR_ENOSPC and
+             * BLOCK_ERROR_ACTION_STOP which stops the VM, asking the operator
+             * to add more storage to the blockdev. Not something we can do
+             * easily with an IOMMU :)
+             */
+            r = -ENOMEM;
+        }
         if (r == -ENOMEM && retry) {
+            /*
+             * We exhausted the DMA mappings available for our container:
+             * recycle the volatile IOVA mappings.
+             */
             retry = false;
             trace_nvme_dma_flush_queue_wait(s);
             if (s->dma_map_count) {
--
2.25.1