qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental


From: Avihai Horon
Subject: Re: [PATCH 3/3] vfio/migration: Make VFIO migration non-experimental
Date: Tue, 27 Jun 2023 11:00:46 +0300
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0


On 26/06/2023 20:27, Alex Williamson wrote:
External email: Use caution opening links or attachments


On Mon, 26 Jun 2023 17:26:42 +0200
Cédric Le Goater <clg@redhat.com> wrote:

On 6/26/23 15:40, Joao Martins wrote:
On 26/06/2023 14:20, Cédric Le Goater wrote:
Hello Avihai,

On 6/26/23 10:23, Avihai Horon wrote:
The major parts of VFIO migration are supported today in QEMU. This
includes basic VFIO migration, device dirty page tracking and precopy
support.

Thus, at this point in time, it seems appropriate to make VFIO migration
non-experimental: remove the x prefix from enable_migration property,
change it to ON_OFF_AUTO and let the default value be AUTO.

In addition, make the following adjustments:
1. Require device dirty tracking support when enable_migration is AUTO
      (i.e., not explicitly enabled). This is because device dirty tracking
      is currently the only method to do dirty page tracking, which is
      essential for migrating in a reasonable downtime.
hmm, I don't think QEMU should decide to disable a feature for all
devices supposedly because it could be slow for some. That's too
restrictive. What about devices with have small states ? for which
the downtime would be reasonable even without device dirty tracking
support.

device dirty tracking refers to the ability to tracking dirty IOVA used by the
device which will DMA into RAM. It is required because the
consequence/alternative is to transfer all RAM in stop copy phase. Device state
size at that point is the least of our problems downtime wise.
Arg. thanks for reminding me. I tend to take this for granted ...
My initial reaction was the same, for instance we could have a non-DMA
device (ex. PCI serial card) that supports migration w/o dirty
tracking, but QEMU cannot assume the device doesn't do DMA, nor is it
worth our time to monitor whether bus-master is ever enabled for the
device, so QEMU needs to assume all memory that's DMA accessible for
the device is perpetually dirty.  Also, if such a corner case existed
for a non-DMA migratable device, the easier path is simply to require
dirty tracking stubs in the variant driver to report no memory dirtied.

I can imagine that allowing without dirty tracking is useful for developer
testing of the suspend/device-state flows, but as real default (auto) is very
questionable to let it go through without dirty tracking. When we have IOMMUFD
dirty tracking that's when we can relieve this restriction as a default.

But then note that (...)

Setting
      enable_migration to ON will not require device dirty tracking.
(...) this lets it ignore dirty tracking as you would like.


2. Make migration blocker messages more elaborate.
3. Remove error prints in vfio_migration_query_flags().
4. Remove a redundant assignment in vfio_migration_realize().

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
    include/hw/vfio/vfio-common.h |  2 +-
    hw/vfio/migration.c           | 29 ++++++++++++++++-------------
    hw/vfio/pci.c                 |  4 ++--
    3 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b4c28f318f..387eabde60 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -139,7 +139,7 @@ typedef struct VFIODevice {
        bool needs_reset;
        bool no_mmap;
        bool ram_block_discard_allowed;
-    bool enable_migration;
+    OnOffAuto enable_migration;
        VFIODeviceOps *ops;
        unsigned int num_irqs;
        unsigned int num_regions;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 79eb81dfd7..d8e0848635 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -731,14 +731,6 @@ static int vfio_migration_query_flags(VFIODevice
*vbasedev, uint64_t *mig_flags)
        feature->argsz = sizeof(buf);
        feature->flags = VFIO_DEVICE_FEATURE_GET | 
VFIO_DEVICE_FEATURE_MIGRATION;
        if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
-        if (errno == ENOTTY) {
-            error_report("%s: VFIO migration is not supported in kernel",
-                         vbasedev->name);
-        } else {
-            error_report("%s: Failed to query VFIO migration support, err: %s",
-                         vbasedev->name, strerror(errno));
-        }
-
            return -errno;
        }
    @@ -831,14 +823,28 @@ void vfio_reset_bytes_transferred(void)
      int vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
    {
-    int ret = -ENOTSUP;
+    int ret;
    -    if (!vbasedev->enable_migration) {
+    if (vbasedev->enable_migration == ON_OFF_AUTO_OFF) {
+        error_setg(&vbasedev->migration_blocker,
+                   "%s: Migration is disabled for VFIO device", 
vbasedev->name);
            goto add_blocker;
        }
          ret = vfio_migration_init(vbasedev);
        if (ret) {
It would be good to keep the message for 'errno == ENOTTY' as it was in
vfio_migration_query_flags(). When migration fails, it is an important
information to know that it is because the VFIO PCI host device driver
doesn't support the feature. The root cause could be deep below in FW or
how the VF was set up.

+1 As I have been in this rabbit hole

Sure, I will add it.


+        error_setg(&vbasedev->migration_blocker,
+                   "%s: Migration couldn't be initialized for VFIO device, "
+                   "err: %d (%s)",
+                   vbasedev->name, ret, strerror(-ret));
+        goto add_blocker;
+    }
+
+    if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO &&
+        !vbasedev->dirty_pages_supported) {
I don't agree with this test.

The alternative right now is perceptual dirty tracking. How is that OK as a
default? It's not like we have even an option :(

Maybe perhaps you refer to avoid strongly enforcing *always* it to allow testing
of the non dirty tracking parts? Maybe when you 'force' enabling with
enable-migration=on is when you ignore the dirty tracking which is what this is
doing.

I see ON_OFF_AUTO_ON as a way to abort the machine startup while
ON_OFF_AUTO_AUTO would let it run but block migration.
Agreed.  There's a little bit of redundancy between the device-level
"enable-migration=on" option and the global "-only-migratable" option
relative to preventing machine startup, but it also doesn't make sense
to me if the device-level option let realize complete successfully if
the device doesn't support or fails migration setup.  So I think we'd
generally rely on using the -only-migratable option with the default
ON_OFF_AUTO_AUTO value, allow the ON_OFF_AUTO_ON value to enable
dis-recommended support, and live with the redundancy that it should
also cause the device realize to fail if migration is not supported.
Thanks,

Alex

OK.

When enable_migration=AUTO we allow blockers.
When enable_migration=ON we don't allow blockers and instead prevent realization of VFIO device.

Regarding device dirty tracking, we keep current behavior, right?
That is:
When enable_migration=AUTO we block migration if device dirty tracking is not supported. When enable_migration=ON we allow migration even if device dirty tracking is not supported (in which case DMA-able memory will be perpetually dirtied).

Thanks.


We would
need an extra property to relax the checks, else we are hijacking
some pre-existing option to fit our need.

Since dirty tracking is a must-have to implement migration support
for any existing and future VFIO PCI variant driver, anything else
would be experimental code and we are trying to remove the flag !
Please correct me if I am wrong.

So, the case !vbasedev->dirty_pages_supported is just an extra
information to report for why migration is not supported. Does
that sound reasonable ?

Thanks,

C.



+        error_setg(&vbasedev->migration_blocker,
+                   "%s: VFIO device doesn't support device dirty tracking",
+                   vbasedev->name);
            goto add_blocker;
        }
I agree that with ON_OFF_AUTO_AUTO, errors at realize time should be recorded
in a migration blocker. What about the ON_OFF_AUTO_ON case ? If migration was
explicitly requested for the device and the conditions on the host are not met,
I think realize should fail and the machine abort.

+1 Good point

Thanks,

C.



@@ -856,9 +862,6 @@ int vfio_migration_realize(VFIODevice *vbasedev, Error
**errp)
        return 0;
      add_blocker:
-    error_setg(&vbasedev->migration_blocker,
-               "VFIO device doesn't support migration");
-
        ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
        if (ret < 0) {
            error_free(vbasedev->migration_blocker);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 73874a94de..48584e3b01 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3347,8 +3347,8 @@ static Property vfio_pci_dev_properties[] = {
                        VFIO_FEATURE_ENABLE_REQ_BIT, true),
        DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features,
                        VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
-    DEFINE_PROP_BOOL("x-enable-migration", VFIOPCIDevice,
-                     vbasedev.enable_migration, false),
+    DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice,
+                            vbasedev.enable_migration, ON_OFF_AUTO_AUTO),
        DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
        DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
                         vbasedev.ram_block_discard_allowed, false),



reply via email to

[Prev in Thread] Current Thread [Next in Thread]