
Re: [PATCH v5 1/9] migration: Add switchover ack capability


From: Avihai Horon
Subject: Re: [PATCH v5 1/9] migration: Add switchover ack capability
Date: Mon, 19 Jun 2023 12:37:23 +0300
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0


On 15/06/2023 16:49, Cédric Le Goater wrote:


On 6/15/23 14:38, YangHang Liu wrote:
Test in the following two scenarios:

[1]  Test scenario:  Both source VM and target VM (in listening mode)
have enabled return-path and switchover-ack capability:

Test result : The VFIO migration completed successfully

[2]  Test scenario:  The source VM has enabled the return-path and
switchover-ack capabilities, while the target VM (in listening mode) has not

Test result : The VFIO migration fails

The detailed error thrown by qemu-kvm when VFIO migration fails:
     Target VM:
           0000:17:00.2: Received INIT_DATA_SENT but switchover ack is not used
           error while loading state section id 81(0000:00:02.4:00.0/vfio)
           load of migration failed: Invalid argument
     Source VM:
            failed to save SaveStateEntry with id(name): 2(ram): -5
            Unable to write to socket: Connection reset by peer
            Unable to write to socket: Connection reset by peer

Tested-by: YangHang Liu <yanghliu@redhat.com>

Some more info,

Tests were performed with a mainline Linux and a mainline QEMU including
this series - patch8.

The amount of precopy data for a CX-7 VF is not very large. Any idea how
to generate some more initial state with such devices ?

I suppose pre-copy will be more important with vGPUs.

In CX-7 the precopy data is not expected to be very large, because it's
mainly used to pre-allocate resources in the destination. However, precopy
and switchover-ack are very important for CX-7, because they allow the
resource pre-allocation to be done in the destination while the source VM
is still running, which reduces downtime significantly (see the example I
gave in the cover letter).

Thanks.




YangHang,

Could you please reply with a Tested-by on the cover letter, so that the
whole series is tagged and not only patch 1?

Thanks,

C.



On Wed, May 31, 2023 at 1:46 AM Avihai Horon <avihaih@nvidia.com> wrote:

Migration downtime estimation is calculated based on bandwidth and
remaining migration data. This assumes that loading of migration data in
the destination takes a negligible amount of time and that downtime
depends only on network speed.

While this may be true for RAM, it's not necessarily true for other
migrated devices. For example, loading the data of a VFIO device in the
destination might require the device to allocate resources, prepare
internal data structures and so on. These operations can take a
significant amount of time which can increase migration downtime.
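
To make the assumption above concrete, here is a minimal sketch of a
bandwidth-based downtime estimate with an extra term for destination load
time. It is an illustrative model only, not QEMU code: the function names,
the millisecond units and the dest_load_ms parameter are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative model only; every name here is hypothetical and none of
 * this is taken from QEMU.
 *
 * Estimate downtime in milliseconds as transfer time (remaining data
 * divided by bandwidth) plus the time the destination needs to load
 * device state. A pure bandwidth-based estimate is the special case
 * dest_load_ms == 0.
 */
static uint64_t estimate_downtime_ms(uint64_t remaining_bytes,
                                     uint64_t bandwidth_bytes_per_ms,
                                     uint64_t dest_load_ms)
{
    if (bandwidth_bytes_per_ms == 0) {
        /* No bandwidth measured yet: treat the estimate as unbounded. */
        return UINT64_MAX;
    }
    return remaining_bytes / bandwidth_bytes_per_ms + dest_load_ms;
}

/* Switchover is acceptable once the estimate fits the downtime limit. */
static bool downtime_fits(uint64_t estimate_ms, uint64_t downtime_limit_ms)
{
    return estimate_ms <= downtime_limit_ms;
}
```

With 100000 bytes remaining at 1000 bytes/ms, the bandwidth-only estimate
is 100 ms and fits a 300 ms limit. Adding a hypothetical 500 ms of device
load time gives 600 ms, which does not fit, although the bandwidth-only
calculation would still have allowed the switchover.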

This patch adds a new capability "switchover ack" that prevents the
source from stopping the VM and completing the migration until an ACK
is received from the destination that it's OK to do so.

This can be used by migrated devices in various ways to reduce downtime.
For example, a device can send initial precopy metadata to pre-allocate
resources in the destination and use this capability to make sure that
the pre-allocation is completed before the source VM is stopped, so it
will have full effect.
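
The gating described above could be modelled roughly as follows. This is a
sketch under stated assumptions, not the implementation from this series:
the structs, field names and helpers are hypothetical. It assumes each
participating device is counted on the destination, and that a single ACK
is sent back once every counted device has reported ready.

```c
#include <stdbool.h>

/* Hypothetical destination-side state: devices still preparing. */
struct dest_state {
    unsigned pending_acks; /* devices that have not yet reported ready */
    bool ack_sent;         /* set once the single ACK has been sent back */
};

/* Hypothetical source-side state. */
struct src_state {
    bool ack_enabled;      /* models the "switchover ack" capability */
    bool acked;            /* set when the destination's ACK arrives */
};

/* A device that wants to hold back switchover registers itself. */
static void dest_register_device(struct dest_state *d)
{
    d->pending_acks++;
}

/*
 * Called when one device has finished loading its initial data.
 * Returns true if this was the last pending device, i.e. the ACK
 * would now be sent over the return path.
 */
static bool dest_device_ready(struct dest_state *d)
{
    if (d->pending_acks == 0) {
        return false; /* nothing registered, nothing to acknowledge */
    }
    if (--d->pending_acks == 0) {
        d->ack_sent = true;
        return true;
    }
    return false;
}

/* Source side: record the ACK received from the destination. */
static void src_receive_ack(struct src_state *s)
{
    s->acked = true;
}

/* The source may stop the VM only if the ACK gate is open. */
static bool src_may_stop_vm(const struct src_state *s, bool downtime_ok)
{
    if (s->ack_enabled && !s->acked) {
        return false; /* keep iterating precopy until the ACK arrives */
    }
    return downtime_ok;
}
```

In this model, with two registered devices the source keeps the VM running
even when the downtime estimate is acceptable, and may stop it only after
the second device reports ready and the ACK has been received.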

This new capability relies on the return path capability to communicate
from the destination back to the source.

The actual implementation of the capability will be added in the
following patches.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
---
  qapi/migration.json | 12 +++++++++++-
  migration/options.h |  1 +
  migration/options.c | 21 +++++++++++++++++++++
  3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 179af0c4d8..061ea512e0 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -487,6 +487,16 @@
  #     and should not affect the correctness of postcopy migration.
  #     (since 7.1)
  #
+# @switchover-ack: If enabled, migration will not stop the source VM
+#     and complete the migration until an ACK is received from the
+#     destination that it's OK to do so.  Exactly when this ACK is
+#     sent depends on the migrated devices that use this feature.
+#     For example, a device can use it to make sure some of its data
+#     is sent and loaded in the destination before doing switchover.
+#     This can reduce downtime if devices that support this capability
+#     are present.  'return-path' capability must be enabled to use
+#     it.  (since 8.1)
+#
  # Features:
  #
  # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -502,7 +512,7 @@
             'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
             'validate-uuid', 'background-snapshot',
-           'zero-copy-send', 'postcopy-preempt'] }
+           'zero-copy-send', 'postcopy-preempt', 'switchover-ack'] }

  ##
  # @MigrationCapabilityStatus:
diff --git a/migration/options.h b/migration/options.h
index 45991af3c2..9aaf363322 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -40,6 +40,7 @@ bool migrate_postcopy_ram(void);
  bool migrate_rdma_pin_all(void);
  bool migrate_release_ram(void);
  bool migrate_return_path(void);
+bool migrate_switchover_ack(void);
  bool migrate_validate_uuid(void);
  bool migrate_xbzrle(void);
  bool migrate_zero_blocks(void);
diff --git a/migration/options.c b/migration/options.c
index b62ab30cd5..16007afca6 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -185,6 +185,8 @@ Property migration_properties[] = {
      DEFINE_PROP_MIG_CAP("x-zero-copy-send",
              MIGRATION_CAPABILITY_ZERO_COPY_SEND),
  #endif
+    DEFINE_PROP_MIG_CAP("x-switchover-ack",
+                        MIGRATION_CAPABILITY_SWITCHOVER_ACK),

      DEFINE_PROP_END_OF_LIST(),
  };
@@ -308,6 +310,13 @@ bool migrate_return_path(void)
      return s->capabilities[MIGRATION_CAPABILITY_RETURN_PATH];
  }

+bool migrate_switchover_ack(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return s->capabilities[MIGRATION_CAPABILITY_SWITCHOVER_ACK];
+}
+
  bool migrate_validate_uuid(void)
  {
      MigrationState *s = migrate_get_current();
@@ -547,6 +556,18 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
          }
      }

+    if (new_caps[MIGRATION_CAPABILITY_SWITCHOVER_ACK]) {
+        if (!new_caps[MIGRATION_CAPABILITY_RETURN_PATH]) {
+            error_setg(errp, "Capability 'switchover-ack' requires capability "
+                             "'return-path'");
+            return false;
+        }
+
+        /* Disable this capability until it's implemented */
+        error_setg(errp, "'switchover-ack' is not implemented yet");
+        return false;
+    }
+
      return true;
  }

--
2.26.3






