qemu-devel



From: YangHang Liu
Subject: Re: [PATCH v5 0/9] migration: Add switchover ack capability and VFIO precopy support
Date: Fri, 16 Jun 2023 17:35:35 +0800

Tested-by: YangHang Liu <yanghliu@redhat.com>

On Wed, May 31, 2023 at 1:01 AM Avihai Horon <avihaih@nvidia.com> wrote:
>
> Hello everyone,
>
> This is v5 of the switchover ack series.
>
> Changes from v4 [6]:
> * Removed superfluous '"' in vfio_save_iterate() trace. (Cedric)
> * Removed VFIOMigration->switchover_ack_needed and computed it locally
>   when needed. (Cedric)
> * Added R-bs.
>
> Changes from v3 [5]:
> * Rebased on latest master branch.
> * Simplified switchover ack logic (call switchover_ack_needed only in
>   destination). (Peter)
> * Moved caching of VFIO migration flags to a separate patch. (Cedric)
> * Moved adding of x-allow-pre-copy property to a separate patch. (Cedric)
> * Reset VFIOMigration->precopy_{init,dirty}_size in vfio_query_precopy_size()
>   and in vfio_save_cleanup(). (Cedric)
> * Added a reference to VFIO uAPI in vfio_save_block() ENOMSG comment. (Cedric)
> * Added VFIOMigration->precopy_{init,dirty}_size to
>   trace_vfio_save_iterate(). (Cedric)
> * Adapted VFIO migration to switchover ack logic simplification:
>   - Checked migrate_switchover_ack() in vfio_{save,load}_setup() and set
>     VFIOMigration->switchover_ack_needed accordingly.
>   - vfio_switchover_ack_needed() doesn't set
>     VFIOMigration->switchover_ack_needed and only returns its value.
> * Moved VFIOMigration->switchover_ack_needed = false to
>   vfio_migration_cleanup() so it will be set to false both in src and
>   dest.
> * Fixed a few typos/coding style. (Peter/Cedric)
> * Added R-b/A-b (didn't add Cedric's R-b on patch #7 as switchover ack
>   changes in patch #2 introduced some changes to patch #7 as well).
>
> Changes from v2 [4]:
> * Rebased on latest master branch.
> * Changed the capability name to "switchover-ack" and the related
>   code/docs accordingly. (Peter)
> * Added a counter for the number of switchover ack users in the source
>   and used it to skip switchover ack if there are no users (instead of
>   setting the switchover acked flag to true). (Peter)
> * Added R-bs.
>
> Changes from v1 [3]:
> * Rebased on latest master branch.
> * Updated to latest QAPI doc comment conventions and refined
>   QAPI docs and capability error message. (Markus)
> * Followed Peter/Juan suggestion and removed the handshake between
>   source and destination.
>   Now the capability must be set on both source and destination.
>   Compatibility of this feature between different QEMU versions or
>   different host capabilities (i.e., kernel) is achieved in the regular
>   way of device properties and hw_compat_x_y.
> * Replaced is_initial_data_active() and initial_data_loaded()
>   SaveVMHandlers handlers with a notification mechanism. (Peter)
> * Set the capability also in destination in the migration test.
> * Added VFIO device property x-allow-pre-copy to be able to preserve
>   compatibility between different QEMU versions or different host
>   capabilities (i.e., kernel).
> * Changed VFIO precopy initial data implementation according to the
>   above changes.
> * Documented VFIO precopy initial data support in VFIO migration
>   documentation.
> * Added R-bs.
>
> ===
>
> This series adds a new migration capability called "switchover ack". The
> purpose of this capability is to reduce migration downtime in cases
> where loading of migration data in the destination can take a lot of
> time, such as with VFIO migration data.
>
> The series then moves to add precopy support and switchover ack support
> for VFIO migration.
>
> Switchover ack is used by VFIO migration, but other migrated devices can
> add support for it and use it as well.
>
> === Background ===
>
> Migration downtime estimation is calculated based on bandwidth and
> remaining migration data. This assumes that loading of migration data in
> the destination takes a negligible amount of time and that downtime
> depends only on network speed.
>
> While this may be true for RAM, it's not necessarily true for other
> migrated devices. For example, loading the data of a VFIO device in the
> destination might require the device to allocate resources and
> prepare internal data structures which can take a significant amount of
> time to do.
>
> This poses a problem, as the source may think that the remaining
> migration data is small enough to meet the downtime limit, so it will
> stop the VM and complete the migration, but in fact sending and loading
> the data in the destination may take longer than the downtime limit.
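
To make the gap described above concrete, here is a minimal sketch of a
transfer-only downtime estimate next to one that also counts load time in
the destination. The function names and all numbers are hypothetical and
chosen only for illustration; this is a model of the reasoning, not QEMU's
actual estimation code.

```python
def estimated_downtime_ms(remaining_bytes, bandwidth_bytes_per_s):
    """Transfer-only estimate: time needed to send the remaining data."""
    return remaining_bytes / bandwidth_bytes_per_s * 1000.0


def actual_downtime_ms(remaining_bytes, bandwidth_bytes_per_s, load_ms):
    """Transfer time plus destination load time, which the estimate ignores."""
    return estimated_downtime_ms(remaining_bytes, bandwidth_bytes_per_s) + load_ms


# Hypothetical values, not measurements from this series.
downtime_limit_ms = 300.0
remaining = 100 * 1024 * 1024      # 100 MiB of migration data left
bandwidth = 1024 * 1024 * 1024     # 1 GiB/s link
load_ms = 1000.0                   # device needs 1 s to load its data

est = estimated_downtime_ms(remaining, bandwidth)
act = actual_downtime_ms(remaining, bandwidth, load_ms)

# The source decides based on the estimate alone.
source_would_switch_over = est <= downtime_limit_ms
limit_actually_met = act <= downtime_limit_ms
```

With these assumed values the source would stop the VM because the
estimate is under the limit, while the real downtime exceeds it.
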
>
> To solve this, VFIO migration uAPI defines "initial bytes" as part of
> its precopy stream [1]. Initial bytes can be used in various ways to
> improve VFIO migration performance. For example, they can be used to
> transfer device metadata to pre-allocate resources in the destination.
> However, for this to work we need to make sure that all initial bytes
> are sent and loaded in the destination before the source VM is stopped.
>
> The new switchover ack migration capability helps us achieve this.
> It prevents the source from stopping the VM and completing the migration
> until an ACK is received from the destination that it's OK to do so.
> Thus, a VFIO device can make sure that its initial bytes were sent
> and loaded in the destination before the source VM is stopped.
>
> Note that this relies on the return path capability to communicate from
> the destination back to the source.
>
> === Flow of operation ===
>
> To use switchover ack, the capability must be enabled in both the source
> and the destination.
>
> During migration setup, migration code in the destination calls the
> switchover_ack_needed() SaveVMHandlers handler of the migrated devices
> to check if switchover ack is used by them.
> A "switchover_ack_pending_num" counter is increased for each migrated
> device that supports this feature. It will be used later to mark when an
> ACK should be sent to the source.
>
> Migration is active and the source starts to send precopy data as usual.
> In the destination, when a migrated device thinks it's OK to do
> switchover, it notifies the migration code about it and the
> "switchover_ack_pending_num" counter is decreased. For example, for a
> VFIO device it's when the device receives and loads its initial bytes.
>
> When the "switchover_ack_pending_num" counter reaches zero, it means
> that all devices agree to do switchover and an ACK is sent to the
> source, which will now be able to complete the migration when
> appropriate.
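
The counter flow described above can be modelled roughly as follows. This
is a toy sketch: the class and method names are hypothetical, the ACK is
represented by a flag instead of a return-path message, and the case where
no device needs an ACK is left out because the description does not cover
it.

```python
class SwitchoverAckTracker:
    """Toy model of the destination-side "switchover_ack_pending_num" counter."""

    def __init__(self, needs_ack_per_device):
        # Setup: one pending ACK for each device whose
        # switchover_ack_needed() handler reported that it uses the feature.
        self.pending_num = sum(1 for needed in needs_ack_per_device if needed)
        self.ack_sent = False

    def device_approved(self):
        """A device reports that switchover is OK from its point of view."""
        if self.pending_num == 0:
            raise RuntimeError("no device has a pending switchover ack")
        self.pending_num -= 1
        if self.pending_num == 0:
            # Stands in for sending the ACK to the source.
            self.ack_sent = True
        return self.ack_sent


# Three migrated devices, two of which use switchover ack.
tracker = SwitchoverAckTracker([True, False, True])
first = tracker.device_approved()    # one device still pending
second = tracker.device_approved()   # counter reaches zero, ACK goes out
```

The source is only allowed to complete the migration once the flag (the
ACK in the real flow) has been set.
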
>
> === Test results ===
>
> The table below shows the downtime of two identical migrations. In the
> first migration switchover ack is disabled and in the second it is
> enabled. The migrated VM is assigned an mlx5 VFIO device which has
> 300MB of device data to be migrated.
>
> +----------------------+-----------------------+----------+
> |    Switchover ack    | VFIO device data size | Downtime |
> +----------------------+-----------------------+----------+
> |       Disabled       |         300MB         |  1900ms  |
> |       Enabled        |         300MB         |  420ms   |
> +----------------------+-----------------------+----------+
>
> Switchover ack gives a roughly 4.5 times improvement in downtime.
> The 1480ms difference is time that is used for resource allocation for
> the VFIO device in the destination. Without switchover ack, this time is
> spent when the source VM is stopped and thus the downtime is much
> higher. With switchover ack, the time is spent when the source VM is
> still running.
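
For reference, the two figures quoted above follow directly from the table
values. The variable names below are only for illustration.

```python
disabled_ms = 1900   # downtime with switchover ack disabled (from the table)
enabled_ms = 420     # downtime with switchover ack enabled (from the table)

improvement = disabled_ms / enabled_ms   # ratio of the two downtimes
saved_ms = disabled_ms - enabled_ms      # time moved out of the downtime window

print(f"improvement: {improvement:.2f}x, saved: {saved_ms} ms")
```
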
>
> === Patch breakdown ===
>
> - Patches 1-4 add the switchover ack capability.
> - Patches 5-8 add VFIO migration precopy support. A similar version of
>   them was previously sent here [2].
> - Patch 9 adds switchover ack support for VFIO migration.
>
> Thanks for reviewing!
>
> [1]
> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/vfio.h#L1048
>
> [2]
> https://lore.kernel.org/qemu-devel/20230222174915.5647-3-avihaih@nvidia.com/
>
> [3]
> https://lore.kernel.org/qemu-devel/20230501140141.11743-1-avihaih@nvidia.com/
>
> [4]
> https://lore.kernel.org/qemu-devel/20230517155219.10691-1-avihaih@nvidia.com/
>
> [5]
> https://lore.kernel.org/qemu-devel/20230521151808.24804-1-avihaih@nvidia.com/
>
> [6]
> https://lore.kernel.org/qemu-devel/20230528140652.8693-1-avihaih@nvidia.com/
>
> Avihai Horon (9):
>   migration: Add switchover ack capability
>   migration: Implement switchover ack logic
>   migration: Enable switchover ack capability
>   tests: Add migration switchover ack capability test
>   vfio/migration: Refactor vfio_save_block() to return saved data size
>   vfio/migration: Store VFIO migration flags in VFIOMigration
>   vfio/migration: Add VFIO migration pre-copy support
>   vfio/migration: Add x-allow-pre-copy VFIO device property
>   vfio/migration: Add support for switchover ack capability
>
>  docs/devel/vfio-migration.rst |  45 +++++--
>  qapi/migration.json           |  12 +-
>  include/hw/vfio/vfio-common.h |   5 +
>  include/migration/register.h  |   2 +
>  migration/migration.h         |  14 +++
>  migration/options.h           |   1 +
>  migration/savevm.h            |   1 +
>  hw/core/machine.c             |   1 +
>  hw/vfio/common.c              |   6 +-
>  hw/vfio/migration.c           | 221 +++++++++++++++++++++++++++++++---
>  hw/vfio/pci.c                 |   2 +
>  migration/migration.c         |  32 ++++-
>  migration/options.c           |  17 +++
>  migration/savevm.c            |  54 +++++++++
>  tests/qtest/migration-test.c  |  26 ++++
>  hw/vfio/trace-events          |   4 +-
>  migration/trace-events        |   3 +
>  17 files changed, 413 insertions(+), 33 deletions(-)
>
> --
> 2.26.3
>
>



