
Re: [PATCH v3 9/9] tests/qtest: massively speed up migration-test


From: Peter Xu
Subject: Re: [PATCH v3 9/9] tests/qtest: massively speed up migration-test
Date: Thu, 1 Jun 2023 11:46:01 -0400

On Wed, May 31, 2023 at 02:24:00PM +0100, Daniel P. Berrangé wrote:
> The migration test cases that actually exercise live migration want to
> ensure there is a minimum of two iterations of pre-copy, in order to
> exercise the dirty tracking code.
> 
> Historically we've queried the migration status, looking for the
> 'dirty-sync-count' value to increment to track iterations. This was
> not entirely reliable, because often all the data would get transferred
> quickly enough that the migration would finish before we wanted it
> to. So we massively dropped the bandwidth and max downtime to
> guarantee non-convergence. This had the unfortunate side effect
> that every migration took at least 30 seconds to run (100 MB of
> dirty pages / 3 MB/sec).
> 
> This optimization takes a different approach to ensuring a
> minimum of two iterations. Rather than waiting for dirty-sync-count
> to increment, directly look for an indication that the source VM
> has dirtied RAM that has already been transferred.
> 
> On the source VM a magic marker is written just after the 3 MB
> offset. The destination VM is now monitored to detect when the
> magic marker is transferred. This gives a guarantee that the
> first 3 MB of memory have been transferred. Now the source VM
> memory is monitored at exactly the 3 MB offset until we observe
> a flip in its value. This gives us a guarantee that the guest
> workload has dirtied a byte that has already been transferred.
> 
> Since we're looking at a place that is only 3 MB from the start
> of memory, with the 3 MB/sec bandwidth, this test should complete
> in 1 second, instead of 30 seconds.
> 
> Once we've proved there is some dirty memory, migration can be
> set back to full speed for the remainder of the 1st iteration,
> and the entirety of the second iteration, at which point migration
> should be complete.
> 
> On a test machine this further reduces the migration test time
> from 8 minutes to 1 minute 40.

The outcome is definitely nice, but it does look slightly hacky to me and
makes the test slightly more complicated.

If it's all about making sure we finish the 1st iteration, can we simply
add a src qemu parameter "switchover-hold"?  If it's set, src never
switches over to dst but keeps iterating.

Then migrate_ensure_non_converge() will be as simple as setting
switchover-hold to true.
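
On the QMP wire the helper might then reduce to a single parameter set,
something like this ("switchover-hold" is the proposal here, not an existing
QEMU migration parameter):

```json
{ "execute": "migrate-set-parameters",
  "arguments": { "switchover-hold": true } }
```

Setting it back to false would then release the hold and let the switchover
proceed.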

I am even wondering whether there could be a real-life use case for that,
e.g., where a user might want to pre-heat a migration of some VM and
trigger the switchover immediately when the admin really wants it (the
pre-heat having moved most of the pages, and continuing to do so).

It'll also be similar to what Avihai proposed here with switchover-ack,
just an ack mechanism on the src side:

https://lore.kernel.org/r/20230530144821.1557-3-avihaih@nvidia.com

-- 
Peter Xu



