qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 0/3] Allow to enable multifd and postcopy migration togeth


From: Peter Xu
Subject: Re: [PATCH v2 0/3] Allow to enable multifd and postcopy migration together
Date: Fri, 29 Nov 2024 11:45:51 -0500

On Fri, Nov 29, 2024 at 05:52:53PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit <pjp@fedoraproject.org>
> 
> Hello,
> 
> * Currently Multifd and Postcopy migration can not be used together.
>   QEMU shows "Postcopy is not yet compatible with multifd" message.
> 
>   When migrating guests with large (100's GB) RAM, Multifd threads
>   help to accelerate migration, but inability to use it with the
>   Postcopy mode delays guest start up on the destination side.
> 
> * This patch series allows to enable both Multifd and Postcopy
>   migration together. Precopy and Multifd threads work during
>   the initial guest (RAM) transfer. When migration moves to the
>   Postcopy phase, Multifd threads are restrained and the Postcopy
>   threads start to request pages from the source side.
> 
> * This series removes magic value (4-bytes) introduced in the
>   previous series for the Postcopy channel. And refactoring of
>   the 'ram_save_target_page' function is made independent of
>   the multifd & postcopy change.
> 
>   v1: 
> 20241126115748.118683-1-ppandit@redhat.com/T/#u">https://lore.kernel.org/qemu-devel/20241126115748.118683-1-ppandit@redhat.com/T/#u
>   v0: 
> https://lore.kernel.org/qemu-devel/20241029150908.1136894-1-ppandit@redhat.com/T/#u
> 
> 
> Thank you.
> ---
> Prasad Pandit (3):
>   migration/multifd: move macros to multifd header
>   migration: refactor ram_save_target_page functions
>   migration: enable multifd and postcopy together

Prasad,

I saw that there's still discussion in the previous version, while this
cover letter doesn't mention why it was ignored.  Especially, at least to
me, what Fabiano commented on simplifying the flush condition check makes
senes to me to be addressed.

Please cherish reviewer's feedback and time contributed, and let's finish
the disucssion first before rushing for a new version.  I'll try to join
the discussion later too there.

Meanwhile, before I read into any details, I found that all the tests I
requested are still missing.  Would you please consider adding them?

My previous comment regarding to test is here:

https://lore.kernel.org/qemu-devel/ZykJBq7ME5jgSzCA@x1n/

I listed exactly the minimum set of tests that I think we should have.

In general, any migration new feature must have both doc updates and test
coverage as long as applicable.

Multifd still has its doc missing, which is unfortunate.  We could have one
doc prior to this work, but it can also be done later.

OTOH on testing: this is not a new feature alone, but it's more dangerous
than a new feature due to what I mentioned before, that postcopy needs
atomicity on page movements, and multifd is completely against that from
design POV due to NIC drivers being able to modify guest pages directly.

It means multifd+postcopy bugs will be extremely hard to debug if we don't
put it right first.  So please be serious on the test coverage on this
work.

Thanks,

-- 
Peter Xu




reply via email to

[Prev in Thread] Current Thread [Next in Thread]