From: Daniel P. Berrangé
Subject: Re: [PATCH RFC 00/26] Multifd 🔀 device state transfer support with VFIO consumer
Date: Wed, 17 Apr 2024 17:35:39 +0100
User-agent: Mutt/2.2.12 (2023-09-09)

On Wed, Apr 17, 2024 at 02:11:37PM +0200, Maciej S. Szmigiero wrote:
> On 17.04.2024 10:36, Daniel P. Berrangé wrote:
> > On Tue, Apr 16, 2024 at 04:42:39PM +0200, Maciej S. Szmigiero wrote:
> > > From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> > > 
> > > VFIO device state transfer is currently done via the main migration
> > > channel.
> > > This means that transfers from multiple VFIO devices are done sequentially
> > > and via just a single common migration channel.
> > > 
> > > Such a way of transferring VFIO device state migration data reduces
> > > performance and severely impacts the migration downtime (~50%) for VMs
> > > that have multiple such devices with large state size - see the test
> > > results below.
> > > 
> > > However, we already have a way to transfer migration data using multiple
> > > connections - that's what multifd channels are.
> > > 
> > > Unfortunately, multifd channels are currently utilized for RAM transfer
> > > only.
> > > This patch set adds a new framework allowing their use for device state
> > > transfer too.
> > > 
> > > The wire protocol is based on Avihai's x-channel-header patches, which
> > > introduce a header for migration channels that allows the migration source
> > > to explicitly indicate the migration channel type without having the
> > > target deduce the channel type by peeking in the channel's content.
> > > 
> > > The new wire protocol can be switched on and off via the
> > > migration.x-channel-header option for compatibility with older QEMU
> > > versions and testing.
> > > Switching the new wire protocol off also disables device state transfer
> > > via multifd channels.
> > > 
> > > The device state transfer can happen either via the same multifd channels
> > > on which RAM data is transferred, mixed with RAM data (when
> > > migration.x-multifd-channels-device-state is 0), or exclusively via
> > > dedicated device state transfer channels (when
> > > migration.x-multifd-channels-device-state > 0).
> > > 
> > > Using dedicated device state transfer multifd channels brings further
> > > performance benefits since these channels don't need to participate in
> > > the RAM sync process.
> > 
> > I'm not convinced there's any need to introduce the new "channel header"
> > protocol messages. The multifd channels already have an initialization
> > message that is extensible to allow extra semantics to be indicated.
> > So if we want some of the multifd channels to be reserved for device
> > state, we could indicate that via some data in the MultiFDInit_t
> > message struct.
> 
> The reason for introducing x-channel-header was to avoid having to deduce
> the channel type by peeking in the channel's content - where any channel
> that does not start with QEMU_VM_FILE_MAGIC is currently treated as a
> multifd one.
> 
> But if this isn't desired then, as you say, the multifd channel type can
> be indicated by using some unused field of the MultiFDInit_t message.
> 
> Of course, this would still keep the QEMU_VM_FILE_MAGIC heuristic then.
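
For reference, the multifd per-channel setup message already carries spare
space that could hold such an indicator - roughly the following (the exact
reserved-field layout may differ, and the flag below is only an example,
nothing that exists today):

typedef struct {
    uint32_t magic;
    uint32_t version;
    unsigned char uuid[16];   /* QemuUUID */
    uint8_t id;
    uint8_t unused1[7];       /* reserved for future use */
    uint64_t unused2[4];      /* reserved for future use */
} __attribute__((packed)) MultiFDInit_t;

/* e.g. one reserved byte could become a flags field marking the channel as
 * carrying device state rather than RAM - illustrative only: */
#define MULTIFD_INIT_FLAG_DEVICE_STATE 0x1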

I don't like the heuristics we currently have, and would like to have
a better solution. What makes me cautious is that this proposal
is a protocol change, but one that only addresses a very narrow
problem with the migration protocol.

I'd like migration to see a more explicit bi-directional protocol
negotiation message set, where both QEMUs can auto-negotiate amongst
themselves many of the features that currently require tedious
manual configuration by mgmt apps via migrate parameters/capabilities.
That would address the problem you describe here, and so much more.
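
To sketch the kind of thing I mean (nothing like this exists in QEMU
today; every name below is invented):

/*
 * Each side would send a hello advertising what it supports; both ends then
 * enable only the intersection of the advertised bits, instead of the mgmt
 * app having to mirror capabilities on source and destination.
 */
typedef struct {
    uint32_t magic;          /* identifies a negotiation message */
    uint32_t max_version;    /* highest negotiation version understood */
    uint64_t capabilities;   /* bitmap: multifd, device state channels, ... */
} MigrationHello;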

If we add this channel header feature now, it creates yet another
thing to keep around for back compatibility. So if it is not
strictly required in order to solve the VFIO VMstate problem, I'd
prefer to just solve the VMstate stuff on its own.

> > That said, the idea of reserving channels specifically for VFIO doesn't
> > make a whole lot of sense to me either.
> > 
> > Once we've done the RAM transfer, and are in the switchover phase
> > doing device state transfer, all the multifd channels are idle.
> > We should just use all those channels to transfer the device state,
> > in parallel.  Reserving channels just guarantees many idle channels
> > during RAM transfer, and further idle channels during vmstate
> > transfer.
> > 
> > IMHO it is more flexible to just use all available multifd channel
> > resources all the time.
> 
> The reason for having dedicated device state channels is that they
> provide lower downtime in my tests.
> 
> With either 15 or 11 mixed multifd channels (no dedicated device state
> channels) I get a downtime of about 1250 msec.
> 
> Comparing that with 15 total multifd channels / 4 dedicated device
> state channels, which give a downtime of about 1100 ms, it means that
> using dedicated channels brings about a 14% downtime improvement.

Hmm, can you clarify /when/ the VFIO vmstate transfer takes
place? Is it transferred concurrently with the RAM? I had thought
this series still has the RAM transfer iterations running first,
and then the VFIO VMstate at the end, simply making use of multifd
channels for parallelism of the end phase. Your reply, though, makes
me question my interpretation.

Let me try to illustrate channel flow in various scenarios, time
flowing left to right:

1. serialized RAM, then serialized VM state  (ie historical migration)

      main: | Init | RAM iter 1 | RAM iter 2 | ... | RAM iter N | VM State |


2. parallel RAM, then serialized VM state (ie today's multifd)

      main: | Init |                                            | VM state |
  multifd1:        | RAM iter 1 | RAM iter 2 | ... | RAM iter N |
  multifd2:        | RAM iter 1 | RAM iter 2 | ... | RAM iter N |
  multifd3:        | RAM iter 1 | RAM iter 2 | ... | RAM iter N |


3. parallel RAM, then parallel VM state

      main: | Init |                                            | VM state |
  multifd1:        | RAM iter 1 | RAM iter 2 | ... | RAM iter N |
  multifd2:        | RAM iter 1 | RAM iter 2 | ... | RAM iter N |
  multifd3:        | RAM iter 1 | RAM iter 2 | ... | RAM iter N |
  multifd4:                                                     | VFIO VM state |
  multifd5:                                                     | VFIO VM state |


4. parallel RAM and VFIO VM state, then remaining VM state

      main: | Init |                                            | VM state |
  multifd1:        | RAM iter 1 | RAM iter 2 | ... | RAM iter N |
  multifd2:        | RAM iter 1 | RAM iter 2 | ... | RAM iter N |
  multifd3:        | RAM iter 1 | RAM iter 2 | ... | RAM iter N |
  multifd4:        | VFIO VM state                                         |
  multifd5:        | VFIO VM state                                         |


I thought this series was implementing approx (3), but are you actually
implementing (4), or something else entirely?


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|