qemu-devel

From: Jason Gunthorpe
Subject: Re: [PATCH 4/9] vfio/migration: Skip pre-copy if dirty page tracking is not supported
Date: Mon, 16 May 2022 20:08:32 -0300

On Mon, May 16, 2022 at 02:22:00PM -0600, Alex Williamson wrote:
> On Mon, 16 May 2022 13:22:14 +0200
> Juan Quintela <quintela@redhat.com> wrote:
> 
> > Avihai Horon <avihaih@nvidia.com> wrote:
> > > Currently, if the IOMMU of a VFIO container doesn't support dirty page
> > > tracking, migration is blocked completely. This is because a DMA-capable
> > > VFIO device can dirty RAM pages without informing QEMU, thus breaking
> > > the migration.
> > >
> > > However, this doesn't mean that migration can't be done at all. If the
> > > migration pre-copy phase is skipped, RAM is transferred only after the
> > > VM and the device have been stopped, so the VFIO device has no chance to
> > > dirty RAM pages that have already been migrated. This eliminates the
> > > problem mentioned above.
> > >
> > > Hence, in such a case, allow migration but skip the pre-copy phase.
> > >
> > > Signed-off-by: Avihai Horon <avihaih@nvidia.com>  
> > 
> > I don't know (TM).
> > Several issues:
> > - Patch is ugly as hell (ok, that depends on taste)
> > - It changes migration_iteration_run() instead of changing
> >   migration_thread() directly.
> > - There is already another case where we skip sending RAM
> >   (localhost migration with shared memory).
> > 
> > In migration/ram.c:
> > 
> > static int ram_find_and_save_block(RAMState *rs, bool last_stage)
> > {
> >     PageSearchStatus pss;
> >     int pages = 0;
> >     bool again, found;
> > 
> >     /* No dirty page as there is zero RAM */
> >     if (!ram_bytes_total()) {
> >         return pages;
> >     }
> > 
> > This is the other place where we _don't_ send any RAM at all.
> > 
> > I don't have a great idea about how to make things clear at a higher
> > level, I have to think about this.
> 
> It seems like if we have devices dictating what type of migrations can
> be performed, then there probably needs to be a switch to restrict use of
> such devices, just as we have the -only-migratable switch now to prevent
> attaching devices that don't support migration.  I'd guess that we need
> the switch to opt in to allowing such devices, to maintain
> compatibility.  There's probably a whole pile of QAPI things missing to
> expose this to management tools as well.  Thanks,

This is really intended to be a NOP relative to where things are now:
if you use mlx5 live migration without a patch like this, you get a
botched pre-copy, since everything just ends up permanently dirty.

If it makes more sense, we could abort the pre-copy too; in the end
there will be dirty tracking, so I don't know that I'd invest in a big
adventure to fully define migration without dirty tracking.

Jason
