Re: [PATCH] failover: allow to pause the VM during the migration


From: Daniel P . Berrangé
Subject: Re: [PATCH] failover: allow to pause the VM during the migration
Date: Fri, 1 Oct 2021 10:01:43 +0100
User-agent: Mutt/2.0.7 (2021-05-04)

On Thu, Sep 30, 2021 at 04:17:44PM -0400, Laine Stump wrote:
> On 9/30/21 1:09 PM, Laurent Vivier wrote:
> > If we want to save a snapshot of a VM to a file, we used to follow the
> > following steps:
> > 
> > 1- stop the VM:
> >     (qemu) stop
> > 
> > 2- migrate the VM to a file:
> >     (qemu) migrate "exec:cat > snapshot"
> > 
> > 3- resume the VM:
> >     (qemu) cont
> > 
> > After that we can restore the snapshot with:
> >    qemu-system-x86_64 ... -incoming "exec:cat snapshot"
> >    (qemu) cont
> 
> This is the basics of what libvirt does for a snapshot, and steps 1+2 are
> what it does for a "managedsave" (where it saves the snapshot to disk and
> then terminates the qemu process, for later re-animation).
> 
> In those cases, it seems like this new parameter could work for us - instead
> of explicitly pausing the guest prior to migrating it to disk, we would set
> this new parameter to on, then directly migrate-to-disk (relying on qemu to
> do the pause). Care will need to be taken to assure that error recovery
> behaves the same though.

What libvirt does is actually quite different from this in a significant
way.  In the HMP example here, 'migrate' is a blocking command that does
not return until migration is finished.

Libvirt uses QMP, and 'migrate' there is an asynchronous command that merely
launches the migration and returns control to the client.

IOW, what libvirt does is

    stop
    migrate
    while status != failed && status != completed
       query-migrate
       
       ...also receive any QMP migration events...

       ...possibly modify migration parameters...

    cont
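
As a rough illustration of the stop / migrate / poll / cont pattern above,
here is a minimal Python sketch. It is not libvirt's code: `save_paused`,
`scripted_qmp` and the `execute(command, **arguments)` callable are
hypothetical names standing in for a real QMP connection, and only the QMP
command names (stop, migrate, query-migrate, cont) come from the discussion.
Event handling and parameter changes are left out.

```python
import time

# The two terminal states named in the loop above; a real client would
# probably also want to handle other outcomes such as cancellation.
TERMINAL = ("completed", "failed")


def save_paused(execute, uri, poll_interval=0.1):
    """Pause the guest, start an asynchronous migration, poll until done, resume."""
    execute("stop")
    execute("migrate", uri=uri)           # returns immediately over QMP
    status = None
    while status not in TERMINAL:
        time.sleep(poll_interval)
        status = execute("query-migrate").get("status")
    execute("cont")
    return status


def scripted_qmp(statuses):
    """Hypothetical stand-in for a QMP connection that replays canned statuses."""
    log = []
    pending = list(statuses)

    def execute(command, **arguments):
        log.append(command)
        if command == "query-migrate":
            return {"status": pending.pop(0)}
        return {}

    return execute, log


if __name__ == "__main__":
    execute, log = scripted_qmp(["setup", "active", "completed"])
    print(save_paused(execute, "exec:cat > snapshot", poll_interval=0))
    print(log)
```

The point of the sketch is only that the client, not QEMU, owns the loop,
so it is free to react to status changes between polls.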

With this pattern I'm not seeing any need for a new migration parameter
for libvirt. The migration status lets us distinguish when QEMU is in
the "waiting for unplug" phase vs the "active" phase. So AFAICT, libvirt
can do:

    migrate
    while status != failed && status != completed
       query-migrate
       
       ...also receive any QMP migration events...

       if status changed from wait-unplug to active
         stop

       ...possibly modify migration parameters...

    cont
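
A minimal Python sketch of this second loop, under the same caveats: the
`execute` callable, `save_with_late_stop` and `scripted_qmp` are hypothetical
helpers, and "wait-unplug" is assumed to be the status string that
query-migrate reports for the unplug phase. A robust client would more
likely react to migration events than rely on polling, since a poll can miss
a short-lived state.

```python
import time

TERMINAL = ("completed", "failed")


def save_with_late_stop(execute, uri, poll_interval=0.1):
    """Start migrating with the guest running; pause it once unplug is done."""
    execute("migrate", uri=uri)
    previous = None
    status = None
    stopped = False
    while status not in TERMINAL:
        time.sleep(poll_interval)
        status = execute("query-migrate").get("status")
        # Pause only on the wait-unplug -> active transition, and only once.
        if previous == "wait-unplug" and status == "active" and not stopped:
            execute("stop")
            stopped = True
        previous = status
    execute("cont")
    return status


def scripted_qmp(statuses):
    """Hypothetical stand-in for a QMP connection that replays canned statuses."""
    log = []
    pending = list(statuses)

    def execute(command, **arguments):
        log.append(command)
        if command == "query-migrate":
            return {"status": pending.pop(0)}
        return {}

    return execute, log


if __name__ == "__main__":
    execute, log = scripted_qmp(
        ["setup", "wait-unplug", "active", "active", "completed"])
    print(save_with_late_stop(execute, "exec:cat > snapshot", poll_interval=0))
    print(log)
```

In the log, 'stop' lands after the poll that first reports "active", which
is the small window with running CPUs described below.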


There is a small window here when the guest CPUs are running
but migration is active.  In most cases for libvirt that is
harmless.  If there are cases where libvirt needs a strong
guarantee to synchronize the 'stop' with some other option,
then the newly proposed "pause-vm" parameter has the same problem,
as libvirt can't synchronize against that either.


> There are a couple of cases when libvirt apparently *doesn't* pause the
> guest during the migrate-to-disk, both having to do with saving a coredump
> of the guest. Since I really have no idea of how common/important that is
> (or even if my assessment of the code is correct), I'm Cc'ing this patch to
> libvir-list to make sure it catches the attention of someone who knows the
> answers and implications.

IIUC, the problem with unplug only happens when libvirt pauses
the guest. So surely if there are some scenarios where we're not
pausing the guest, there's no problem to solve for those.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



