qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delv


From: Daniel P . Berrangé
Subject: Re: [PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP
Date: Tue, 1 Sep 2020 14:41:47 +0100
User-agent: Mutt/1.14.6 (2020-07-11)

On Tue, Sep 01, 2020 at 03:22:24PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Thu, Aug 27, 2020 at 01:04:43PM +0200, Markus Armbruster wrote:
> >> Daniel P. Berrangé <berrange@redhat.com> writes:
> >> 
> >> > On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote:
> >> > From the POV of practicality, making a design that unifies internal
> >> > and external snapshots is something I'm considering out of scope.
> >> > It increases the design time burden, as well as implementation burden.
> >> > On my side, improving internal snapshots is a "spare time" project,
> >> > not something I can justify spending weeks or months on.
> >> 
> >> I'm not demanding a solution that unifies internal and external
> >> snapshots.  I'm asking for a bit of serious thought on an interface that
> >> could compatibly evolve there.  Hours, not weeks or months.
> >> 
> >> > My goal is to implement something that is achievable in a short
> >> > amount of time that gets us out of the hole we've been in for 10
> >> > years. Minimal refactoring of the internal snapshot code aside
> >> > from fixing the critical limitations we have today around choice
> >> > of disks to snapshot.
> >> >
> >> > If someone later wants to come up with a grand unified design
> >> > for everything that's fine, we can deprecate the new QMP commands
> >> > I'm proposing now.
> >> 
> >> Failing at coming up with an interface that has a reasonable chance to
> >> be future-proof is okay.
> >> 
> >> Not even trying is not okay.
> >
> > This was raised in my v1 posting:
> >
> >   https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01346.html
> >
> > but the conclusion was that it was a non-trivial amount of extra
> > implementation work
> 
> Thanks for the pointer.  I've now read that review thread.
> 
> >> Specifically, I'd like you to think about monolothic snapshot command
> >> (internal snapshots only by design) vs. transaction of individual
> >> snapshot commands (design is not restricted to internal snapshots, but
> >> we may want to accept implementation restrictions).
> >> 
> >> We already have transactionable individual storage snapshot commands.
> >> What's missing is a transactionable machine state snapshot command.
> >
> > At a high level I consider what I've proposed as being higher level
> > syntax sugar vs a more generic future impl based on multiple commands
> > for snapshotting disk & state. I don't think I'd claim that it will
> > evolve to become the design you're suggesting here, as they are designs
> > from different levels in the stack. IOW, I think the would ultimately
> > just exist in parallel. I don't think that's a real problem from a
> > maint POV, as the large burden from the monolithic snapshot code is
> > not the HMP/QMP interface, but rather the guts of the internal
> > impl in the migration/savevm.c and block/snapshot.c files. That code
> > will exist for as long as the HMP commands exist, and adding a QMP
> > interface doesn't make that situation worse unless we were intending
> > to drop the existing HMP commands. If we did change our minds though,
> > we can just deprecate the QMP command at any time we like.
> >
> >
> >> One restriction I'd readily accept at this time is "the machine state
> >> snapshot must write to a QCOW2 that is also internally snapshot in the
> >> same transaction".
> >> 
> >> Now explain to me why this is impractical.
> >
> > The issues were described by Kevin here:
> >
> >   https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg02057.html
> >
> > Assuming the migration impl is actually possible to solve, there is
> > still the question of actually writing it. That's a non-trivial
> > amount of work someone has to find time for.
> 
> Kevin explained how the transactionable machine state snapshot command
> should be made non-blocking: post-copy.
> 
> I don't dispute that creating such a post-copy snapshot is a non-trivial
> task.  It is out of reach for you and me.  I didn't actually ask for it,
> though.
> 
> You argue that providing a blocking snapshot in QMP is better than
> nothing, and good enough for quite a few applications.  I agree!  I
> blocked prior attempts at porting HMP's savevm/loadvm to QMP not because
> they were blocking, but because they stuck to the HMP interface, and the
> HMP interface is bonkers.  I would accept the restriction "snapshotting
> machine state is blocking, i.e. it stops the machine."  As I wrote in
> 2016, "Limitations: No live internal machine snapshot, yet."

FYI, when I documented the new QAPI commands I implemented, i choose to
*not* say that the snapshot is blocking. Instead I said:

  # Applications should not assume that the snapshot load is complete
  # when this command returns. Completion is indicated by the job
  # status. Clients can wait for the JOB_STATUS_CHANGE event. If the
  # job aborts, errors can be obtained via the 'query-jobs' command,
  # though.

The idea was that if we fix these QAPI commands to not block in future,
we want to minimize the risk of breaking clients, by discouraging them
from assuming the impl will always be blocking in future. IOW they
should assume the commands are asynchronous right now, even though they
are not.

> Aside: unless I'm mistaken, taking an internal block device snapshot is
> also blocking, but unlike taking a machine state snapshot, it's fast
> enough for the blocking not to matter.  That's the "sync" in
> blockdev-snapshot-internal-sync.


> Let me elaborate a bit on the desugaring, just to make sure we're on the
> same page.  Please correct me where I'm talking nonsense.
> 
> snapshot-save creates job that snapshots a set of block devices and the
> machine state.  The snapshots are consistent, i.e. they are all taken at
> the same point in time.  The block device snapshots are all internal.
> The machine state snapshot is saved together with one of the (internal)
> block device snapshots.

Yep

> This is basically a transaction of blockdev-snapshot-internal-sync
> (which exists) plus machine-snapshot-internal-sync (which doesn't exist)
> wrapped in a job.

Yes, except it isn't clear to me whether you can separate out the
functionality into two separate commands. It might be neccessary
to save the vmstate at the same time as the disk snapshot. In such
a case, instead of machine-snapshot-internal-sync, we might end up
having a "include vmstate" bool option to blockdev-snapshot-internal-sync
Either way though, we'd end up with a series of commands inside a
transaction.

> Likweise or snapshot-load, except there not even the command for block
> snapshots exists.
> 
> I doubt creating the (transactionable, but blocking) low-level commands
> is actually out of reach.  It's a matter of adding interfaces to parts
> of the code you already got working.

If we're splitting it up into one command per disk, and another command
for vmstate, then it will require refactoring the current migration/savevm.c
and block/snapshot.c code AFAICT, because its all written around the idea
of processing all disks at the same time.

> I'm not demanding you do that, though.  As I said, my chief concerns are
> keeping "bonkers" out of QMP, and not boxing us in needlessly.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




reply via email to

[Prev in Thread] Current Thread [Next in Thread]