qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/2] migration: switchover-hold parameter


From: Peter Xu
Subject: Re: [PATCH 1/2] migration: switchover-hold parameter
Date: Mon, 5 Jun 2023 12:06:44 -0400

On Mon, Jun 05, 2023 at 03:27:53PM +0300, Avihai Horon wrote:
> Hi Peter,
> 
> On 02/06/2023 17:47, Peter Xu wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > Add a new migration parameter switchover-hold which can block src qemu
> > migration from switching over to dest from running.
> > 
> > One can set this flag to true so src qemu will keep iterating the VM data,
> > not switching over to dest even if it can.
> > 
> > It means now live migration works somehow like COLO; we keep syncing data
> > from src to dst without stopping.
> > 
> > When the user is ready for the switchover, one can set the parameter from
> > true->false.  That'll contain a implicit kick to migration thread to be
> > alive and re-evaluate the switchover decision.
> > 
> > This can be used in two cases so far in my mind:
> > 
> >    (1) One can use this parameter to start pre-heating migration (but not
> >        really migrating, so a migrate-cancel will cancel the preheat).  When
> >        the user wants to really migrate, just clear the flag.  It'll in most
> >        cases migrate immediately because most pages are already synced.
> > 
> >    (2) Can also be used as a clean way to do qtest, in many of the precopy
> >        tests we have requirement to run after 1 iteration without completing
> >        the precopy migration.  Before that we have either set bandwidth to
> >        ridiculous low value, or tricks on detecting guest memory change over
> >        some adhoc guest memory position.  Now we can simply set this flag
> >        then we know precopy won't complete and will just keep going.
> > 
> > Here we leveraged a sem to make sure migration thread won't busy spin on a
> > physical cpu, meanwhile provide a timedwait() of 10ms so it can still try
> > its best to sync with dest QEMU from time to time.  Note that the sem is
> > prone to outdated counts but it's benign, please refer to the comment above
> > the semaphore definition for more information.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >   qapi/migration.json            | 25 +++++++++++++--
> >   migration/migration.h          | 16 ++++++++++
> >   migration/migration-hmp-cmds.c |  3 ++
> >   migration/migration.c          | 56 ++++++++++++++++++++++++++++++++--
> >   migration/options.c            | 17 +++++++++++
> >   5 files changed, 111 insertions(+), 6 deletions(-)
> > 
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 179af0c4d8..1d0059d125 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -779,6 +779,15 @@
> >   #     Nodes are mapped to their block device name if there is one, and
> >   #     to their node name otherwise.  (Since 5.2)
> >   #
> > +# @switchover-hold: Whether we should hold-off precopy switchover from src
> > +#     to dest QEMU, even if we can finish migration in the downtime
> > +#     specified.  By default off, so precopy migration will complete as
> > +#     soon as possible.  One can set it to explicitly keep iterating during
> > +#     precopy migration until set the flag to false again to kick off the
> > +#     final switchover.  Note, this does not affect postcopy switchover,
> > +#     because the user can control that using "migrate-start-postcopy"
> > +#     command explicitly. (Since 8.1)
> > +#
> 
> I think this should wrap at col 70, and need double space before (Since
> 8.1).
> Also in other places below.

I don't know that rule; anywhere documented?  It seems true for qapi/.

Obviously I forgot to copy Markus and Eric at least, since this is qmp stuff.

> 
> >   # Features:
> >   #
> >   # @unstable: Member @x-checkpoint-delay is experimental.
> > @@ -800,7 +809,7 @@
> >              'xbzrle-cache-size', 'max-postcopy-bandwidth',
> >              'max-cpu-throttle', 'multifd-compression',
> >              'multifd-zlib-level' ,'multifd-zstd-level',
> > -           'block-bitmap-mapping' ] }
> > +           'block-bitmap-mapping', 'switchover-hold' ] }
> > 
> >   ##
> >   # @MigrateSetParameters:
> > @@ -935,6 +944,10 @@
> >   #     Nodes are mapped to their block device name if there is one, and
> >   #     to their node name otherwise.  (Since 5.2)
> >   #
> > +# @switchover-hold: Whether we should hold-off precopy switchover from src
> > +#     to dest QEMU.  For more details, please refer to MigrationParameter
> > +#     entry of the same field. (Since 8.1)
> > +#
> >   # Features:
> >   #
> >   # @unstable: Member @x-checkpoint-delay is experimental.
> > @@ -972,7 +985,8 @@
> >               '*multifd-compression': 'MultiFDCompression',
> >               '*multifd-zlib-level': 'uint8',
> >               '*multifd-zstd-level': 'uint8',
> > -            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> > +            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
> > +            '*switchover-hold': 'bool' } }
> > 
> >   ##
> >   # @migrate-set-parameters:
> > @@ -1127,6 +1141,10 @@
> >   #     Nodes are mapped to their block device name if there is one, and
> >   #     to their node name otherwise.  (Since 5.2)
> >   #
> > +# @switchover-hold: Whether we should hold-off precopy switchover from src
> > +#     to dest QEMU.  For more details, please refer to MigrationParameter
> > +#     entry of the same field. (Since 8.1)
> > +#
> >   # Features:
> >   #
> >   # @unstable: Member @x-checkpoint-delay is experimental.
> > @@ -1161,7 +1179,8 @@
> >               '*multifd-compression': 'MultiFDCompression',
> >               '*multifd-zlib-level': 'uint8',
> >               '*multifd-zstd-level': 'uint8',
> > -            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> > +            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
> > +            '*switchover-hold': 'bool' } }
> > 
> >   ##
> >   # @query-migrate-parameters:
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 30c3e97635..721b1c9473 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -440,6 +440,22 @@ struct MigrationState {
> > 
> >       /* QEMU_VM_VMDESCRIPTION content filled for all non-iterable devices. 
> > */
> >       JSONWriter *vmdesc;
> > +    /*
> > +     * Only migration thread will wait on it when switchover_hold==true.
> > +     *
> > +     * Only qmp set param will kick it when switching switchover_hold from
> > +     * true->false.
> > +     *
> > +     * NOTE: outdated sem count here is benign.  E.g., when this is posted,
> > +     * the 1st migration got cancelled, then start the 2nd migration, or
> > +     * when someone sets the flag from true->false->true->false.. because
> > +     * any outdated sem count will only let the migration thread to run one
> > +     * more loop (timedwait() will eat the outdated count) when reaching
> > +     * the completion phase, then in the next loop it'll sleep again.  The
> > +     * important thing here OTOH is when the migration thread is sleeping
> > +     * we can always kick it out of the sleep, which we will always do.
> > +     */
> > +    QemuSemaphore switchover_hold_sem;
> >   };
> > 
> >   void migrate_set_state(int *state, int old_state, int new_state);
> > diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> > index 9885d7c9f7..63a2c8a4a3 100644
> > --- a/migration/migration-hmp-cmds.c
> > +++ b/migration/migration-hmp-cmds.c
> > @@ -338,6 +338,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const 
> > QDict *qdict)
> >           monitor_printf(mon, "%s: '%s'\n",
> >               MigrationParameter_str(MIGRATION_PARAMETER_TLS_AUTHZ),
> >               params->tls_authz);
> > +        monitor_printf(mon, "%s: %s\n",
> > +            MigrationParameter_str(MIGRATION_PARAMETER_SWITCHOVER_HOLD),
> > +            params->switchover_hold ? "on" : "off");
> > 
> >           if (params->has_block_bitmap_mapping) {
> >               const BitmapMigrationNodeAliasList *bmnal;
> > diff --git a/migration/migration.c b/migration/migration.c
> > index dc05c6f6ea..076c9f1372 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -2693,6 +2693,55 @@ static void migration_update_counters(MigrationState 
> > *s,
> >                                 bandwidth, s->threshold_size);
> >   }
> > 
> > +static bool
> > +migration_should_complete(MigrationState *s, uint64_t pending_size)
> > +{
> > +    /* We still have large pending data to send? */
> > +    if (pending_size && (pending_size >= s->threshold_size)) {
> 
> The parenthesis here are redundant.
> 
> > +        return false;
> > +    }
> > +
> > +    /* The user doesn't want us to switchover yet */
> > +    if (s->parameters.switchover_hold) {
> 
> Should we check this only if we are in precopy? I.e., skip this check if
> postcopy is active?
> 
> I thought that this parameter is all about precopy switchover, and the fact
> that it prevents stopcopy from reaching migration_completion() is just an
> undesired side effect.
> As I see it, if this parameter is set then precopy switchover is hold off,
> and if on top of that a user starts postcopy then this parameter is not
> relevant anymore - we should start postcopy and ignore it for the rest of
> migration.
> 
> Does it make sense?

Yes, it matches more with the doc update indeed.

I didn't bother with that and added flag=false in the postcopy tests in the
follow up patch because logically a postcopy user shouldn't even use it
(also per the doc), but I can also make it strict here as you said.

Thanks for taking a look!

-- 
Peter Xu




reply via email to

[Prev in Thread] Current Thread [Next in Thread]