[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 1/2] migration: switchover-hold parameter
From: |
Peter Xu |
Subject: |
Re: [PATCH 1/2] migration: switchover-hold parameter |
Date: |
Mon, 5 Jun 2023 12:06:44 -0400 |
On Mon, Jun 05, 2023 at 03:27:53PM +0300, Avihai Horon wrote:
> Hi Peter,
>
> On 02/06/2023 17:47, Peter Xu wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > Add a new migration parameter switchover-hold which can block src qemu
> > migration from switching over to dest from running.
> >
> > One can set this flag to true so src qemu will keep iterating the VM data,
> > not switching over to dest even if it can.
> >
> > It means now live migration works somehow like COLO; we keep syncing data
> > from src to dst without stopping.
> >
> > When the user is ready for the switchover, one can set the parameter from
> > true->false. That'll contain a implicit kick to migration thread to be
> > alive and re-evaluate the switchover decision.
> >
> > This can be used in two cases so far in my mind:
> >
> > (1) One can use this parameter to start pre-heating migration (but not
> > really migrating, so a migrate-cancel will cancel the preheat). When
> > the user wants to really migrate, just clear the flag. It'll in most
> > cases migrate immediately because most pages are already synced.
> >
> > (2) Can also be used as a clean way to do qtest, in many of the precopy
> > tests we have requirement to run after 1 iteration without completing
> > the precopy migration. Before that we have either set bandwidth to
> > ridiculous low value, or tricks on detecting guest memory change over
> > some adhoc guest memory position. Now we can simply set this flag
> > then we know precopy won't complete and will just keep going.
> >
> > Here we leveraged a sem to make sure migration thread won't busy spin on a
> > physical cpu, meanwhile provide a timedwait() of 10ms so it can still try
> > its best to sync with dest QEMU from time to time. Note that the sem is
> > prone to outdated counts but it's benign, please refer to the comment above
> > the semaphore definition for more information.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > qapi/migration.json | 25 +++++++++++++--
> > migration/migration.h | 16 ++++++++++
> > migration/migration-hmp-cmds.c | 3 ++
> > migration/migration.c | 56 ++++++++++++++++++++++++++++++++--
> > migration/options.c | 17 +++++++++++
> > 5 files changed, 111 insertions(+), 6 deletions(-)
> >
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 179af0c4d8..1d0059d125 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -779,6 +779,15 @@
> > # Nodes are mapped to their block device name if there is one, and
> > # to their node name otherwise. (Since 5.2)
> > #
> > +# @switchover-hold: Whether we should hold-off precopy switchover from src
> > +# to dest QEMU, even if we can finish migration in the downtime
> > +# specified. By default off, so precopy migration will complete as
> > +# soon as possible. One can set it to explicitly keep iterating during
> > +# precopy migration until set the flag to false again to kick off the
> > +# final switchover. Note, this does not affect postcopy switchover,
> > +# because the user can control that using "migrate-start-postcopy"
> > +# command explicitly. (Since 8.1)
> > +#
>
> I think this should wrap at col 70, and need double space before (Since
> 8.1).
> Also in other places below.
I don't know that rule; anywhere documented? It seems true for qapi/.
Obviously I forgot to copy Markus and Eric at least, since this is qmp stuff.
>
> > # Features:
> > #
> > # @unstable: Member @x-checkpoint-delay is experimental.
> > @@ -800,7 +809,7 @@
> > 'xbzrle-cache-size', 'max-postcopy-bandwidth',
> > 'max-cpu-throttle', 'multifd-compression',
> > 'multifd-zlib-level' ,'multifd-zstd-level',
> > - 'block-bitmap-mapping' ] }
> > + 'block-bitmap-mapping', 'switchover-hold' ] }
> >
> > ##
> > # @MigrateSetParameters:
> > @@ -935,6 +944,10 @@
> > # Nodes are mapped to their block device name if there is one, and
> > # to their node name otherwise. (Since 5.2)
> > #
> > +# @switchover-hold: Whether we should hold-off precopy switchover from src
> > +# to dest QEMU. For more details, please refer to MigrationParameter
> > +# entry of the same field. (Since 8.1)
> > +#
> > # Features:
> > #
> > # @unstable: Member @x-checkpoint-delay is experimental.
> > @@ -972,7 +985,8 @@
> > '*multifd-compression': 'MultiFDCompression',
> > '*multifd-zlib-level': 'uint8',
> > '*multifd-zstd-level': 'uint8',
> > - '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> > + '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
> > + '*switchover-hold': 'bool' } }
> >
> > ##
> > # @migrate-set-parameters:
> > @@ -1127,6 +1141,10 @@
> > # Nodes are mapped to their block device name if there is one, and
> > # to their node name otherwise. (Since 5.2)
> > #
> > +# @switchover-hold: Whether we should hold-off precopy switchover from src
> > +# to dest QEMU. For more details, please refer to MigrationParameter
> > +# entry of the same field. (Since 8.1)
> > +#
> > # Features:
> > #
> > # @unstable: Member @x-checkpoint-delay is experimental.
> > @@ -1161,7 +1179,8 @@
> > '*multifd-compression': 'MultiFDCompression',
> > '*multifd-zlib-level': 'uint8',
> > '*multifd-zstd-level': 'uint8',
> > - '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
> > + '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
> > + '*switchover-hold': 'bool' } }
> >
> > ##
> > # @query-migrate-parameters:
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 30c3e97635..721b1c9473 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -440,6 +440,22 @@ struct MigrationState {
> >
> > /* QEMU_VM_VMDESCRIPTION content filled for all non-iterable devices.
> > */
> > JSONWriter *vmdesc;
> > + /*
> > + * Only migration thread will wait on it when switchover_hold==true.
> > + *
> > + * Only qmp set param will kick it when switching switchover_hold from
> > + * true->false.
> > + *
> > + * NOTE: outdated sem count here is benign. E.g., when this is posted,
> > + * the 1st migration got cancelled, then start the 2nd migration, or
> > + * when someone sets the flag from true->false->true->false.. because
> > + * any outdated sem count will only let the migration thread to run one
> > + * more loop (timedwait() will eat the outdated count) when reaching
> > + * the completion phase, then in the next loop it'll sleep again. The
> > + * important thing here OTOH is when the migration thread is sleeping
> > + * we can always kick it out of the sleep, which we will always do.
> > + */
> > + QemuSemaphore switchover_hold_sem;
> > };
> >
> > void migrate_set_state(int *state, int old_state, int new_state);
> > diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> > index 9885d7c9f7..63a2c8a4a3 100644
> > --- a/migration/migration-hmp-cmds.c
> > +++ b/migration/migration-hmp-cmds.c
> > @@ -338,6 +338,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const
> > QDict *qdict)
> > monitor_printf(mon, "%s: '%s'\n",
> > MigrationParameter_str(MIGRATION_PARAMETER_TLS_AUTHZ),
> > params->tls_authz);
> > + monitor_printf(mon, "%s: %s\n",
> > + MigrationParameter_str(MIGRATION_PARAMETER_SWITCHOVER_HOLD),
> > + params->switchover_hold ? "on" : "off");
> >
> > if (params->has_block_bitmap_mapping) {
> > const BitmapMigrationNodeAliasList *bmnal;
> > diff --git a/migration/migration.c b/migration/migration.c
> > index dc05c6f6ea..076c9f1372 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -2693,6 +2693,55 @@ static void migration_update_counters(MigrationState
> > *s,
> > bandwidth, s->threshold_size);
> > }
> >
> > +static bool
> > +migration_should_complete(MigrationState *s, uint64_t pending_size)
> > +{
> > + /* We still have large pending data to send? */
> > + if (pending_size && (pending_size >= s->threshold_size)) {
>
> The parenthesis here are redundant.
>
> > + return false;
> > + }
> > +
> > + /* The user doesn't want us to switchover yet */
> > + if (s->parameters.switchover_hold) {
>
> Should we check this only if we are in precopy? I.e., skip this check if
> postcopy is active?
>
> I thought that this parameter is all about precopy switchover, and the fact
> that it prevents stopcopy from reaching migration_completion() is just an
> undesired side effect.
> As I see it, if this parameter is set then precopy switchover is hold off,
> and if on top of that a user starts postcopy then this parameter is not
> relevant anymore - we should start postcopy and ignore it for the rest of
> migration.
>
> Does it make sense?
Yes, it matches more with the doc update indeed.
I didn't bother with that and added flag=false in the postcopy tests in the
follow up patch because logically a postcopy user shouldn't even use it
(also per the doc), but I can also make it strict here as you said.
Thanks for taking a look!
--
Peter Xu