qemu-devel
From: Joao Martins
Subject: Re: [PATCH v3 1/1] migration: Allow user to specify available switchover bandwidth
Date: Sat, 9 Sep 2023 15:41:20 +0100

On 05/09/2023 20:38, Peter Xu wrote:
> Migration bandwidth is a very important value for live migration: it is
> one of the major factors in deciding when to switch over to the
> destination during a precopy migration.
> 
> This value is currently estimated by QEMU throughout the live migration
> process by monitoring how fast we are sending data.  In an ideal world,
> where we always feed the migration channel with unlimited data, this is
> the most accurate estimate, because the send rate is then limited only by
> the bandwidth that is actually available.
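> 
> (Concretely, migration_update_counters() in the hunk below computes, per
> iteration,
> 
>     bandwidth      = transferred_bytes / time_spent   [bytes/ms]
>     threshold_size = bandwidth * downtime_limit       [bytes]
> 
> so anything that slows down the sender directly shrinks the switchover
> threshold.)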
> 
> However, reality can be very different: e.g., over a 10Gbps network,
> query-migrate may show a migration bandwidth of only a few tens of MB/s,
> simply because the migration thread is busy with other work.  For example,
> it can be busy scanning zero pages, or fetching dirty bitmaps from other
> external dirty sources (like vhost or KVM).  This means we may not be
> pushing data into the migration channel as fast as possible, so the
> bandwidth estimated from "how much data we sent over the channel" can
> sometimes be dramatically inaccurate.
> 
> This skews the switchover decision: QEMU assumes it cannot switch over at
> all with such a low bandwidth, even though in reality it can.
> 
> With that wrong bandwidth estimate, the migration may not even converge
> within the specified downtime limit, iterating forever.
> 
> The issue is that QEMU itself may not be able to avoid those uncertainties
> when measuring the real "available migration bandwidth", at least not in
> any way I can think of so far.
> 
> One way to fix this is to let a user who is fully aware of the available
> bandwidth provide an accurate value to QEMU.
> 
> For example, if the user has a dedicated 10Gbps channel for migrating this
> specific VM, the user can specify this bandwidth so that QEMU always bases
> its calculation on this fact, trusting the user whenever the value is set.
> It may not be the exact bandwidth available at switchover time (at which
> point QEMU will push migration data as fast as possible anyway), but it is
> much better than QEMU guessing wildly, especially when the guess is very
> wrong.
> 
> A new parameter "avail-switchover-bandwidth" is introduced just for this.
> When the user specifies this parameter, instead of trusting the value QEMU
> estimates itself (based on the QEMUFile send speed), QEMU trusts the user
> and uses this value to decide when to switch over, assuming that such
> bandwidth will indeed be available at that point.
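> 
> For instance (illustrative numbers only): with avail-switchover-bandwidth
> set to 1.25GB/s (a dedicated 10Gbps link) and downtime-limit set to 300ms,
> the remaining-data threshold for switching over becomes roughly
> 1.25GB/s * 0.3s = 375MB, regardless of how slowly the iterative phase
> happened to push data.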
> 
> Note that specifying this value does not (yet) throttle the bandwidth used
> during switchover: QEMU will still use the full bandwidth available for
> sending switchover data, assuming that is the most important use of the
> network at that time.
> 
> This can resolve issues like a non-converging migration caused by a
> ridiculously low "migration bandwidth" being detected for whatever reason.
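> 
> For example, a management application could set it over QMP before
> starting the migration (the value below is illustrative, in bytes per
> second for a 10Gbps link):
> 
>     { "execute": "migrate-set-parameters",
>       "arguments": { "avail-switchover-bandwidth": 1250000000 } }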
> 
> Reported-by: Zhiyi Guo <zhguo@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

FWIW,

Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

> ---
>  qapi/migration.json            | 11 +++++++++++
>  migration/migration.h          |  2 +-
>  migration/options.h            |  2 ++
>  migration/migration-hmp-cmds.c | 14 ++++++++++++++
>  migration/migration.c          | 24 +++++++++++++++++++++---
>  migration/options.c            | 25 +++++++++++++++++++++++++
>  migration/trace-events         |  2 +-
>  7 files changed, 75 insertions(+), 5 deletions(-)
> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index eeb1878c4f..49c36ec9c0 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -766,6 +766,16 @@
>  # @max-bandwidth: to set maximum speed for migration.  maximum speed
>  #     in bytes per second.  (Since 2.8)
>  #
> +# @avail-switchover-bandwidth: to set the available bandwidth that
> +#     migration can use during switchover phase.  NOTE!  This does not
> +#     limit the bandwidth during switchover, but only for calculations when
> +#     making decisions to switchover.  By default, this value is zero,
> +#     which means QEMU will estimate the bandwidth automatically.  This can
> +#     be set when the estimated value is not accurate, while the user is
> +#     able to guarantee such bandwidth is available when switching over.
> +#     When specified correctly, this can make the switchover decision much
> +#     more accurate.  (Since 8.2)
> +#
>  # @downtime-limit: set maximum tolerated downtime for migration.
>  #     maximum downtime in milliseconds (Since 2.8)
>  #
> @@ -856,6 +866,7 @@
>              '*tls-hostname': 'StrOrNull',
>              '*tls-authz': 'StrOrNull',
>              '*max-bandwidth': 'size',
> +            '*avail-switchover-bandwidth': 'size',
>              '*downtime-limit': 'uint64',
>              '*x-checkpoint-delay': { 'type': 'uint32',
>                                       'features': [ 'unstable' ] },
> diff --git a/migration/migration.h b/migration/migration.h
> index 6eea18db36..ce910c1db2 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -283,7 +283,7 @@ struct MigrationState {
>      /*
>       * The final stage happens when the remaining data is smaller than
>       * this threshold; it's calculated from the requested downtime and
> -     * measured bandwidth
> +     * measured bandwidth, or avail-switchover-bandwidth if specified.
>       */
>      int64_t threshold_size;
>  
> diff --git a/migration/options.h b/migration/options.h
> index 4591545c62..d78b437e82 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -83,6 +83,7 @@ typedef enum {
>      MIGRATION_PARAMETER_TLS_HOSTNAME,
>      MIGRATION_PARAMETER_TLS_AUTHZ,
>      MIGRATION_PARAMETER_MAX_BANDWIDTH,
> +    MIGRATION_PARAMETER_AVAIL_SWITCHOVER_BANDWIDTH,
>      MIGRATION_PARAMETER_DOWNTIME_LIMIT,
>      MIGRATION_PARAMETER_X_CHECKPOINT_DELAY,
>      MIGRATION_PARAMETER_BLOCK_INCREMENTAL,
> @@ -128,6 +129,7 @@ int migrate_decompress_threads(void);
>  uint64_t migrate_downtime_limit(void);
>  uint8_t migrate_max_cpu_throttle(void);
>  uint64_t migrate_max_bandwidth(void);
> +uint64_t migrate_avail_switchover_bandwidth(void);
>  uint64_t migrate_max_postcopy_bandwidth(void);
>  int migrate_multifd_channels(void);
>  MultiFDCompression migrate_multifd_compression(void);
> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> index 0a35a87b7e..07a98a4636 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -327,6 +327,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>          monitor_printf(mon, "%s: %" PRIu64 " bytes/second\n",
>              MigrationParameter_str(MIGRATION_PARAMETER_MAX_BANDWIDTH),
>              params->max_bandwidth);
> +        assert(params->has_avail_switchover_bandwidth);
> +        monitor_printf(mon, "%s: %" PRIu64 " bytes/second\n",
> +            MigrationParameter_str(MIGRATION_PARAMETER_AVAIL_SWITCHOVER_BANDWIDTH),
> +            params->avail_switchover_bandwidth);
>          assert(params->has_downtime_limit);
>          monitor_printf(mon, "%s: %" PRIu64 " ms\n",
>              MigrationParameter_str(MIGRATION_PARAMETER_DOWNTIME_LIMIT),
> @@ -577,6 +581,16 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>          }
>          p->max_bandwidth = valuebw;
>          break;
> +    case MIGRATION_PARAMETER_AVAIL_SWITCHOVER_BANDWIDTH:
> +        p->has_avail_switchover_bandwidth = true;
> +        ret = qemu_strtosz_MiB(valuestr, NULL, &valuebw);
> +        if (ret < 0 || valuebw > INT64_MAX
> +            || (size_t)valuebw != valuebw) {
> +            error_setg(&err, "Invalid size %s", valuestr);
> +            break;
> +        }
> +        p->avail_switchover_bandwidth = valuebw;
> +        break;
>      case MIGRATION_PARAMETER_DOWNTIME_LIMIT:
>          p->has_downtime_limit = true;
>          visit_type_size(v, param, &p->downtime_limit, &err);
> diff --git a/migration/migration.c b/migration/migration.c
> index 5528acb65e..85be4f019b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2684,17 +2684,33 @@ static void migration_update_counters(MigrationState *s,
>  {
>      uint64_t transferred, transferred_pages, time_spent;
>      uint64_t current_bytes; /* bytes transferred since the beginning */
> +    uint64_t switchover_bw;
> +    /* Expected bandwidth when switching over to destination QEMU */
> +    double expected_bw_per_ms;
>      double bandwidth;
>  
>      if (current_time < s->iteration_start_time + BUFFER_DELAY) {
>          return;
>      }
>  
> +    switchover_bw = migrate_avail_switchover_bandwidth();
>      current_bytes = migration_transferred_bytes(s->to_dst_file);
>      transferred = current_bytes - s->iteration_initial_bytes;
>      time_spent = current_time - s->iteration_start_time;
>      bandwidth = (double)transferred / time_spent;
> -    s->threshold_size = bandwidth * migrate_downtime_limit();
> +
> +    if (switchover_bw) {
> +        /*
> +         * If the user specified a switchover bandwidth, let's trust the
> +         * user so that can be more accurate than what we estimated.
> +         */
> +        expected_bw_per_ms = switchover_bw / 1000;
> +    } else {
> +        /* If the user doesn't specify bandwidth, we use the estimated */
> +        expected_bw_per_ms = bandwidth;
> +    }
> +
> +    s->threshold_size = expected_bw_per_ms * migrate_downtime_limit();
>  
>      s->mbps = (((double) transferred * 8.0) /
>                 ((double) time_spent / 1000.0)) / 1000.0 / 1000.0;
> @@ -2711,7 +2727,7 @@ static void migration_update_counters(MigrationState *s,
>      if (stat64_get(&mig_stats.dirty_pages_rate) &&
>          transferred > 10000) {
>          s->expected_downtime =
> -            stat64_get(&mig_stats.dirty_bytes_last_sync) / bandwidth;
> +            stat64_get(&mig_stats.dirty_bytes_last_sync) / expected_bw_per_ms;
>      }
>  
>      migration_rate_reset(s->to_dst_file);
> @@ -2719,7 +2735,9 @@ static void migration_update_counters(MigrationState *s,
>      update_iteration_initial_status(s);
>  
>      trace_migrate_transferred(transferred, time_spent,
> -                              bandwidth, s->threshold_size);
> +                              /* Both in unit bytes/ms */
> +                              bandwidth, switchover_bw / 1000,
> +                              s->threshold_size);
>  }
>  
>  static bool migration_can_switchover(MigrationState *s)
> diff --git a/migration/options.c b/migration/options.c
> index c9b90d932d..65d0b58fa9 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -101,6 +101,7 @@ const char *MigrationParameter_string[MIGRATION_PARAMETER__MAX] = {
>      [MIGRATION_PARAMETER_TLS_HOSTNAME] = "tls-hostname",
>      [MIGRATION_PARAMETER_TLS_AUTHZ] = "tls-authz",
>      [MIGRATION_PARAMETER_MAX_BANDWIDTH] = "max-bandwidth",
> +    [MIGRATION_PARAMETER_AVAIL_SWITCHOVER_BANDWIDTH] = "avail-switchover-bandwidth",
>      [MIGRATION_PARAMETER_DOWNTIME_LIMIT] = "downtime-limit",
>      [MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] = "x-checkpoint-delay",
>      [MIGRATION_PARAMETER_BLOCK_INCREMENTAL] = "block-incremental",
> @@ -176,6 +177,8 @@ Property migration_properties[] = {
>                        parameters.cpu_throttle_tailslow, false),
>      DEFINE_PROP_SIZE("x-max-bandwidth", MigrationState,
>                        parameters.max_bandwidth, MAX_THROTTLE),
> +    DEFINE_PROP_SIZE("avail-switchover-bandwidth", MigrationState,
> +                      parameters.avail_switchover_bandwidth, 0),
>      DEFINE_PROP_UINT64("x-downtime-limit", MigrationState,
>                        parameters.downtime_limit,
>                        DEFAULT_MIGRATE_SET_DOWNTIME),
> @@ -868,6 +871,13 @@ uint64_t migrate_max_bandwidth(void)
>      return s->parameters.max_bandwidth;
>  }
>  
> +uint64_t migrate_avail_switchover_bandwidth(void)
> +{
> +    MigrationState *s = migrate_get_current();
> +
> +    return s->parameters.avail_switchover_bandwidth;
> +}
> +
>  uint64_t migrate_max_postcopy_bandwidth(void)
>  {
>      MigrationState *s = migrate_get_current();
> @@ -1004,6 +1014,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>      params->tls_authz = QAPI_CLONE(StrOrNull, s->parameters.tls_authz);
>      params->has_max_bandwidth = true;
>      params->max_bandwidth = s->parameters.max_bandwidth;
> +    params->has_avail_switchover_bandwidth = true;
> +    params->avail_switchover_bandwidth = s->parameters.avail_switchover_bandwidth;
>      params->has_downtime_limit = true;
>      params->downtime_limit = s->parameters.downtime_limit;
>      params->has_x_checkpoint_delay = true;
> @@ -1144,6 +1156,15 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
>          return false;
>      }
>  
> +    if (params->has_avail_switchover_bandwidth &&
> +        (params->avail_switchover_bandwidth > SIZE_MAX)) {
> +        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> +                   "avail_switchover_bandwidth",
> +                   "an integer in the range of 0 to "stringify(SIZE_MAX)
> +                   " bytes/second");
> +        return false;
> +    }
> +
>      if (params->has_downtime_limit &&
>          (params->downtime_limit > MAX_MIGRATE_DOWNTIME)) {
>          error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> @@ -1320,6 +1341,10 @@ static void migrate_params_apply(MigrationParameters *params, Error **errp)
>          }
>      }
>  
> +    if (params->has_avail_switchover_bandwidth) {
> +        s->parameters.avail_switchover_bandwidth = params->avail_switchover_bandwidth;
> +    }
> +
>      if (params->has_downtime_limit) {
>          s->parameters.downtime_limit = params->downtime_limit;
>      }
> diff --git a/migration/trace-events b/migration/trace-events
> index 4666f19325..ebd88de3d0 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -185,7 +185,7 @@ source_return_path_thread_shut(uint32_t val) "0x%x"
>  source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
>  source_return_path_thread_switchover_acked(void) ""
>  migration_thread_low_pending(uint64_t pending) "%" PRIu64
> -migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " max_size %" PRId64
> +migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
>  process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
>  process_incoming_migration_co_postcopy_end_main(void) ""
>  postcopy_preempt_enabled(bool value) "%d"


