From: Juan Quintela
Subject: Re: [PATCH QEMU v5 6/8] migration: Implement dirty-limit convergence algo
Date: Tue, 13 Jun 2023 19:50:24 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

~hyman <hyman@git.sr.ht> wrote:
> From: Hyman Huang(黄勇) <yong.huang@smartx.com>

To speed things up, patches 1-5 are included in the next Migration PULL request.

> Implement dirty-limit convergence algo for live migration,
> which is kind of like auto-converge algo but using dirty-limit
> instead of cpu throttle to make migration convergent.
>
> Enable dirty page limit if dirty_rate_high_cnt greater than 2
> when dirty-limit capability enabled, Disable dirty-limit if
> migration be cancled.

Nit: canceled.

>
> Note that "set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit"
> commands are not allowed during dirty-limit live migration.
>
> Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>


> + * Enable dirty-limit to throttle down the guest
> + */
> +static void migration_dirty_limit_guest(void)
> +{
> +    static int64_t quota_dirtyrate;

quota_dirtyrate deserves at least a comment.

I guess it means the quota dirty rate that is currently set, but there is no clue.
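A minimal standalone sketch of how that static could be documented, with the QEMU state replaced by a plain parameter and the function name hypothetical (the real code reads s->parameters.vcpu_dirty_limit and calls qmp_set_vcpu_dirty_limit()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for migration_dirty_limit_guest(). */
static bool migration_dirty_limit_guest_sketch(int64_t vcpu_dirty_limit)
{
    /*
     * Dirty rate quota (MB/s) that was last applied to the vCPUs; cached
     * so the limit is only re-issued when the vcpu-dirty-limit migration
     * parameter changes between calls.
     */
    static int64_t quota_dirtyrate;

    if (quota_dirtyrate == vcpu_dirty_limit) {
        return false;   /* already in service with this quota */
    }
    quota_dirtyrate = vcpu_dirty_limit;
    /* the real code would call qmp_set_vcpu_dirty_limit() here */
    return true;        /* quota (re)applied */
}
```

With such a comment the "unchanged, do nothing" early return above it becomes self-explanatory.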

> +    MigrationState *s = migrate_get_current();
> +
> +    /*
> +     * If dirty limit already enabled and migration parameter
> +     * vcpu-dirty-limit untouched.
> +     */
> +    if (dirtylimit_in_service() &&
> +        quota_dirtyrate == s->parameters.vcpu_dirty_limit) {
> +        return;
> +    }
> +
> +    quota_dirtyrate = s->parameters.vcpu_dirty_limit;
> +
> +    /* Set or update quota dirty limit */
> +    qmp_set_vcpu_dirty_limit(false, -1, quota_dirtyrate, NULL);

Care to explain why we have to "reset" the quota?  Or why we can't
set it when the user issues the command, only when we throttle the guest?

> +    trace_migration_dirty_limit_guest(quota_dirtyrate);
> +}
> +

Split this patch in two:

a - the logic change
b - the introduction of dirty limit.


Old code:

    /* During block migration the auto-converge logic incorrectly detects
     * that ram migration makes no progress. Avoid this by disabling the
     * throttling logic during the bulk phase of block migration. */
    if (blk_mig_bulk_active()) {
        return;
    }

    if (migrate_auto_converge()) {
        /* The following detection logic can be refined later. For now:
           Check to see if the ratio between dirtied bytes and the approx.
           amount of bytes that just got transferred since the last time
           we were in this routine reaches the threshold. If that happens
           twice, start or increase throttling. */

        if ((bytes_dirty_period > bytes_dirty_threshold) &&
            (++rs->dirty_rate_high_cnt >= 2)) {
            trace_migration_throttle();
            rs->dirty_rate_high_cnt = 0;
            mig_throttle_guest_down(bytes_dirty_period,
                                    bytes_dirty_threshold);
        }
    }

New code:
    /*
     * The following detection logic can be refined later. For now:
     * Check to see if the ratio between dirtied bytes and the approx.
     * amount of bytes that just got transferred since the last time
     * we were in this routine reaches the threshold. If that happens
     * twice, start or increase throttling.
     */

    if ((bytes_dirty_period > bytes_dirty_threshold) &&
        (++rs->dirty_rate_high_cnt >= 2)) {
        rs->dirty_rate_high_cnt = 0;
        /*
         * During block migration the auto-converge logic incorrectly detects
         * that ram migration makes no progress. Avoid this by disabling the
         * throttling logic during the bulk phase of block migration
         */
        if (blk_mig_bulk_active()) {
            return;
        }

        if (migrate_auto_converge()) {
            trace_migration_throttle();
            mig_throttle_guest_down(bytes_dirty_period,
                                    bytes_dirty_threshold);
        } else if (migrate_dirty_limit()) {
            migration_dirty_limit_guest();
        }
    }

Questions:

- Why are we changing the blk_mig_bulk_active() position?

  I think that the old code had it in the right place.  Additionally,
  you just changed to this version a couple of patches ago.
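A self-contained toy (QEMU calls replaced by stub flags, all names hypothetical) illustrating the behavioural difference the question is about: with the new placement, a dirty-rate spike during the bulk block-migration phase still advances and consumes the two-strike counter, whereas the old placement leaves it untouched:

```c
#include <assert.h>
#include <stdbool.h>

static bool blk_bulk_active;      /* stub for blk_mig_bulk_active() */
static int dirty_rate_high_cnt;   /* stub for rs->dirty_rate_high_cnt */
static int throttle_calls;        /* counts mig_throttle_guest_down() */

/* Old ordering: bail out before the counter is touched. */
static void detect_old(bool dirty_over_threshold)
{
    if (blk_bulk_active) {
        return;
    }
    if (dirty_over_threshold && ++dirty_rate_high_cnt >= 2) {
        dirty_rate_high_cnt = 0;
        throttle_calls++;
    }
}

/* New ordering from the patch: the counter advances even during bulk. */
static void detect_new(bool dirty_over_threshold)
{
    if (dirty_over_threshold && ++dirty_rate_high_cnt >= 2) {
        dirty_rate_high_cnt = 0;
        if (blk_bulk_active) {
            return;
        }
        throttle_calls++;
    }
}
```

So under the new ordering the counter keeps ticking during the bulk phase, which changes when throttling first kicks in after the bulk phase ends.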




>                                   int64_t cpu_index,
>                                   Error **errp)
>  {
> +    MigrationState *ms = migrate_get_current();
> +
>      if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
>          return;
>      }
> @@ -453,6 +455,15 @@ void qmp_cancel_vcpu_dirty_limit(bool has_cpu_index,
>          return;
>      }
>  
> +    if (migration_is_running(ms->state) &&
> +        (!qemu_thread_is_self(&ms->thread)) &&
> +        migrate_dirty_limit() &&
> +        dirtylimit_in_service()) {
> +        error_setg(errp, "can't cancel dirty page limit while"
> +                   " migration is running");

The error message is bad or wrong.
You can cancel the dirty page limit; you just need to be on the main thread.

Or am I missing something?



> +        return;
> +    }
> +
>      dirtylimit_state_lock();
>  
>      if (has_cpu_index) {
> @@ -488,6 +499,8 @@ void qmp_set_vcpu_dirty_limit(bool has_cpu_index,
>                                uint64_t dirty_rate,
>                                Error **errp)
>  {
> +    MigrationState *ms = migrate_get_current();
> +
>      if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
>          error_setg(errp, "dirty page limit feature requires KVM with"
>                     " accelerator property 'dirty-ring-size' set'");
> @@ -504,6 +517,15 @@ void qmp_set_vcpu_dirty_limit(bool has_cpu_index,
>          return;
>      }
>  
> +    if (migration_is_running(ms->state) &&
> +        (!qemu_thread_is_self(&ms->thread)) &&
> +        migrate_dirty_limit() &&
> +        dirtylimit_in_service()) {
> +        error_setg(errp, "can't cancel dirty page limit while"
> +                   " migration is running");
> +        return;
> +    }

If you use such a complex expression twice, I think that creating a
helper function is a good idea.
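A sketch of such a helper, with the helper name hypothetical and the four QEMU predicates replaced by stub flags so the snippet builds on its own:

```c
#include <assert.h>
#include <stdbool.h>

/* Stubs standing in for the real QEMU predicates. */
static bool migration_running;    /* migration_is_running(ms->state) */
static bool in_migration_thread;  /* qemu_thread_is_self(&ms->thread) */
static bool dirty_limit_cap;      /* migrate_dirty_limit() */
static bool limit_in_service;     /* dirtylimit_in_service() */

/*
 * True when a dirty-limit live migration owns the per-vCPU limit, so
 * set/cancel requests from outside the migration thread must be refused.
 */
static bool dirtylimit_busy(void)
{
    return migration_running &&
           !in_migration_thread &&
           dirty_limit_cap &&
           limit_in_service;
}
```

Both qmp_set_vcpu_dirty_limit() and qmp_cancel_vcpu_dirty_limit() could then open with `if (dirtylimit_busy()) { error_setg(...); return; }`, keeping the condition in one place.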

Later, Juan.



