qemu-devel
From: Yong Huang
Subject: Re: [PATCH QEMU v5 6/8] migration: Implement dirty-limit convergence algo
Date: Thu, 15 Jun 2023 18:12:57 +0800



On Wed, Jun 14, 2023 at 1:50 AM Juan Quintela <quintela@redhat.com> wrote:
~hyman <hyman@git.sr.ht> wrote:
> From: Hyman Huang(黄勇) <yong.huang@smartx.com>

To speed things up, patches 1-5 are included in the next Migration PULL request.
 
 OK, I'll post the next version containing only the last 3 commits.
 
> Implement dirty-limit convergence algo for live migration,
> which is kind of like auto-converge algo but using dirty-limit
> instead of cpu throttle to make migration convergent.
>
> Enable dirty page limit if dirty_rate_high_cnt greater than 2
> when dirty-limit capability enabled, Disable dirty-limit if
> migration be cancled.

Nit: canceled. 
 
 Got it.
>
> Note that "set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit"
> commands are not allowed during dirty-limit live migration.
>
> Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>


> + * Enable dirty-limit to throttle down the guest
> + */
> +static void migration_dirty_limit_guest(void)
> +{
> +    static int64_t quota_dirtyrate;

quota_dirtyrate deserves at least a comment.

I guess it means the current quota_dirty_rate that is set, but no clue. 
 OK. I'll add a comment in the next version.

> +    MigrationState *s = migrate_get_current();
> +
> +    /*
> +     * If dirty limit already enabled and migration parameter
> +     * vcpu-dirty-limit untouched.
> +     */
> +    if (dirtylimit_in_service() &&
> +        quota_dirtyrate == s->parameters.vcpu_dirty_limit) {
> +        return;
> +    }
> +
> +    quota_dirtyrate = s->parameters.vcpu_dirty_limit;
> +
> +    /* Set or update quota dirty limit */
> +    qmp_set_vcpu_dirty_limit(false, -1, quota_dirtyrate, NULL);

Care to explain why we have to "reset" the quota?  Or why we can't
set it when the user issues the command, instead of only when we throttle the guest?
 
 Indeed, -1 is misleading. The first parameter, when false, means
"set a quota dirty rate for all vCPUs", and the second parameter is
ignored when the first parameter is false.
The comment could look like this in the next version:
/* Set a quota dirty rate for all vCPUs; the second parameter is
 * ignored when setting the quota for all vCPUs of a VM.
 */
> +    trace_migration_dirty_limit_guest(quota_dirtyrate);
> +}
> +

Split this patch in two:

a - the logic change
b - the introduction of dirty limit.

OK, got it.

Old code:

    /* During block migration the auto-converge logic incorrectly detects
     * that ram migration makes no progress. Avoid this by disabling the
     * throttling logic during the bulk phase of block migration. */
    if (blk_mig_bulk_active()) {
        return;
    }

    if (migrate_auto_converge()) {
        /* The following detection logic can be refined later. For now:
           Check to see if the ratio between dirtied bytes and the approx.
           amount of bytes that just got transferred since the last time
           we were in this routine reaches the threshold. If that happens
           twice, start or increase throttling. */

        if ((bytes_dirty_period > bytes_dirty_threshold) &&
            (++rs->dirty_rate_high_cnt >= 2)) {
            trace_migration_throttle();
            rs->dirty_rate_high_cnt = 0;
            mig_throttle_guest_down(bytes_dirty_period,
                                    bytes_dirty_threshold);
        }
    }

New code:
    /*
     * The following detection logic can be refined later. For now:
     * Check to see if the ratio between dirtied bytes and the approx.
     * amount of bytes that just got transferred since the last time
     * we were in this routine reaches the threshold. If that happens
     * twice, start or increase throttling.
     */

    if ((bytes_dirty_period > bytes_dirty_threshold) &&
        (++rs->dirty_rate_high_cnt >= 2)) {
        rs->dirty_rate_high_cnt = 0;
        /*
         * During block migration the auto-converge logic incorrectly detects
         * that ram migration makes no progress. Avoid this by disabling the
         * throttling logic during the bulk phase of block migration
         */
        if (blk_mig_bulk_active()) {
            return;
        }

        if (migrate_auto_converge()) {
            trace_migration_throttle();
            mig_throttle_guest_down(bytes_dirty_period,
                                    bytes_dirty_threshold);
        } else if (migrate_dirty_limit()) {
            migration_dirty_limit_guest();
        }
    }

Questions:

- Why are we changing blk_mig_bulk_active() position? 

  I think that the old code has it in the right place.  Additionally,
  you just changed to this version a couple of patches ago.
Yes, indeed, this modification makes no sense; I'll fix it in the next version.




>                                   int64_t cpu_index,
>                                   Error **errp)
>  {
> +    MigrationState *ms = migrate_get_current();
> +
>      if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
>          return;
>      }
> @@ -453,6 +455,15 @@ void qmp_cancel_vcpu_dirty_limit(bool has_cpu_index,
>          return;
>      }

> +    if (migration_is_running(ms->state) &&
> +        (!qemu_thread_is_self(&ms->thread)) &&
> +        migrate_dirty_limit() &&
> +        dirtylimit_in_service()) {
> +        error_setg(errp, "can't cancel dirty page limit while"
> +                   " migration is running");

Error message is bad or wrong.
You can cancel the dirty page limit, you just need to be on the main thread.

Or I am missing something?

Migration, IMHO, shares the same quota dirty rate stored in the global variable
"dirtylimit_state"; if we cancel the dirty limit, the throttle will stop working
and the migration will be affected.


> +        return;
> +    }
> +
>      dirtylimit_state_lock();

>      if (has_cpu_index) {
> @@ -488,6 +499,8 @@ void qmp_set_vcpu_dirty_limit(bool has_cpu_index,
>                                uint64_t dirty_rate,
>                                Error **errp)
>  {
> +    MigrationState *ms = migrate_get_current();
> +
>      if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
>          error_setg(errp, "dirty page limit feature requires KVM with"
>                     " accelerator property 'dirty-ring-size' set'")
> @@ -504,6 +517,15 @@ void qmp_set_vcpu_dirty_limit(bool has_cpu_index,
>          return;
>      }

> +    if (migration_is_running(ms->state) &&
> +        (!qemu_thread_is_self(&ms->thread)) &&
> +        migrate_dirty_limit() &&
> +        dirtylimit_in_service()) {
> +        error_setg(errp, "can't cancel dirty page limit while"
> +                   " migration is running");
> +        return;
> +    }

If you use such a complex expression twice, I think that creating a
helper function is a good idea.
OK, got it.

Later, Juan.


Hyman

--
Best regards
