From: Andrey Drobyshev
Subject: Re: [PATCH 4/5] migration: Activate block devices if VM is paused when migrating
Date: Wed, 27 Nov 2024 19:40:39 +0200
User-agent: Mozilla Thunderbird
On 11/25/24 7:07 PM, Peter Xu wrote:
> On Mon, Nov 25, 2024 at 11:46:11AM -0300, Fabiano Rosas wrote:
>> Currently a VM that has been target of a migration using
>> late-block-activate will crash at the end of a new migration (with it
>> as source) when releasing ownership of the disks due to the VM having
>> never taken ownership of the disks in the first place.
>>
>> The issue is that late-block-activate expects a qmp_continue command
>> to be issued at some point on the destination VM after the migration
>> finishes. If the user decides to never continue the VM, but instead
>> issue a new migration, then bdrv_activate_all() will never be called
>> and the assert will be reached:
>>
>> bdrv_inactivate_recurse: Assertion `!(bs->open_flags &
>> BDRV_O_INACTIVE)' failed.
>>
>> Fix the issue by checking at the start of migration if the VM is
>> paused and call bdrv_activate_all() before migrating. Even if the
>> late-block-activate capability is not in play or if the VM has been
>> paused manually, there is no harm calling that function again.
>>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>> migration/migration.c | 19 +++++++++++++++++++
>> 1 file changed, 19 insertions(+)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index aedf7f0751..26af30137b 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2029,6 +2029,25 @@ static bool migrate_prepare(MigrationState *s, bool resume, Error **errp)
>>          return false;
>>      }
>>
>> +    /*
>> +     * The VM might have been target of a previous migration. If it
>> +     * was in the paused state then nothing will have required the
>> +     * block layer to be activated. Do it now to ensure this QEMU
>> +     * instance owns the disk locks.
>> +     */
>> +    if (!resume && runstate_check(RUN_STATE_PAUSED)) {
>
> I hope this will cover all the cases that QEMU could overlook.. or I'm not
> sure whether we could invoke bdrv_activate_all() unconditionally, as it
> looks safe to use even if all disks are already active.
>
> I wish we didn't need to bother with the disk activation status at all,
> because IIUC migration would work fine even if all disks are inactive
> when preparing migration.. hence such a change always looks like a
> workaround for a separate issue.
>
>> +        Error *local_err = NULL;
>> +
>> +        g_assert(bql_locked());
>> +
>> +        bdrv_activate_all(&local_err);
>> +        if (local_err) {
>> +            error_propagate(errp, local_err);
>> +            return false;
>> +        }
>> +        s->block_inactive = false;
>
> This var was so far only used within one migration iteration, and it was
> only ever set in migration_completion_precopy(). Now we're resetting it
> up front, before a migration starts. I'm not 100% sure whether that's
> needed, or whether it should be put somewhere else.
>
This variable is unconditionally set in migration_completion_precopy()
and used as a flag for whether or not we should be deactivating disks in
qemu_savevm_state_complete_precopy():
> /*
>  * Inactivate disks except in COLO, and track that we have done so in order
>  * to remember to reactivate them if migration fails or is cancelled.
>  */
> s->block_inactive = !migrate_colo();
>
> migration_rate_set(RATE_LIMIT_DISABLED);
>
> ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
>                                          s->block_inactive);
Ideally we'd like to take our paused state into account right here and
skip the inactivation. However, at this point during the 2nd migration
(in the paused state) our current_run_state == RUN_STATE_FINISH_MIGRATE.
So if we truly wanted to skip the unnecessary bdrv_activate_all(), we'd
have to remember our paused state somewhere earlier on the call stack
and pass an additional flag for it, as sketched below. Personally I
don't think that's any cleaner than just blatantly calling
bdrv_activate_all().
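(For illustration, a rough sketch of that rejected alternative; the
vm_was_paused field and its plumbing are hypothetical, not existing
QEMU code:)

    /* In migrate_prepare(): remember the paused state instead of
     * activating the disks (hypothetical MigrationState field). */
    s->vm_was_paused = runstate_check(RUN_STATE_PAUSED);

    /* Later, in migration_completion_precopy(), the run state is
     * already RUN_STATE_FINISH_MIGRATE, so consult the saved flag
     * and skip inactivating disks that were never activated. */
    s->block_inactive = !migrate_colo() && !s->vm_was_paused;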
> In general, I saw the mention of other places that may also try to
> inactivate disks that were already inactivated. If that's the case, I
> wish we didn't need to touch migration code at all; it'd be perfect if
> the block layer could instead cope with "inactivate on top of
> inactivated disks".
>
This sounds similar to my initial very naive fix, which we decided not
to take:
https://lists.nongnu.org/archive/html/qemu-devel/2024-09/msg04254.html
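(Schematically, a block layer that tolerates double inactivation could
guard the recursion along these lines; this is only a sketch of the
idea, not the actual patch from that link:)

    /* Hypothetical guard at the top of bdrv_inactivate_recurse():
     * treat an already-inactive node as a no-op instead of tripping
     * the BDRV_O_INACTIVE assertion. */
    if (bs->open_flags & BDRV_O_INACTIVE) {
        return 0;
    }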
>> +    }
>> +
>>      return true;
>>  }
>>
>> --
>> 2.35.3
>>
>