[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[PATCH 0/5] migration: Fix the BDRV_O_INACTIVE assertion
From: |
Fabiano Rosas |
Subject: |
[PATCH 0/5] migration: Fix the BDRV_O_INACTIVE assertion |
Date: |
Mon, 25 Nov 2024 11:46:07 -0300 |
At this point everyone is probably familiar with the assertion:
bdrv_inactivate_recurse: Assertion `!(bs->open_flags &
BDRV_O_INACTIVE)' failed.
The issue is that at the end of migration, the block layer is
deactivated to release the locks on the disks so that in a shared
storage scenario the migration destination can take over. Trying to
deactivate twice leads to the above assertion.
The reason deactivation is happening twice is due to the migration
capability "late-block-activate", which delays activation until the
destination VM receives a qmp_cont command. The reasoning for that cap
is:
Activating the block devices causes the locks to be taken on
the backing file. If we're running with -S and the destination libvirt
hasn't started the destination with 'cont', it's expecting the locks are
still untaken.
Don't activate the block devices if we're not going to autostart the VM;
'cont' already will do that anyway. This change is tied to the new
migration capability 'late-block-activate' that defaults to off, keeping
the old behaviour by default.
bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854
However, the expected qmp_cont command never happens, because the user
may never decide to continue the VM and instead attempt another
migration while the VM is paused.
Fix this by assuming that the qmp_migrate command having been issued
is enough indication that it is now ok to take the locks again
(similarly to qmp_cont) and call bdrv_activate_all() at that
moment. This is the least invasive fix that won't require us to add
new qmp commands and go change libvirt, etc.
I'm also copying the block list (for obvious reasons, but also)
because Vladimir pointed out that asserts of this kind might not only
happen during migration. I know of at least
bdrv_co_write_req_prepare() which will assert if reached when a guest
is paused after a migration that used late-block-activate.
Patches 1-3 are just shuffling code;
Patch 4 is the fix;
Patch 5 is the test. To reproduce the issue revert patch 4 and run:
QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test -p \
/x86_64/migration/precopy/ping-pong/late-block-activate
References:
https://gitlab.com/qemu-project/qemu/-/issues/686
https://gitlab.com/qemu-project/qemu/-/issues/2395
https://lore.kernel.org/r/20240924125611.664315-1-andrey.drobyshev@virtuozzo.com
CI run: https://gitlab.com/farosas/qemu/-/pipelines/1556902330
Fabiano Rosas (5):
tests/qtest/migration: Move more code under only_target
tests/qtest/migration: Don't use hardcoded strings for -serial
tests/qtest/migration: Support cleaning up only one side of migration
migration: Activate block devices if VM is paused when migrating
tests/qtest/migration: Test successive migrations
migration/migration.c | 19 +++
tests/qtest/migration-helpers.c | 8 +
tests/qtest/migration-helpers.h | 2 +
tests/qtest/migration-test.c | 252 +++++++++++++++++++++++++-------
4 files changed, 226 insertions(+), 55 deletions(-)
base-commit: 134b443512825bed401b6e141447b8cdc22d2efe
--
2.35.3
- [PATCH 0/5] migration: Fix the BDRV_O_INACTIVE assertion,
Fabiano Rosas <=
- [PATCH 1/5] tests/qtest/migration: Move more code under only_target, Fabiano Rosas, 2024/11/25
- [PATCH 2/5] tests/qtest/migration: Don't use hardcoded strings for -serial, Fabiano Rosas, 2024/11/25
- [PATCH 4/5] migration: Activate block devices if VM is paused when migrating, Fabiano Rosas, 2024/11/25
- [PATCH 3/5] tests/qtest/migration: Support cleaning up only one side of migration, Fabiano Rosas, 2024/11/25
- [PATCH 5/5] tests/qtest/migration: Test successive migrations, Fabiano Rosas, 2024/11/25