Re: [PATCH 9/9] migration/postcopy: Allow network to fail even during re
From: Peter Xu
Subject: Re: [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery
Date: Tue, 12 Sep 2023 18:16:45 -0400

On Tue, Sep 12, 2023 at 04:05:27PM -0400, Peter Xu wrote:
> Thanks for contributing the test case!
>
> Do you want me to pick this patch up (with modifications) and repost
> together with this series? It'll also work if you want to send a separate
> test patch. Let me know!

It turns out I found more bugs while reworking that test case based on
yours. E.g., currently we'll crash the dest QEMU if we really fail during
recovery, because we miss:

diff --git a/migration/savevm.c b/migration/savevm.c
index bb3e99194c..422406e0ee 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2723,7 +2723,8 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
         qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
     }
 
-    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+    /* Current state can be either ACTIVE or RECOVER */
+    migrate_set_state(&mis->state, mis->state,
                       MIGRATION_STATUS_POSTCOPY_PAUSED);
 
     /* Notify the fault thread for the invalidated file handle */

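So in the double-failure case we won't move from RECOVER to PAUSED, since
migrate_set_state() only switches state when the current state matches the
expected old one (ACTIVE here). We'll then crash right afterwards, because
we skip the semaphore wait:
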
    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {  <--- not true, continue
        qemu_sem_wait(&mis->postcopy_pause_sem_dst);
    }

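To illustrate the failure mode, here is a minimal standalone sketch, not the QEMU code itself. It assumes the state update behaves like a compare-and-swap, and every sim_* name is a hypothetical stand-in invented for this example. With the expected old state hard-coded to ACTIVE, a destination already in RECOVER never reaches PAUSED, so the wait loop condition is false. Passing the current state as the old state makes the transition succeed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the MIGRATION_STATUS_POSTCOPY_* values. */
enum sim_state { SIM_ACTIVE, SIM_RECOVER, SIM_PAUSED };

/*
 * Assumed model of the state setter: switch to new_state only if the
 * current value equals old_state. Returns true if the switch happened.
 */
static bool sim_set_state(_Atomic int *state, int old_state, int new_state)
{
    return atomic_compare_exchange_strong(state, &old_state, new_state);
}

/* Before the fix: the expected old state is hard-coded to ACTIVE. */
static int sim_pause_old(_Atomic int *state)
{
    sim_set_state(state, SIM_ACTIVE, SIM_PAUSED);
    return atomic_load(state);
}

/* After the fix: the current state is passed as the expected old state. */
static int sim_pause_new(_Atomic int *state)
{
    int cur = atomic_load(state);

    /* Current state can be either ACTIVE or RECOVER */
    assert(cur == SIM_ACTIVE || cur == SIM_RECOVER);
    sim_set_state(state, cur, SIM_PAUSED);
    return atomic_load(state);
}

/* True if the destination would block in the semaphore wait loop. */
static bool sim_would_wait(int state)
{
    return state == SIM_PAUSED;
}
```

Starting from SIM_RECOVER, sim_pause_old() leaves the state untouched and sim_would_wait() reports false, which corresponds to the skipped loop above. sim_pause_new() moves the same state to SIM_PAUSED.
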
With the new test case I am now 100% sure I can kick both sides into the
RECOVER state (one trick is still needed along the way; the test patch will
show it soon), then kick them back, then proceed with a successful migration.
Let me just repost everything with the new test case.
Thanks,
--
Peter Xu