qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 9/9] migration/postcopy: Allow network to fail even during re


From: Peter Xu
Subject: Re: [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery
Date: Tue, 12 Sep 2023 18:16:45 -0400

On Tue, Sep 12, 2023 at 04:05:27PM -0400, Peter Xu wrote:
> Thanks for contributing the test case!
> 
> Do you want me to pick this patch up (with modifications) and repost
> together with this series?  It'll also work if you want to send a separate
> test patch.  Let me know!

It turns out I found more bug when I was reworking that test case based on
yours.  E.g., currently we'll crash dest qemu if we really fail during
recovery, because we miss:

diff --git a/migration/savevm.c b/migration/savevm.c
index bb3e99194c..422406e0ee 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2723,7 +2723,8 @@ static bool 
postcopy_pause_incoming(MigrationIncomingState *mis)
         qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
     }
 
-    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+    /* Current state can be either ACTIVE or RECOVER */
+    migrate_set_state(&mis->state, mis->state,
                       MIGRATION_STATUS_POSTCOPY_PAUSED);
 
     /* Notify the fault thread for the invalidated file handle */

So in double failure case we'll not set RECOVER to PAUSED, and we'll crash
right afterwards, as we'll skip the semaphore:

    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {  <--- not true, 
continue
        qemu_sem_wait(&mis->postcopy_pause_sem_dst);
    }

Now within the new test case I am 100% sure I can kick both sides into
RECOVER state (one trick still needed along the way; the test patch will
tell soon), then kick them back, then proceed with a successful migration.

Let me just repost everything with the new test case.

Thanks,

-- 
Peter Xu




reply via email to

[Prev in Thread] Current Thread [Next in Thread]