
Re: [PATCH V1 2/3] migration: fix suspended runstate


From: Peter Xu
Subject: Re: [PATCH V1 2/3] migration: fix suspended runstate
Date: Wed, 21 Jun 2023 16:28:35 -0400

On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote:
> On 6/20/2023 5:46 PM, Peter Xu wrote:
> > On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
> >> Migration of a guest in the suspended state is broken.  The incoming
> >> migration code automatically tries to wake the guest, which IMO is
> >> wrong -- the guest should end migration in the same state it started.
> >> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
> >> bypasses vm_start().  The guest appears to be in the running state, but
> >> it is not.
> >>
> >> To fix, leave the guest in the suspended state, but call
> >> qemu_system_start_on_wakeup_request() so the guest is properly resumed
> >> later, when the client sends a system_wakeup command.
> >>
> >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >> ---
> >>  migration/migration.c | 11 ++++-------
> >>  softmmu/runstate.c    |  1 +
> >>  2 files changed, 5 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/migration/migration.c b/migration/migration.c
> >> index 17b4b47..851fe6d 100644
> >> --- a/migration/migration.c
> >> +++ b/migration/migration.c
> >> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
> >>          vm_start();
> >>      } else {
> >>          runstate_set(global_state_get_runstate());
> >> +        if (runstate_check(RUN_STATE_SUSPENDED)) {
> >> +            /* Force vm_start to be called later. */
> >> +            qemu_system_start_on_wakeup_request();
> >> +        }
> > 
> > Is this really needed, along with patch 1?
> > 
> > I have a very limited knowledge on suspension, so I'm prone to making
> > mistakes..
> > 
> > But from what I read, qemu_system_wakeup_request() (the existing one, not
> > the one after patch 1 is applied) will set up wakeup_reason and kick the main
> > thread using qemu_notify_event().  Then IIUC the e.g. vcpu wakeups will be
> > done in the main thread later on, after qemu_wakeup_requested() returns true.
> 
> Correct, here:
> 
>     if (qemu_wakeup_requested()) {
>         pause_all_vcpus();
>         qemu_system_wakeup();
>         notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
>         wakeup_reason = QEMU_WAKEUP_REASON_NONE;
>         resume_all_vcpus();
>         qapi_event_send_wakeup();
>     }
> 
> However, that is not sufficient, because vm_start() was never called on the
> incoming side.  vm_start calls the vm state notifiers for RUN_STATE_RUNNING,
> among other things.
> 
> 
> Without my fixes, it "works" because the outgoing migration automatically
> wakes a suspended guest, which sets the state to running, which is saved in
> global state:
> 
>     void migration_completion(MigrationState *s)
>         qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>         global_state_store()
> 
> Then the incoming migration calls vm_start here:
> 
>     migration/migration.c
>         if (!global_state_received() ||
>             global_state_get_runstate() == RUN_STATE_RUNNING) {
>             if (autostart) {
>                 vm_start();
> 
> vm_start must be called for correctness.

I see.  Though I have a feeling that this is still not the right way to do
it, or at least not as clean.

One question is, would the above work for postcopy when the VM is suspended
during the switchover?

I think I see your point that vm_start() (mostly vm_prepare_start())
contains a bunch of operations that we may need before starting the
VM, but then.. should we just make that vm_start() unconditional when
loading the VM completes?  I just don't see anything that won't need it
(besides -S), even COLO.

So I'm wondering about something like this:

===8<===
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque)
 
     dirty_bitmap_mig_before_vm_start();
 
-    if (!global_state_received() ||
-        global_state_get_runstate() == RUN_STATE_RUNNING) {
-        if (autostart) {
-            vm_start();
-        } else {
-            runstate_set(RUN_STATE_PAUSED);
-        }
-    } else if (migration_incoming_colo_enabled()) {
+    if (migration_incoming_colo_enabled()) {
         migration_incoming_disable_colo();
+        /* COLO should always have autostart=1 or we can enforce it here */
+    }
+
+    if (autostart) {
+        RunState run_state = global_state_get_runstate();
         vm_start();
+        switch (run_state) {
+        case RUN_STATE_RUNNING:
+            break;
+        case RUN_STATE_SUSPENDED:
+            qemu_system_suspend();
+            break;
+        default:
+            runstate_set(run_state);
+            break;
+        }
     } else {
-        runstate_set(global_state_get_runstate());
+        runstate_set(RUN_STATE_PAUSED);
     }
===8<===

IIUC this can drop qemu_system_start_on_wakeup_request() along with the
other global var.  Would something like this work for us?

-- 
Peter Xu



