[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH V1 2/3] migration: fix suspended runstate
From: |
Steven Sistare |
Subject: |
Re: [PATCH V1 2/3] migration: fix suspended runstate |
Date: |
Fri, 23 Jun 2023 15:56:42 -0400 |
User-agent: |
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0 |
On 6/23/2023 2:25 PM, Steven Sistare wrote:
> On 6/21/2023 4:28 PM, Peter Xu wrote:
>> On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote:
>>> On 6/20/2023 5:46 PM, Peter Xu wrote:
>>>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
>>>>> Migration of a guest in the suspended state is broken. The incoming
>>>>> migration code automatically tries to wake the guest, which IMO is
>>>>> wrong -- the guest should end migration in the same state it started.
>>>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
>>>>> bypasses vm_start(). The guest appears to be in the running state, but
>>>>> it is not.
>>>>>
>>>>> To fix, leave the guest in the suspended state, but call
>>>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed
>>>>> later, when the client sends a system_wakeup command.
>>>>>
>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>> ---
>>>>> migration/migration.c | 11 ++++-------
>>>>> softmmu/runstate.c | 1 +
>>>>> 2 files changed, 5 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>> index 17b4b47..851fe6d 100644
>>>>> --- a/migration/migration.c
>>>>> +++ b/migration/migration.c
>>>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void
>>>>> *opaque)
>>>>> vm_start();
>>>>> } else {
>>>>> runstate_set(global_state_get_runstate());
>>>>> + if (runstate_check(RUN_STATE_SUSPENDED)) {
>>>>> + /* Force vm_start to be called later. */
>>>>> + qemu_system_start_on_wakeup_request();
>>>>> + }
>>>>
>>>> Is this really needed, along with patch 1?
>>>>
>>>> I have a very limited knowledge on suspension, so I'm prone to making
>>>> mistakes..
>>>>
>>>> But from what I read this, qemu_system_wakeup_request() (existing one, not
>>>> after patch 1 applied) will setup wakeup_reason and kick the main thread
>>>> using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in
>>>> the main thread later on after qemu_wakeup_requested() returns true.
>>>
>>> Correct, here:
>>>
>>> if (qemu_wakeup_requested()) {
>>> pause_all_vcpus();
>>> qemu_system_wakeup();
>>> notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
>>> wakeup_reason = QEMU_WAKEUP_REASON_NONE;
>>> resume_all_vcpus();
>>> qapi_event_send_wakeup();
>>> }
>>>
>>> However, that is not sufficient, because vm_start() was never called on the
>>> incoming
>>> side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among
>>> other things.
>>>
>>>
>>> Without my fixes, it "works" because the outgoing migration automatically
>>> wakes a suspended
>>> guest, which sets the state to running, which is saved in global state:
>>>
>>> void migration_completion(MigrationState *s)
>>> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>>> global_state_store()
>>>
>>> Then the incoming migration calls vm_start here:
>>>
>>> migration/migration.c
>>> if (!global_state_received() ||
>>> global_state_get_runstate() == RUN_STATE_RUNNING) {
>>> if (autostart) {
>>> vm_start();
>>>
>>> vm_start must be called for correctness.
>>
>> I see. Though I had a feeling that this is still not the right way to do,
>> at least not as clean.
>>
>> One question is, would above work for postcopy when VM is suspended during
>> the switchover?
>
> Good catch, that is broken.
> I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh
> and now it works.
>
> if (global_state_get_runstate() == RUN_STATE_RUNNING) {
> if (autostart) {
> vm_start();
> } else {
> runstate_set(RUN_STATE_PAUSED);
> }
> } else {
> runstate_set(global_state_get_runstate());
> if (runstate_check(RUN_STATE_SUSPENDED)) {
> qemu_system_start_on_wakeup_request();
> }
> }
>
>> I think I see your point that vm_start() (mostly vm_prepare_start())
>> contains a bunch of operations that maybe we must have before starting the
>> VM, but then.. should we just make that vm_start() unconditional when
>> loading VM completes? I just don't see anything won't need it (besides
>> -S), even COLO.
>>
>> So I'm wondering about something like this:
>>
>> ===8<===
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque)
>>
>> dirty_bitmap_mig_before_vm_start();
>>
>> - if (!global_state_received() ||
>> - global_state_get_runstate() == RUN_STATE_RUNNING) {
>> - if (autostart) {
>> - vm_start();
>> - } else {
>> - runstate_set(RUN_STATE_PAUSED);
>> - }
>> - } else if (migration_incoming_colo_enabled()) {
>> + if (migration_incoming_colo_enabled()) {
>> migration_incoming_disable_colo();
>> + /* COLO should always have autostart=1 or we can enforce it here */
>> + }
>> +
>> + if (autostart) {
>> + RunState run_state = global_state_get_runstate();
>> vm_start();
>
> This will resume the guest for a brief time, against the user's wishes. Not
> OK IMO.
>
>> + switch (run_state) {
>> + case RUN_STATE_RUNNING:
>> + break;
>> + case RUN_STATE_SUSPENDED:
>> + qemu_system_suspend();
>
> qemu_system_suspend will not cause the guest to suspend. It is qemu's
> response after
> the guest initiates a suspend.
>
>> + break;
>> + default:
>> + runstate_set(run_state);
>> + break;
>> + }
>> } else {
>> - runstate_set(global_state_get_runstate());
>> + runstate_set(RUN_STATE_PAUSED);
>> }
>> ===8<===
>>
>> IIUC this can drop qemu_system_start_on_wakeup_request() along with the
>> other global var. Would something like it work for us?
>
> Afraid not. start_on_wake is the only correct solution I can think of.
>
> - Steve
Actually, the start_on_wake concept is indeed required, but I can hide the
details in softmmu/cpus.s:
static bool vm_never_started = true;
void vm_start(void)
{
if (!vm_prepare_start(false)) {
vm_never_started = false;
resume_all_vcpus();
}
}
void vm_wakeup(void)
{
if (vm_never_started) {
vm_start();
} else {
runstate_set(RUN_STATE_RUNNING);
}
}
... and qemu_system_wakeup_request() calls vm_wakeup().
Now the migration code simply sets the run state (eg SUSPENDED), no more
calling qemu_system_start_on_wakeup_request.
- Steve
- [PATCH V1 0/3] fix migration of suspended runstate, Steve Sistare, 2023/06/15
- [PATCH V1 1/3] vl: start on wakeup request, Steve Sistare, 2023/06/15
- [PATCH V1 2/3] migration: fix suspended runstate, Steve Sistare, 2023/06/15
- Re: [PATCH V1 2/3] migration: fix suspended runstate, Peter Xu, 2023/06/20
- Re: [PATCH V1 2/3] migration: fix suspended runstate, Steven Sistare, 2023/06/21
- Re: [PATCH V1 2/3] migration: fix suspended runstate, Peter Xu, 2023/06/21
- Re: [PATCH V1 2/3] migration: fix suspended runstate, Steven Sistare, 2023/06/23
- Re: [PATCH V1 2/3] migration: fix suspended runstate,
Steven Sistare <=
- Re: [PATCH V1 2/3] migration: fix suspended runstate, Peter Xu, 2023/06/26
- Re: [PATCH V1 2/3] migration: fix suspended runstate, Peter Xu, 2023/06/26
- Re: [PATCH V1 2/3] migration: fix suspended runstate, Steven Sistare, 2023/06/30
[PATCH V1 3/3] tests/qtest: live migration suspended state, Steve Sistare, 2023/06/15