qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/3] migration: Stop marking RP bad after shutdown


From: Fabiano Rosas
Subject: Re: [PATCH 1/3] migration: Stop marking RP bad after shutdown
Date: Mon, 31 Jul 2023 18:02:56 -0300

Peter Xu <peterx@redhat.com> writes:

> On Fri, Jul 28, 2023 at 09:15:14AM -0300, Fabiano Rosas wrote:
>> When waiting for the return path (RP) thread to finish, there is
>> really nothing wrong in the RP if the destination end of the migration
>> stops responding, leaving it stuck.
>> 
>> Stop returning an error at that point and leave it to other parts of
>> the code to catch. One such part is the very next routine run by
>> migration_completion() which checks 'to_dst_file' for an error and fails
>> the migration. Another is the RP thread itself when the recvmsg()
>> returns an error.
>> 
>> With this we stop marking RP bad from outside of the thread and can
>> reuse await_return_path_close_on_source() in the next patches to wait
>> on the thread during a paused migration.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  migration/migration.c | 1 -
>>  1 file changed, 1 deletion(-)
>> 
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 91bba630a8..051067f8c5 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2049,7 +2049,6 @@ static int 
>> await_return_path_close_on_source(MigrationState *ms)
>>           * waiting for the destination.
>>           */
>>          qemu_file_shutdown(ms->rp_state.from_dst_file);
>> -        mark_source_rp_bad(ms);
>>      }
>>      trace_await_return_path_close_on_source_joining();
>>      qemu_thread_join(&ms->rp_state.rp_thread);
>
> The retval of await_return_path_close_on_source() relies on
> ms->rp_state.error.  If mark_source_rp_bad() is dropped, is it possible
> that it'll start to return succeed where it used to return failure?

Yep, as described in the commit message, I think it's ok to do that. The
critical part is doing the shutdown. Other instances of
mark_source_rp_bad() continue existing and we continue returning
rp_state.error.

>
> Maybe not a big deal: I see migration_completion() also has another
> qemu_file_get_error() later to catch errors, but I don't know how solid
> that is.

That is the instance I refer to in the commit message. At
await_return_path_close_on_source() we only call mark_source_rp_bad() if
to_dst_file has an error. That will be caught by this
qemu_file_get_error() anyway.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]