qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] migrate/multifd: fix coredump when the multifd thread cleanu


From: chenyuhui (A)
Subject: Re: [PATCH] migrate/multifd: fix coredump when the multifd thread cleanup
Date: Mon, 26 Jun 2023 21:16:14 +0800
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0

On 2023/6/21 22:22, Fabiano Rosas wrote:
> Jianguo Zhang via <qemu-devel@nongnu.org> writes:
> 
>> From: Yuhui Chen <chenyuhui5@huawei.com>
>>
>> There is a coredump while trying to destroy mutex when
>> p->running is false but p->mutex is not unlock.
>> Make sure all mutexes has been released before destroy them.
>>
>> Signed-off-by: Yuhui Chen <chenyuhui5@huawei.com>
>> ---
>>  migration/multifd.c | 6 ++----
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index b7ad7002e0..7dcdb2d3a0 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -523,9 +523,7 @@ void multifd_save_cleanup(void)
>>      for (i = 0; i < migrate_multifd_channels(); i++) {
>>          MultiFDSendParams *p = &multifd_send_state->params[i];
>>  
>> -        if (p->running) {
> 
> The need for this flag is dubious IMO. Commit 10351fbad1
> ("migration/multifd: Join all multifd threads in order to avoid leaks")
> already moved the other join outside of it. If we figure out another way
> to deal with the sem_sync lockup we could probably remove this
> altogether.


I've seen this commit 10351fbad1, and it's seems to have the same
problem in function multifd_save_cleanup.

So that may my patch only need to modify multifd_save_cleanup.

______


On 2023/6/21 21:24, Peter Xu wrote:
> On Wed, Jun 21, 2023 at 04:18:26PM +0800, Jianguo Zhang via wrote:
>> From: Yuhui Chen<chenyuhui5@huawei.com>
>>
>> There is a coredump while trying to destroy mutex when
>> p->running is false but p->mutex is not unlock.
>> Make sure all mutexes has been released before destroy them.
>
> It'll be nice to add a backtrace of the coredump here, and also copy
> maintainer (Juan Quintela, copied now).
>

The attachment is coredump, and my code is base on
https://github.com/qemu/qemu.git tag v6.2.0.

How reproducible:
1、And sleep time to produce p->running is false but p->mutex is
 not unlock.

From: Yuhui Chen <chenyuhui5@huawei.com>
Date: Mon, 26 Jun 2023 14:24:35 +0800
Subject: [DEBUG][PATCH] And sleep time to produce p->running is false but 
p->mutex is
 not unlock.

---
 migration/multifd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/multifd.c b/migration/multifd.c
index 7c9deb1921..09a7b0748a 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -538,6 +538,7 @@ void multifd_save_cleanup(void)
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];

+        sleep(2);
         if (p->running) {
             qemu_thread_join(&p->thread);
         }
@@ -719,6 +720,7 @@ out:

     qemu_mutex_lock(&p->mutex);
     p->running = false;
+    sleep(20);
     qemu_mutex_unlock(&p->mutex);

     rcu_unregister_thread();
-- 
2.21.0.windows.1

2、Start a vm then do migration with --parallel-connections

>>
>> Signed-off-by: Yuhui Chen <chenyuhui5@huawei.com>
>> ---
>>  migration/multifd.c | 6 ++----
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index b7ad7002e0..7dcdb2d3a0 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -523,9 +523,7 @@ void multifd_save_cleanup(void)
>>      for (i = 0; i < migrate_multifd_channels(); i++) {
>>          MultiFDSendParams *p = &multifd_send_state->params[i];
>>
>> -        if (p->running) {
>> -            qemu_thread_join(&p->thread);
>> -        }
>> +        qemu_thread_join(&p->thread);
>
> I'm not sure whether this will always work, e.g. when migration fails early
> before creating multifd threads?
>

Theoretically, all threads are created here (otherwise it would already have
thrown an exception in the preceding function multifd_recv_terminate_threads),
so they all need to join regardless the p->running state.

>>      }
>>      for (i = 0; i < migrate_multifd_channels(); i++) {
>>          MultiFDSendParams *p = &multifd_send_state->params[i];
>> @@ -1040,8 +1038,8 @@ int multifd_load_cleanup(Error **errp)
>>               * however try to wakeup it without harm in cleanup phase.
>>               */
>>              qemu_sem_post(&p->sem_sync);
>> -            qemu_thread_join(&p->thread);
>>          }
>> +        qemu_thread_join(&p->thread);
>>      }
>>      for (i = 0; i < migrate_multifd_channels(); i++) {
>>          MultiFDRecvParams *p = &multifd_recv_state->params[i];
>> --
>> 2.21.0.windows.1
>>
>>
>



Attachment: core.zip
Description: Zip compressed data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]