qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 3/3] migration: fix migrate_cancel multifd migra


From: Ivan Ren
Subject: Re: [Qemu-devel] [PATCH 3/3] migration: fix migrate_cancel multifd migration leads destination hung forever
Date: Wed, 24 Jul 2019 15:01:07 +0800

ping for review

problem still exist in qemu-4.1.0-rc2

Threads:  24 total,   0 running,  24 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,
 0.0 st
KiB Mem : 39434172+total, 36798950+free,  2948836 used, 23403388 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 38926476+avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
286108 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:01.19
qemu-system-x86
286109 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00
qemu-system-x86
286113 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 IO
mon_iothread
286114 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU
0/KVM
286115 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU
1/KVM
286116 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU
2/KVM
286117 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU
3/KVM
286118 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU
4/KVM
286119 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU
5/KVM
286120 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU
6/KVM
286121 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU
7/KVM
286122 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU
8/KVM
286123 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU
9/KVM
286124 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU
10/KVM
286125 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU
11/KVM
286126 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU
12/KVM
286127 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU
13/KVM
286128 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU
14/KVM
286129 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU
15/KVM
286131 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00
vnc_worker
286132 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.01
multifdrecv_0
286133 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.01
multifdrecv_1
286134 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.01
multifdrecv_2
286136 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.01
multifdrecv_3

Thread 2 (Thread 0x7f9d075fe700 (LWP 286136)):
#0  0x00007fbd67123a0b in do_futex_wait.constprop.1 () from
/lib64/libpthread.so.0
#1  0x00007fbd67123a9f in __new_sem_wait_slow.constprop.0 () from
/lib64/libpthread.so.0
#2  0x00007fbd67123b3b in sem_wait@@GLIBC_2.2.5 () from
/lib64/libpthread.so.0
#3  0x00005582236e2514 in qemu_sem_wait (sem=sem@entry=0x558226364dc8) at
util/qemu-thread-posix.c:319
#4  0x00005582232efb67 in multifd_recv_thread
(opaque=opaque@entry=0x558226364d30)
at /qemu-4.1.0-rc2/migration/ram.c:1356
#5  0x00005582236e1b26 in qemu_thread_start (args=<optimized out>) at
util/qemu-thread-posix.c:502
#6  0x00007fbd6711de25 in start_thread () from /lib64/libpthread.so.0
#7  0x00007fbd66e4a35d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fbd6c7a4cc0 (LWP 286108)):
#0  0x00007fbd6711ef57 in pthread_join () from /lib64/libpthread.so.0
#1  0x00005582236e28ff in qemu_thread_join (thread=thread@entry=0x558226364b00)
at util/qemu-thread-posix.c:570
#2  0x00005582232f36b4 in multifd_load_cleanup (errp=errp@entry=0x7fbd341fff58)
at /qemu-4.1.0-rc2/migration/ram.c:1259
#3  0x00005582235765a9 in process_incoming_migration_co (opaque=<optimized
out>) at migration/migration.c:510
#4  0x00005582236f4c0a in coroutine_trampoline (i0=<optimized out>,
i1=<optimized out>) at util/coroutine-ucontext.c:115
#5  0x00007fbd66d98d40 in ?? () from /lib64/libc.so.6
#6  0x00007ffec0bf24d0 in ?? ()
#7  0x0000000000000000 in ?? ()

On Tue, Jun 25, 2019 at 9:18 PM Ivan Ren <address@hidden> wrote:

> When migrate_cancel a multifd migration, if run sequence like this:
>
>         [source]                              [destination]
>
> multifd_send_sync_main[finish]
>                                     multifd_recv_thread wait &p->sem_sync
> shutdown to_dst_file
>                                     detect error from_src_file
> send  RAM_SAVE_FLAG_EOS[fail]       [no chance to run
> multifd_recv_sync_main]
>                                     multifd_load_cleanup
>                                     join multifd receive thread forever
>
> will lead destination qemu hung at following stack:
>
> pthread_join
> qemu_thread_join
> multifd_load_cleanup
> process_incoming_migration_co
> coroutine_trampoline
>
> Signed-off-by: Ivan Ren <address@hidden>
> ---
>  migration/ram.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index e4eb9c441f..504c8ccb03 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1291,6 +1291,11 @@ int multifd_load_cleanup(Error **errp)
>          MultiFDRecvParams *p = &multifd_recv_state->params[i];
>
>          if (p->running) {
> +            /*
> +             * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle
> code,
> +             * however try to wakeup it without harm in cleanup phase.
> +             */
> +            qemu_sem_post(&p->sem_sync);
>              qemu_thread_join(&p->thread);
>          }
>          object_unref(OBJECT(p->c));
> --
> 2.17.2 (Apple Git-113)
>
>


reply via email to

[Prev in Thread] Current Thread [Next in Thread]