qemu-devel

From: Peter Maydell
Subject: Re: hang in migration-test (s390 host)
Date: Thu, 28 Apr 2022 16:08:20 +0100

On Fri, 25 Mar 2022 at 08:04, Juan Quintela <quintela@redhat.com> wrote:
>
> Laurent Vivier <lvivier@redhat.com> wrote:
> > Perhaps Juan or Thomas can help too (added to cc)
> >
> > Is this a regression?
> > It looks like a bug in QEMU as it doesn't move from cancelling to cancelled.

I had a repeat of this hang (same machine), so here's the debug
info I wasn't able to gather the first time round.

> >> [Inferior 1 (process 2771497) detached]
> >> ===========================================================
> >> PROCESS: 2772862
> >> gitlab-+ 2772862 2771497 99 Mar23 ?        18:45:28 ./qemu-system-i386
> >> -qtest unix:/tmp/qtest-2771497.sock -qtest-log /dev/null -chardev
> >> socket,path=/tmp/qtest-2771497.qmp,id=char0 -mon
> >> chardev=char0,mode=control -display none -accel kvm -accel tcg -name
> >> source,debug-threads=on -m 150M -serial
> >> file:/tmp/migration-test-f6G71L/src_serial -drive
> >> file=/tmp/migration-test-f6G71L/bootsect,format=raw -accel qtest
>
> Source of migration thread.
>
> >> [New LWP 2772864]
> >> [New LWP 2772866]
> >> [New LWP 2772867]
> >> [New LWP 2772915]
> >> [Thread debugging using libthread_db enabled]
> >> Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
> >> 0x000003ff94ef1c9c in __ppoll (fds=0x2aa179a6f30, nfds=5,
> >> timeout=<optimized out>, timeout@entry=0x3fff557b588,
> >> sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
> >> 44      ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.
> >> Thread 5 (Thread 0x3ff1b7f6900 (LWP 2772915)):
> >> #0  futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0,
> >> expected=0, futex_word=0x2aa1881f634) at
> >> ../sysdeps/nptl/futex-internal.h:320
> >> #1  do_futex_wait (sem=sem@entry=0x2aa1881f630, abstime=0x0,
> >> clockid=0) at sem_waitcommon.c:112
> >> #2  0x000003ff95011870 in __new_sem_wait_slow
> >> (sem=sem@entry=0x2aa1881f630, abstime=0x0, clockid=0) at
> >> sem_waitcommon.c:184
> >> #3  0x000003ff9501190e in __new_sem_wait (sem=sem@entry=0x2aa1881f630)
> >> at sem_wait.c:42
> >> #4  0x000002aa165b1416 in qemu_sem_wait (sem=sem@entry=0x2aa1881f630)
> >> at ../util/qemu-thread-posix.c:358
> >> #5  0x000002aa16023434 in multifd_send_sync_main (f=0x2aa17993760) at
> >> ../migration/multifd.c:610
> >> #6  0x000002aa162a8f18 in ram_save_iterate (f=0x2aa17993760,
> >> opaque=<optimized out>) at ../migration/ram.c:3049
> >> #7  0x000002aa1602bafc in qemu_savevm_state_iterate (f=0x2aa17993760,
> >> postcopy=<optimized out>) at ../migration/savevm.c:1296
> >> #8  0x000002aa1601fe4e in migration_iteration_run (s=0x2aa17748010) at
> >> ../migration/migration.c:3607
> >> #9  migration_thread (opaque=opaque@entry=0x2aa17748010) at
> >> ../migration/migration.c:3838
> >> #10 0x000002aa165b05c2 in qemu_thread_start (args=<optimized out>) at
> >> ../util/qemu-thread-posix.c:556
> >> #11 0x000003ff95007e66 in start_thread (arg=0x3ff1b7f6900) at
> >> pthread_create.c:477
> >> #12 0x000003ff94efcbf6 in thread_start () at
> >> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
>
> Migration main thread in multifd_send_sync_main(), waiting for the
> semaphore in
>
>     for (i = 0; i < migrate_multifd_channels(); i++) {
>         MultiFDSendParams *p = &multifd_send_state->params[i];
>
>         trace_multifd_send_sync_main_wait(p->id);
>         qemu_sem_wait(&p->sem_sync);
>     }
>
> Knowing the value of i would be great.  See the end of the email, I
> think it is going to be 0.

gdb says i is 1. Possibly the compiler has enthusiastically
reordered the 'i++' above the qemu_sem_wait(), though.
I tried to get gdb to tell me the value of migrate_multifd_channels(),
but that was a mistake: gdb's attempt to execute code in the debuggee
to answer that question failed and left it broken, so I had to kill it.
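
For next time, just printing the data that migrate_multifd_channels()
reads, rather than calling the function, should avoid that problem.
Assuming the relevant statics are still named current_migration
(migration/migration.c) and multifd_send_state (migration/multifd.c),
something along these lines (untested here) might do:

    (gdb) print 'migration/migration.c'::current_migration->parameters.multifd_channels
    (gdb) print 'migration/multifd.c'::multifd_send_state->params[1].sem_sync

(the 'file.c'::variable syntax is gdb's way of naming file-scope
statics; the path may need to match whatever the build recorded).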

Is there something we can put into either QEMU or the test
case that will let us get better information when this
happens again?
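
One idea, purely as an untested sketch: make the wait in
multifd_send_sync_main() a timed wait that complains about which
channel it is stuck on, so the next hang leaves a message in the log
instead of blocking silently. Assuming qemu_sem_timedwait() and
error_report() are usable from there, roughly:

    /* Sketch only: the loop from multifd_send_sync_main(), with the
     * unbounded qemu_sem_wait() replaced by a periodic timed wait that
     * reports which channel we are still waiting for.
     */
    for (i = 0; i < migrate_multifd_channels(); i++) {
        MultiFDSendParams *p = &multifd_send_state->params[i];

        trace_multifd_send_sync_main_wait(p->id);
        /* 60-second timeout is arbitrary; complain and keep waiting */
        while (qemu_sem_timedwait(&p->sem_sync, 60 * 1000) < 0) {
            error_report("multifd_send_sync_main: still waiting for "
                         "sem_sync on channel %d", i);
        }
    }

That would at least tell us which channel never posted sem_sync.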

thanks
-- PMM


