[Qemu-devel] Remaining CI failures


From: Alex Bennée
Subject: [Qemu-devel] Remaining CI failures
Date: Fri, 11 Jan 2019 19:10:07 +0000
User-agent: mu4e 1.1.0; emacs 26.1.91

Hi,

I'm trying to narrow down the remaining failures in the CI system. There
is one with a patch in flight (use g_usleep instead of sleep), but there
remain two failure modes, both erratic.

tests/qht-par:

I can trigger this on my dev machine with a gprof enabled build:

  # QEMU configure log Fri Jan 11 14:10:45 GMT 2019
  # Configured with: './configure' '--disable-tools' '--disable-docs' '--enable-gprof' '--enable-gcov'

I only seem to be able to trigger it when running via the wrapper in the
make system:

  retry.py -n 30 --invert make check-tests/test-qht-par
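
For readers without the script, the idea behind the command above can be sketched as follows. This is a hypothetical stand-in, not the actual retry.py: the function name is made up, and it assumes that -n caps the number of runs and that --invert means "stop at the first failure".

```python
import subprocess
import sys


def run_until_failure(cmd, attempts):
    """Run cmd up to `attempts` times and stop at the first failing run.

    Returns the first non-zero exit code, or None if every run passed.
    (Hypothetical helper; the real retry.py may behave differently.)
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"run {attempt} failed with {result.returncode}",
                  file=sys.stderr)
            return result.returncode
    return None


# Example (not executed here):
#   run_until_failure(["make", "check-tests/test-qht-par"], 30)
```

Stopping at the first failure leaves the core dump and logs of the failing run untouched by later runs.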

Eventually this crashes with:

  ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)
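
The value 35584 looks like a raw wait status rather than a plain exit code. A rough sketch of how to decode it, assuming rc comes from something like system() (the source does not say so) and assuming the usual Linux status layout; the helper names are made up for illustration:

```python
import signal


def decode_wait_status(status):
    """Split a raw wait status into (kind, value) using the Linux layout.

    Assumption: low 7 bits hold the terminating signal, bits 8-15 the
    exit code. Core-dump and continued flags are ignored in this sketch.
    """
    low = status & 0x7F
    if low == 0:
        return ("exited", (status >> 8) & 0xFF)
    if low == 0x7F:
        return ("stopped", (status >> 8) & 0xFF)
    return ("signaled", low)


def shell_signal_name(exit_code):
    """Map a shell-style 128+N exit code to a signal name, else None."""
    if exit_code > 128:
        return signal.Signals(exit_code - 128).name
    return None


kind, value = decode_wait_status(35584)
print(kind, value, shell_signal_name(value))
```

Under those assumptions 35584 is 139 << 8, i.e. an exit code of 139, which is how a shell reports a child killed by signal 11 (SIGSEGV). That would be consistent with the core dump below, but it is an inference rather than something the log states.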

Leaving a core dump for the child:

  Core was generated by `tests/qht-bench -R -S0.1 -D10000 -N1 -n 2 -u 20 -d 1'
  (gdb) info thread
    Id   Target Id         Frame
  * 1    Thread 0x7ffff6a7e700 (LWP 15473) 0x000055555557c306 in call_rcu_thread (opaque=0x0) at util/rcu.c:255
    2    Thread 0x7ffff7fbe780 (LWP 15472) 0x00005555555b8d50 in gcov_read_words ()
  (gdb) bt
  #0  __mcount_internal (frompc=<optimised out>, selfpc=93824992383630) at mcount.c:98
  #1  0x00007ffff6e15e24 in mcount () at ../sysdeps/x86_64/_mcount.S:51
  #2  0x000055555557928e in qemu_event_reset (ev=0x3cc692b8d452f400) at util/qemu-thread-posix.c:408
  #3  0x000055555557c306 in call_rcu_thread (opaque=0x0) at util/rcu.c:255
  #4  0x0000555555579630 in qemu_thread_start (args=0x555555808080) at util/qemu-thread-posix.c:502
  #5  0x00007ffff70e96db in start_thread (arg=0x7ffff6a7e700) at pthread_create.c:463
  #6  0x00007ffff6e1288f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  (gdb) thread 2
  [Switching to thread 2 (Thread 0x7ffff7fbe780 (LWP 15472))]
  #0  0x00005555555b8d50 in gcov_read_words ()
  (gdb) bt
  #0  0x00005555555b8d50 in gcov_read_words ()
  #1  0x00005555555b9453 in __gcov_read_summary ()
  #2  0x00005555555ba461 in gcov_do_dump ()
  #3  0x00005555555bab62 in __gcov_exit ()
  #4  0x00005555555b8c22 in _GLOBAL__sub_D_00100_1_json_lexer_init () at qobject/json-lexer.c:365
  #5  0x00007ffff7de5b73 in _dl_fini () at dl-fini.c:138
  #6  0x00007ffff6d34041 in __run_exit_handlers (status=0, listp=0x7ffff70dc718 <__exit_funcs>, address@hidden, address@hidden) at exit.c:108
  #7  0x00007ffff6d3413a in __GI_exit (status=<optimised out>) at exit.c:139
  #8  0x00007ffff6d12b9e in __libc_start_main (main=0x555555575d50 <main>, argc=11, argv=0x7fffffffdf78, init=<optimised out>, fini=<optimised out>, rtld_fini=<optimised out>, stack_end=0x7fffffffdf68) at ../csu/libc-start.c:344
  #9  0x0000555555573c1a in _start ()

To trigger the second failure I had to run on a limited Xenial machine
(Ubuntu 16.04, 2 cores, 8GB RAM), again with a gprof build:

  # QEMU configure log Thu 10 Jan 22:22:52 GMT 2019
  # Configured with: './configure' '--enable-gprof' '--enable-gcov' '--disable-pie' '--target-list=aarch64-softmmu,arm-softmmu,i386-softmmu,mips-softmmu,mips64-softmmu,ppc64-softmmu,riscv64-softmmu,s390x-softmmu,x86_64-softmmu'

Running it like Travis does:

  retry.py -n 40 --invert -- make -j 3 check V=1

It eventually fails with:

  PASS: tests/test-hmp
  make: write error: stdout

It's hard to tell from the output what was running that failed. So far
I've managed to get the following information out of execsnoop:

  qemu-system-x86  1345   1332     0 x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1332.qmp,nowait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display none -S -M pc-i440fx-4.0
  sh               1350   1332     0 /bin/sh -c exec x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qt
  qemu-system-x86  1350   1332     0 x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1332.qmp,nowait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display none -S -M pc-q35-3.1
  sh               1356   1332     0 /bin/sh -c exec x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qt
  qemu-system-x86  1356   1332     0 x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1332.qmp,nowait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display none -S -M pc-i440fx-3.1
  sh               1361   1332     0 /bin/sh -c exec x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qt
  qemu-system-x86  1361   1332     0 x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1332.qmp,nowait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display none -S -M pc-q35-4.0
  sh               1366   1332     0 /bin/sh -c exec x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qt
  qemu-system-x86  1366   1332     0 x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1332.qmp,nowait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display none -S -M none -m 2
  tset             1370   2129     0 /usr/bin/tset -c
  git              1374   1373     0 /usr/bin/git rev-parse --git-dir
  git              1376   1375     0 /usr/bin/git symbolic-ref HEAD
  git              1378   1377     0 /usr/bin/git config branch.testing/next.remote

So even though tests/test-hmp has nominally passed, I think it has to be
one of the x86_64 tests, unless something that was triggered much
earlier is only now crashing out.

I did run the make under strace to see who is using O_NONBLOCK, but even
after filtering out a bunch of stuff there are still thousands of hits:

 ag "O_NONBLOCK" check.strace | ag -v "/sys" | ag -v "/dev" | grep -v ".git" | wc -l
 2449
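
Why O_NONBLOCK is worth chasing for a "write error: stdout": the flag lives on the open file description, so any process that shares the descriptor can flip it for everyone else. The sketch below demonstrates that on a pipe. It is an illustration only, not a diagnosis; the helper names are made up, and os.dup() stands in for a child process inheriting make's stdout.

```python
import errno
import fcntl
import os


def is_nonblocking(fd):
    """Return True if O_NONBLOCK is set on fd's open file description."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


def demo_shared_nonblock():
    """Set O_NONBLOCK via a duplicate fd and observe it on the original."""
    read_end, write_end = os.pipe()
    alias = os.dup(write_end)  # shares the same open file description
    try:
        before = is_nonblocking(write_end)
        flags = fcntl.fcntl(alias, fcntl.F_GETFL)
        fcntl.fcntl(alias, fcntl.F_SETFL, flags | os.O_NONBLOCK)
        after = is_nonblocking(write_end)

        # Nobody reads the pipe, so it fills up. A blocking writer would
        # hang here; a non-blocking one gets EAGAIN instead.
        got_eagain = False
        try:
            while True:
                os.write(write_end, b"x" * 4096)
        except BlockingIOError as exc:
            got_eagain = exc.errno == errno.EAGAIN
        return before, after, got_eagain
    finally:
        for fd in (read_end, write_end, alias):
            os.close(fd)


print(demo_shared_nonblock())
```

A writer that does not expect EAGAIN will treat it as a plain write failure, which is one plausible way for make to end up reporting a write error on stdout.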

Anyway, I thought it would be worth dumping my current debug state to the
list in case anyone has any bright ideas about what's going on or fancies
some weekend debugging.

The retry.py is just a hacky script I use; I'm sure everyone else has
something similar. See https://github.com/stsquad/retry

--
Alex Bennée


