qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

tests/test-char hangs in CI (Centos7 with check-crypto-old-nettle)


From: Alex Bennée
Subject: tests/test-char hangs in CI (Centos7 with check-crypto-old-nettle)
Date: Tue, 15 Dec 2020 20:31:37 +0000
User-agent: mu4e 1.5.7; emacs 28.0.50

Hi,

Over the last few weeks we've seen a steadily more common failure in
test-char on CI. It seems to only manifest itself on the
check-crypto-old-nettle build set and fails with a message like:

  Running test test-char
  Unexpected error in object_property_try_add() at ../qom/object.c:1219:
  attempt to add duplicate property 'serial-id' to object (type 'container')
  ERROR test-char - too few tests run (expected 38, got 9)
  make: *** [run-test-87] Error 1

This would point the finger at test-char /char/serial

I've managed to re-create the crash on Centos7 (x86_64) with the
following build config:

  '../../configure' '--disable-gcrypt' '--enable-nettle' 
'--target-list=x86_64-softmmu,x86_64-linux-user'

The failure rate is something in the region of 1% which I confirmed by
spamming it with my retry.py script:

  retry.py -n 100 -c -- ./tests/test-char

Amusingly attempts to hit it just concentrating on the /char/serial test
with:

  retry.py -n 1000 -c -- ./tests/test-char -p /char/serial

passed with flying colours. It seems there must be an interaction
between the tests to cause the failure. Attempts to valgrind out any
strangeness were met with valgrind hanging on exit but otherwise
passing. You can avoid the hang in valgrind by commenting out:

--8<---------------cut here---------------start------------->8---
modified   tests/test-char.c
@@ -1434,11 +1434,11 @@ static void char_hotswap_test(void)
     g_free(chr_args);
 }
 
-static SocketAddress tcpaddr = {
-    .type = SOCKET_ADDRESS_TYPE_INET,
-    .u.inet.host = (char *)"127.0.0.1",
-    .u.inet.port = (char *)"0",
-};
+/* static SocketAddress tcpaddr = { */
+/*     .type = SOCKET_ADDRESS_TYPE_INET, */
+/*     .u.inet.host = (char *)"127.0.0.1", */
+/*     .u.inet.port = (char *)"0", */
+/* }; */
 #ifndef WIN32
 static SocketAddress unixaddr = {
     .type = SOCKET_ADDRESS_TYPE_UNIX,
@@ -1534,12 +1534,12 @@ int main(int argc, char **argv)
     g_test_add_data_func("/char/socket/client/dupid-reconnect/" # name, \
                          &client8 ##name, char_socket_client_dupid_test)
 
-    if (has_ipv4) {
-        SOCKET_SERVER_TEST(tcp, &tcpaddr);
-        SOCKET_CLIENT_TEST(tcp, &tcpaddr);
-        g_test_add_data_func("/char/socket/server/two-clients/tcp", &tcpaddr,
-                             char_socket_server_two_clients_test);
-    }
+    /* if (has_ipv4) { */
+    /*     SOCKET_SERVER_TEST(tcp, &tcpaddr); */
+    /*     SOCKET_CLIENT_TEST(tcp, &tcpaddr); */
+    /*     g_test_add_data_func("/char/socket/server/two-clients/tcp", 
&tcpaddr, */
+    /*                          char_socket_server_two_clients_test); */
+    /* } */
 #ifndef WIN32
     SOCKET_SERVER_TEST(unix, &unixaddr);
     SOCKET_CLIENT_TEST(unix, &unixaddr);
--8<---------------cut here---------------end--------------->8---

At that point valgrind doesn't have any smoking guns of corruption but
minor leaks and also it seems a bunch of fd's being used up:

  [alex@centos7 crypto-old-nettle]$ valgrind --tool=memcheck --leak-check=full 
--track-origins=yes --track-fds=yes ./tests/test-char
  ==28263== Memcheck, a memory error detector
  ==28263== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
  ==28263== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
  ==28263== Command: ./tests/test-char
  ==28263==
  /char/null: OK
  /char/invalid: OK
  /char/ringbuf: OK
  /char/mux: OK
  /char/stdio: OK
  /char/pipe: OK
  /char/file: OK
  /char/file-fifo: OK
  /char/udp: OK
  /char/serial: OK
  /char/hotswap: OK
  /char/websocket: OK
  /char/socket/server/mainloop/unix: OK
  /char/socket/server/wait-conn/unix: OK
  /char/socket/server/mainloop-fdpass/unix: OK
  /char/socket/server/wait-conn-fdpass/unix: OK
  /char/socket/server/two-clients/unix: OK
  /char/socket/client/mainloop/unix: OK
  /char/socket/client/wait-conn/unix: OK
  /char/socket/client/mainloop-reconnect/unix: OK
  /char/socket/client/wait-conn-reconnect/unix: OK
  /char/socket/client/mainloop-fdpass/unix: OK
  /char/socket/client/wait-conn-fdpass/unix: OK
  /char/socket/client/reconnect-error/unix: OK
  ==28263==
  ==28263== FILE DESCRIPTORS: 12 open at exit.
  ==28263== Open AF_UNIX socket 12: <unknown>
  ==28263==    at 0x5B9EAE7: socket (syscall-template.S:81)
  ==28263==    by 0x14B1AC: qemu_socket (osdep.c:472)
  ==28263==    by 0x1562FB: unix_connect_saddr (qemu-sockets.c:985)
  ==28263==    by 0x1575E8: socket_connect (qemu-sockets.c:1158)
  ==28263==    by 0x12A1D2: qio_channel_socket_connect_sync 
(channel-socket.c:145)
  ==28263==    by 0x13A7B8: tcp_chr_connect_client_task (char-socket.c:1123)
  ==28263==    by 0x125640: qio_task_thread_worker (task.c:124)
  ==28263==    by 0x14F245: qemu_thread_start (qemu-thread-posix.c:521)
  ==28263==    by 0x588AEA4: start_thread (pthread_create.c:307)
  ==28263==    by 0x5B9D96C: clone (clone.S:111)
  ==28263==
  ==28263== Open AF_UNIX socket 11: <unknown>
  ==28263==    at 0x5B9EAE7: socket (syscall-template.S:81)
  ==28263==    by 0x14B1AC: qemu_socket (osdep.c:472)
  ==28263==    by 0x1562FB: unix_connect_saddr (qemu-sockets.c:985)
  ==28263==    by 0x1575E8: socket_connect (qemu-sockets.c:1158)
  ==28263==    by 0x12A1D2: qio_channel_socket_connect_sync 
(channel-socket.c:145)
  ==28263==    by 0x13A7B8: tcp_chr_connect_client_task (char-socket.c:1123)
  ==28263==    by 0x125640: qio_task_thread_worker (task.c:124)
  ==28263==    by 0x14F245: qemu_thread_start (qemu-thread-posix.c:521)
  ==28263==    by 0x588AEA4: start_thread (pthread_create.c:307)
  ==28263==    by 0x5B9D96C: clone (clone.S:111)
  ==28263==
  ==28263== Open file descriptor 10:
  ==28263==    at 0x5B9DC1D: eventfd (eventfd.c:28)
  ==28263==    by 0x4EC6B06: g_wakeup_new (gwakeup.c:146)
  ==28263==    by 0x4E808B3: g_main_context_new (gmain.c:656)
  ==28263==    by 0x4E842C4: g_get_worker_context (gmain.c:5797)
  ==28263==    by 0x4E84332: ref_unix_signal_handler_unlocked (gmain.c:5205)
  ==28263==    by 0x4E84458: g_child_watch_source_new (gmain.c:5384)
  ==28263==    by 0x4EA5E64: wait_for_child (gtestutils.c:2814)
  ==28263==    by 0x4EA9993: g_test_trap_subprocess (gtestutils.c:3100)
  ==28263==    by 0x11BCB3: char_stdio_test (test-char.c:125)
  ==28263==    by 0x4EA92E0: test_case_run (gtestutils.c:2255)
  ==28263==    by 0x4EA92E0: g_test_run_suite_internal (gtestutils.c:2339)
  ==28263==    by 0x4EA919A: g_test_run_suite_internal (gtestutils.c:2351)
  ==28263==    by 0x4EA947D: g_test_run_suite (gtestutils.c:2426)
  ==28263==
  ==28263== Open file descriptor 8:
  ==28263==    at 0x5B9DC1D: eventfd (eventfd.c:28)
  ==28263==    by 0x4EC6B06: g_wakeup_new (gwakeup.c:146)
  ==28263==    by 0x4E808B3: g_main_context_new (gmain.c:656)
  ==28263==    by 0x4E8098C: g_main_context_default (gmain.c:692)
  ==28263==    by 0x4E818F4: g_source_attach (gmain.c:1194)
  ==28263==    by 0x158D5B: qemu_init_main_loop (main-loop.c:167)
  ==28263==    by 0x118FD7: main (test-char.c:1453)
  ==28263==
  ==28263== Open file descriptor 7:
  ==28263==    at 0x5B9DC1D: eventfd (eventfd.c:28)
  ==28263==    by 0x16C76B: event_notifier_init (event_notifier-posix.c:41)
  ==28263==    by 0x1626A7: aio_context_new (async.c:508)
  ==28263==    by 0x158CF9: qemu_init_main_loop (main-loop.c:159)
  ==28263==    by 0x118FD7: main (test-char.c:1453)
  ==28263==
  ==28263== Open file descriptor 6:
  ==28263==    at 0x5B9DEB7: epoll_create1 (syscall-template.S:81)
  ==28263==    by 0x165B5D: fdmon_epoll_setup (fdmon-epoll.c:151)
  ==28263==    by 0x16269D: aio_context_new (async.c:506)
  ==28263==    by 0x158CF9: qemu_init_main_loop (main-loop.c:159)
  ==28263==    by 0x118FD7: main (test-char.c:1453)
  ==28263==
  ==28263== Open file descriptor 5:
  ==28263==    at 0x5B9DC1D: eventfd (eventfd.c:28)
  ==28263==    by 0x16C76B: event_notifier_init (event_notifier-posix.c:41)
  ==28263==    by 0x1626A7: aio_context_new (async.c:508)
  ==28263==    by 0x1588AF: iohandler_init.part.0 (main-loop.c:551)
  ==28263==    by 0x158C23: iohandler_init (main-loop.c:575)
  ==28263==    by 0x158C23: qemu_set_fd_handler (main-loop.c:572)
  ==28263==    by 0x158CF1: qemu_signal_init (main-loop.c:109)
  ==28263==    by 0x158CF1: qemu_init_main_loop (main-loop.c:154)
  ==28263==    by 0x118FD7: main (test-char.c:1453)
  ==28263==
  ==28263== Open file descriptor 4:
  ==28263==    at 0x5B9DEB7: epoll_create1 (syscall-template.S:81)
  ==28263==    by 0x165B5D: fdmon_epoll_setup (fdmon-epoll.c:151)
  ==28263==    by 0x16269D: aio_context_new (async.c:506)
  ==28263==    by 0x1588AF: iohandler_init.part.0 (main-loop.c:551)
  ==28263==    by 0x158C23: iohandler_init (main-loop.c:575)
  ==28263==    by 0x158C23: qemu_set_fd_handler (main-loop.c:572)
  ==28263==    by 0x158CF1: qemu_signal_init (main-loop.c:109)
  ==28263==    by 0x158CF1: qemu_init_main_loop (main-loop.c:154)
  ==28263==    by 0x118FD7: main (test-char.c:1453)
  ==28263==
  ==28263== Open file descriptor 3:
  ==28263==    at 0x5B97C89: syscall (syscall.S:38)
  ==28263==    by 0x16C0C1: qemu_signalfd (compatfd.c:103)
  ==28263==    by 0x158CC7: qemu_signal_init (main-loop.c:101)
  ==28263==    by 0x158CC7: qemu_init_main_loop (main-loop.c:154)
  ==28263==    by 0x118FD7: main (test-char.c:1453)
  ==28263==
  ==28263== Open file descriptor 2: /dev/pts/1
  ==28263==    <inherited from parent>
  ==28263==
  ==28263== Open file descriptor 1: /dev/pts/1
  ==28263==    <inherited from parent>
  ==28263==
  ==28263== Open file descriptor 0: /dev/pts/1
  ==28263==    <inherited from parent>
  ==28263==
  ==28263==
  ==28263== HEAP SUMMARY:
  ==28263==     in use at exit: 62,568 bytes in 537 blocks
  ==28263==   total heap usage: 5,803 allocs, 5,266 frees, 2,400,984 bytes 
allocated
  ==28263==
  ==28263== 576 bytes in 1 blocks are possibly lost in loss record 352 of 359
  ==28263==    at 0x4C2C089: calloc (vg_replace_malloc.c:762)
  ==28263==    by 0x4012784: allocate_dtv (dl-tls.c:317)
  ==28263==    by 0x4012784: _dl_allocate_tls (dl-tls.c:533)
  ==28263==    by 0x588B87B: allocate_stack (allocatestack.c:539)
  ==28263==    by 0x588B87B: pthread_create@@GLIBC_2.2.5 (pthread_create.c:447)
  ==28263==    by 0x1500E5: qemu_thread_create (qemu-thread-posix.c:558)
  ==28263==    by 0x1619D8: rcu_init_complete (rcu.c:379)
  ==28263==    by 0x16FC9C: __libc_csu_init (in 
/home/alex/lsrc/qemu.git/builds/crypto-old-nettle/tests/test-char)
  ==28263==    by 0x5AC14E4: (below main) (libc-start.c:225)
  ==28263==
  ==28263== 576 bytes in 1 blocks are possibly lost in loss record 353 of 359
  ==28263==    at 0x4C2C089: calloc (vg_replace_malloc.c:762)
  ==28263==    by 0x4012784: allocate_dtv (dl-tls.c:317)
  ==28263==    by 0x4012784: _dl_allocate_tls (dl-tls.c:533)
  ==28263==    by 0x588B87B: allocate_stack (allocatestack.c:539)
  ==28263==    by 0x588B87B: pthread_create@@GLIBC_2.2.5 (pthread_create.c:447)
  ==28263==    by 0x4EC7C81: g_system_thread_new (gthread-posix.c:1177)
  ==28263==    by 0x4EAA82E: g_thread_new_internal (gthread.c:874)
  ==28263==    by 0x4EAA8E7: g_thread_new (gthread.c:827)
  ==28263==    by 0x4E842E0: g_get_worker_context (gmain.c:5798)
  ==28263==    by 0x4E84332: ref_unix_signal_handler_unlocked (gmain.c:5205)
  ==28263==    by 0x4E84458: g_child_watch_source_new (gmain.c:5384)
  ==28263==    by 0x4EA5E64: wait_for_child (gtestutils.c:2814)
  ==28263==    by 0x4EA9993: g_test_trap_subprocess (gtestutils.c:3100)
  ==28263==    by 0x11BCB3: char_stdio_test (test-char.c:125)
  ==28263==
  ==28263== LEAK SUMMARY:
  ==28263==    definitely lost: 0 bytes in 0 blocks
  ==28263==    indirectly lost: 0 bytes in 0 blocks
  ==28263==      possibly lost: 1,152 bytes in 2 blocks
  ==28263==    still reachable: 61,416 bytes in 535 blocks
  ==28263==         suppressed: 0 bytes in 0 blocks
  ==28263== Reachable blocks (those to which a pointer was found) are not shown.
  ==28263== To see them, rerun with: --leak-check=full --show-leak-kinds=all
  ==28263==
  ==28263== For lists of detected and suppressed errors, rerun with: -s
  ==28263== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

I also have a core dump of the binary when it fails but I suspect it
requires someone with a bit more deftness of QOM to see if anything can
be devised from its guts.

So anyone want to hazard a guess at what's going on?

-- 
Alex Bennée



reply via email to

[Prev in Thread] Current Thread [Next in Thread]