qemu-devel

Re: Seeing qtest assertion failure with 7.1


From: Patrick Venture
Subject: Re: Seeing qtest assertion failure with 7.1
Date: Thu, 8 Sep 2022 08:54:26 -0700



On Wed, Sep 7, 2022 at 10:40 AM Peter Maydell <peter.maydell@linaro.org> wrote:
On Wed, 7 Sept 2022 at 16:39, Patrick Venture <venture@google.com> wrote:
>
> # Start of nvme tests
> # Start of pci-device tests
> # Start of pci-device-tests tests
> # starting QEMU: exec ./qemu-system-aarch64 -qtest unix:/tmp/qtest-1431.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1431.qmp,id=char0 -mon chardev=char0,mode=control -display none -M virt, -cpu max -drive id=drv0,if=none,file=null-co://,file.read-zeroes=on,format=raw -object memory-backend-ram,id=pmr0,share=on,size=8 -device nvme,addr=04.0,drive=drv0,serial=foo -accel qtest
>
> # ERROR:../../src/qemu/tests/qtest/libqtest.c:338:qtest_init_without_qmp_handshake: assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)
> stderr:
> double free or corruption (out)
> socket_accept failed: Resource temporarily unavailable
> **
> ERROR:../../src/qemu/tests/qtest/libqtest.c:338:qtest_init_without_qmp_handshake: assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)
> ../../src/qemu/tests/qtest/libqtest.c:165: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)
>
> I'm not seeing this reliably, and we haven't done a lot of digging yet, such as enabling sanitizers, so I'll reply back to this thread with details as I have them.
>
> Has anyone seen this before or something like it?

Have a look in the source at what exactly the assertion
failure in libqtest.c is checking for -- IIRC it's a pretty
basic "did we open a socket fd" one. I think sometimes I
used to see something like this if there's an old stale socket
lying around in the test directory and the randomly generated
socket filename happens to clash with it.

Thanks for the debugging tip! I can't reproduce it at this point. I saw it two or three times, and now not at all, so more than likely it's exactly what you're describing.
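
For anyone reading the archive later, here is a minimal sketch of the stale-socket clash described above. It is written in Python rather than libqtest's C, the helper names and the socket filename are hypothetical, and it only illustrates the general mechanism on a POSIX system. It makes no claim about how libqtest.c handles its own socket paths.

```python
import errno
import os
import socket
import tempfile


def bind_unix(path):
    """Bind a new AF_UNIX stream socket to path; close it and re-raise on failure."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError:
        sock.close()
        raise
    return sock


def stale_socket_demo():
    """Return the errno hit when binding over a leftover socket file."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "qtest-demo.sock")  # hypothetical filename
    clash = None

    # Simulate an earlier run that exited without unlinking its socket:
    # closing the socket leaves the file on disk.
    bind_unix(path).close()

    # A later run that picks the same name cannot bind to it.
    try:
        bind_unix(path).close()
    except OSError as exc:
        clash = exc.errno

    # Removing the stale file first makes the same bind succeed.
    os.unlink(path)
    bind_unix(path).close()

    # Clean up after the demo itself.
    os.unlink(path)
    os.rmdir(workdir)
    return clash
```

Calling `stale_socket_demo()` returns the errno from the second bind, which on Linux is `errno.EADDRINUSE`. That is the kind of "did we open a socket fd" failure that would surface before any test logic runs.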
 

Everything after that is probably follow-on errors from the
tests not being terribly clean about error handling.

Are you running 'make check' with a -j option for parallel?
(This is supposed to work, and it's the standard way I run
'make check', so if it's flaky we need to fix it, but it
would be interesting to know if the issue repros at -j1.)

It's not reproducing reliably: I haven't seen it since the first few instances, and it was unrelated to the patches in flight at the time. I'll have to hold off on further debugging until we can reproduce it, and I'll let you know when we do. For now it's flaky enough to be hard to catch.
 

-- PMM
