qemu-devel

Re: runaway avocado


From: John Snow
Subject: Re: runaway avocado
Date: Mon, 7 Dec 2020 15:45:56 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.3.1

On 10/26/20 8:28 PM, Cleber Rosa wrote:
On Mon, Oct 26, 2020 at 11:43:36PM +0100, Philippe Mathieu-Daudé wrote:
Cc'ing avocado-devel@

On 10/26/20 11:35 PM, Peter Maydell wrote:
So, I somehow ended up with this process still running on my
local machine after a (probably failed) 'make check-acceptance':

petmay01 13710 99.7  3.7 2313448 1235780 pts/16 Sl  16:10 378:00
./qemu-system-aarch64 -display none -vga none -chardev
socket,id=mon,path=/var/tmp/tmp5szft2yi/qemu-13290-monitor.sock -mon
chardev=mon,mode=control -machine virt -chardev
socket,id=console,path=/var/tmp/tmp5szft2yi/qemu-13290-console.sock,server,nowait
-serial chardev:console -icount
shift=7,rr=record,rrfile=/var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/replay.bin,rrsnapshot=init
-net none -drive
file=/var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/disk.qcow2,if=none
-kernel 
/home/petmay01/avocado/data/cache/by_location/a00ac4ae676ef0322126abd2f7d38f50cc9cbc95/vmlinuz
-cpu cortex-a53

and it was continuing to log to a deleted file
/var/tmp/avocado_iv8dehpo/avocado_job_w9efukj5/32-tests_acceptance_reverse_debugging.py_ReverseDebugging_AArch64.test_aarch64_virt/replay.bin

which was steadily eating my disk space and got up to nearly 100GB
in used disk (invisible to du, of course, since it was an unlinked
file) before I finally figured out what was going on and killed it
about six hours later...
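
A minimal sketch of how such a file could be spotted on Linux, assuming /proc is mounted and the caller may read the target process's fd directory. The helper name find_deleted_open_files is made up for illustration and is not part of QEMU or Avocado. It relies on the kernel appending " (deleted)" to the link target of an open file that has been unlinked, which is why du cannot see the space.

```python
import os


def find_deleted_open_files(pid):
    """Return (fd, target, size_bytes) for unlinked files that pid still holds open.

    Hypothetical helper: Linux-only, reads /proc/<pid>/fd.
    """
    fd_dir = f"/proc/{pid}/fd"
    results = []
    try:
        fds = os.listdir(fd_dir)
    except OSError:
        # No such process, no /proc, or no permission to inspect it.
        return results
    for fd in fds:
        link = os.path.join(fd_dir, fd)
        try:
            target = os.readlink(link)
            if not target.endswith(" (deleted)"):
                continue
            # stat() follows the /proc link to the still-open inode,
            # so the size is available even though the path is gone.
            size = os.stat(link).st_size
        except OSError:
            # The descriptor was closed while we were scanning.
            continue
        results.append((int(fd), target, size))
    return results
```

Calling find_deleted_open_files(13710) for the process above would be expected to list replay.bin with its current size, provided the caller owns the process or is root.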


Ouch!

Any suggestions for how we might improve the robustness of the
relevant test?


While this test may be less robust/reliable than others, the core
issue is that the automatic shutdown of the QEMU "vms" can be
improved.  My best guess is that this specific test ended in ERROR,
and (or because?) the tearDown() method failed to end these processes.

All tests can be improved at once by adding a second, even more
forceful round of shutdown.  Currently the process gets, in the worst
case scenario, a SIGKILL.
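
As an illustration of escalating shutdown with a timeout, here is a sketch built only on the standard subprocess module. It is not the code in QEMU's machine.py; the name stop_process and the default timeout are assumptions chosen for the example.

```python
import subprocess


def stop_process(proc, timeout=3.0):
    """Ask proc to exit and escalate to SIGKILL if it does not.

    Hypothetical helper; proc is a subprocess.Popen object.
    Returns the exit status of the process.
    """
    if proc.poll() is not None:
        # Already exited and reaped.
        return proc.returncode
    proc.terminate()  # polite round: SIGTERM
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        pass
    proc.kill()  # forceful round: SIGKILL, which cannot be ignored
    # Always reap, so that no zombie is left behind.
    return proc.wait()
```

Calling this from tearDown() only helps if tearDown() actually runs to completion, which is the gap an upper cleanup layer would have to cover.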

But, in addition to that, an upper layer above the test could be given
the responsibility to look for and clean up resources initiated by a
test.  The Avocado job has hooks for running callbacks right before
its own process exits, but, with the new Avocado architecture (AKA "N(ext)
Runner") this should probably be implemented as async cleanup actions
that begin right after a test ends.
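
One way an upper layer could find everything a test started, sketched with the standard library only: launch the test's processes in their own session, so that they share one process group, and kill that group once the test is over. This does not use any Avocado API, and run_in_own_group and reap_group are hypothetical names.

```python
import os
import signal
import subprocess


def run_in_own_group(argv):
    """Start argv as the leader of a new session and process group."""
    return subprocess.Popen(argv, start_new_session=True)


def reap_group(proc):
    """Kill every process still in proc's group, then reap the leader.

    Hypothetical cleanup step, meant to run after the test has ended.
    """
    try:
        # For a session leader the process group ID equals its PID.
        os.killpg(proc.pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # the whole group is already gone
    return proc.wait()
```

The limitation is that a child which moves itself into another session or process group escapes this cleanup, so it is a backstop rather than a replacement for a working tearDown().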

I'll give the "second more forceful round of shutdown" approach some
thought and testing, and in addition to that, open an issue to track the
upper layer resource cleanup on Avocado.


machine.py should have a timeout that it adheres to, unless it was disabled explicitly -- then I guess it can't help you.

--js



