qemu-devel

Re: Inscrutable CI jobs (avocado & Travis s390 check-tcg)


From: Thomas Huth
Subject: Re: Inscrutable CI jobs (avocado & Travis s390 check-tcg)
Date: Fri, 23 Sep 2022 09:47:01 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.13.0

On 23/09/2022 09.28, Daniel P. Berrangé wrote:
On Thu, Sep 22, 2022 at 03:04:12PM -0400, Stefan Hajnoczi wrote:
QEMU's avocado and Travis s390x check-tcg CI jobs fail often and I don't
know why. I think it's due to timeouts but maybe there is something
buried in the logs that I missed.

I waste time skimming through logs when merging qemu.git pull requests
and electricity is wasted on tests that don't produce useful pass/fail
output.

Here are two recent examples:
https://gitlab.com/qemu-project/qemu/-/jobs/3070754718
https://app.travis-ci.com/gitlab/qemu-project/qemu/jobs/583629583

If there are real test failures then the test output needs to be
improved so people can identify failures.

If the tests are timing out then they need to be split up and/or reduced
in duration. BTW, if it's a timeout, why are we using an internal
timeout instead of letting CI mark the job as timed out?

Any other ideas for improving these CI jobs?

The avocado job there does show the errors, but the summary at the
end leaves something to be desired. At first glance it looked like
everything passed because it says "ERROR 0" and that's what caught
my eye. Took a long time to notice the 'INTERRUPT 5' bit is actually
just an error state too.  I don't understand why it has to have so
many different ways of saying the same thing:

   RESULTS    : PASS 14 | ERROR 0 | FAIL 0 | SKIP 37 | WARN 0 | INTERRUPT 5 | CANCEL 136


   "ERROR", "FAIL" and "INTERRUPT" are all just the same thing

   "SKIP" and "CANCEL" are just the same thing

I'm sure there was some reason for these different terms, but IMHO they
are actively unhelpful.

For example I see no justifiable reason for the choice of SKIP vs CANCEL
in these two messages:

  (173/192) tests/avocado/virtiofs_submounts.py:VirtiofsSubmountsTest.test_pre_launch_set_up:  SKIP: sudo -n required, but "sudo -n true" failed: [Errno 2] No such file or directory: 'sudo'

  (183/192) tests/avocado/x86_cpu_model_versions.py:X86CPUModelAliases.test_4_1_alias:  CANCEL: No QEMU binary defined or found in the build tree (0.00 s)

It would be clearer to understand the summary as:

  RESULTS: PASS 14 | ERROR 5 | SKIP 173 | WARN 0

I'd also like to see it repeat the error messages for the failed
tests at the end, so you don't have to search back up through the
huge log to find them.
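
As a rough illustration of that collapsed summary, here is a minimal Python sketch that folds the existing RESULTS line into the four buckets proposed above. The grouping table and the function name collapse_results are assumptions made for this example; nothing here is part of avocado itself.

```python
import re

# Assumed grouping, taken from the suggestion in this mail:
# ERROR, FAIL and INTERRUPT all count as errors; SKIP and CANCEL as skips.
GROUPS = {
    "PASS": ("PASS",),
    "ERROR": ("ERROR", "FAIL", "INTERRUPT"),
    "SKIP": ("SKIP", "CANCEL"),
    "WARN": ("WARN",),
}


def collapse_results(line):
    """Fold an avocado-style RESULTS line into PASS/ERROR/SKIP/WARN totals."""
    # Pick up every "NAME <number>" pair; "RESULTS :" has no number and is ignored.
    counts = {name: int(num) for name, num in re.findall(r"([A-Z]+)\s+(\d+)", line)}
    parts = []
    for group, states in GROUPS.items():
        total = sum(counts.get(state, 0) for state in states)
        parts.append(f"{group} {total}")
    return "RESULTS: " + " | ".join(parts)


if __name__ == "__main__":
    print(collapse_results(
        "RESULTS    : PASS 14 | ERROR 0 | FAIL 0 | SKIP 37 | WARN 0 | "
        "INTERRUPT 5 | CANCEL 136"
    ))
```

Fed the RESULTS line from the job above, this prints the same one-line summary suggested in this mail.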


On the TCG tests we see

timeout --foreground 90  /home/travis/build/qemu-project/qemu/build/qemu-s390x  noexec >  noexec.out

make[1]: *** [../Makefile.target:158: run-noexec] Error 1

make[1]: Leaving directory '/home/travis/build/qemu-project/qemu/build/tests/tcg/s390x-linux-user'

make: *** [/home/travis/build/qemu-project/qemu/tests/Makefile.include:60: run-tcg-tests-s390x-linux-user] Error 2


I presume that indicates the 'noexec' test failed, but we have zero
info.

I think this is the bug that will be fixed by Ilya's patch here:

 https://lists.gnu.org/archive/html/qemu-devel/2022-09/msg02756.html

But I agree, it is unfortunate that the output is not available. Looking at this on my s390x box:

$ cat tests/tcg/s390x-linux-user/noexec.out
[ RUN      ] fallthrough
[       OK ]
[ RUN      ] jump
[  FAILED  ] unexpected SEGV

so there is an indication of what's going wrong in there indeed.

Alex, would it be possible to change the tcg test harness to dump the .out file of failing tests?
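
To make the request a bit more concrete, here is a minimal Python sketch of the behaviour I have in mind: run the test with a timeout, capture its output in the .out file, and print that file if the test fails. This is not how the tcg harness is implemented (it is driven by make and timeout); the helper name run_and_dump is hypothetical, and capturing stderr together with stdout is an assumption of this sketch, since the original command only redirects stdout.

```python
import subprocess
import sys


def run_and_dump(cmd, out_path, timeout=90):
    """Run cmd, saving its output to out_path; dump that file on failure.

    Returns the exit status of the command, or 124 if it timed out
    (the status GNU timeout(1) uses for a timed-out command).
    """
    try:
        with open(out_path, "wb") as out:
            # Assumption: stderr is captured too, unlike the plain "> file" redirect.
            rc = subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT,
                                timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        rc = 124

    if rc != 0:
        # Repeat the captured output so it shows up directly in the CI log.
        with open(out_path, errors="replace") as out:
            sys.stderr.write(f"--- {out_path} (exit status {rc}) ---\n")
            sys.stderr.write(out.read())
    return rc
```

A call such as run_and_dump(["./qemu-s390x", "noexec"], "noexec.out") would then leave the "[  FAILED  ] unexpected SEGV" line visible in the job log instead of only in the build tree.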

 Thomas



