qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/6] tests: enable meson test timeouts to improve debuggabili


From: Thomas Huth
Subject: Re: [PATCH 0/6] tests: enable meson test timeouts to improve debuggability
Date: Mon, 5 Jun 2023 16:07:46 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0

On 01/06/2023 18.31, Daniel P. Berrangé wrote:
Perhaps the most painful of all the GitLab CI failures we see are
the enforced job timeouts:

    "ERROR: Job failed: execution took longer than 1h15m0s seconds"

    https://gitlab.com/qemu-project/qemu/-/jobs/4387047648

when that hits the CI log shows what has *already* run, but figuring
out what was currently running (or rather stuck) is an horrendously
difficult.

The initial meson port disabled the meson test timeouts, in order to
limit the scope for introducing side effects from the port that would
complicate adoption.

Now that the meson port is basically finished we can take advantage of
more of its improved features. It has the ability to set timeouts for
test programs, defaulting to 30 seconds, but overridable per test. This
is further helped by fact that we changed the iotests integration so
that each iotests was a distinct meson test, instead of having one
single giant (slow) test.

We already set overrides for a bunch of tests, but they've not been
kept up2date since we had timeouts disabled. So this series first
updates the timeout overrides such that all tests pass when run in
my test gitlab CI pipeline. Then it enables use of meson timeouts.

We might still hit timeouts due to non-deterministic performance of
gitlab CI runners. So we'll probably have to increase a few more
timeouts in the short term. Fortunately this is going to be massively
easier to diagnose. For example this job during my testing:

    https://gitlab.com/berrange/qemu/-/jobs/4392029495

we can immediately see  the problem tests

Summary of Failures:
   6/252 qemu:qtest+qtest-i386 / qtest-i386/bios-tables-test                
TIMEOUT        120.02s   killed by signal 15 SIGTERM
   7/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test          
TIMEOUT        120.03s   killed by signal 15 SIGTERM
  64/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/qom-test                  
TIMEOUT        300.03s   killed by signal 15 SIGTERM

The full meson testlog.txt will show each individual TAP log output,
so we can then see exactly which test case we got stuck on.

NB, the artifacts are missing on the job links above, until this
patch merges:

    https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg04668.html

NB, this series sets the migration-test timeout to 5 minutes, which
is only valid if this series is merged to make the migration test
not suck:

   https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg00286.html

without that series, we'll need to set the migration-test timeout to
30 minutes instead.

Daniel P. Berrangé (6):
   qtest: bump min meson timeout to 60 seconds
   qtest: bump migration-test timeout to 5 minutes
   qtest: bump qom-test timeout to 7 minutes
   qtest: bump aspeed_smc-test timeout to 2 minutes
   qtest: bump bios-table-test timeout to 6 minutes
   mtest2make: stop disabling meson test timeouts

  scripts/mtest2make.py   |  3 ++-
  tests/qtest/meson.build | 16 ++++++----------
  2 files changed, 8 insertions(+), 11 deletions(-)

FWIW, I now ran this on my rather old laptop with an --enable-debug
build with "make -j$(nproc) check-qtest" and got these additional
failures (beside the expected migration-test that still needs its
final speedup):

 qtest-aarch64/test-hmp        TIMEOUT   120.07s   killed by signal 15 SIGTERM
 qtest-aarch64/qom-test        TIMEOUT   420.09s   killed by signal 15 SIGTERM
 qtest-arm/qom-test            TIMEOUT   420.10s   killed by signal 15 SIGTERM
 qtest-arm/npcm7xx_pwm-test    TIMEOUT   150.04s   killed by signal 15 SIGTERM
 qtest-ppc64/pxe-test          TIMEOUT    60.01s   killed by signal 15 SIGTERM
 qtest-sparc/prom-env-test     TIMEOUT    60.01s   killed by signal 15 SIGTERM
 qtest-sparc/boot-serial-test  TIMEOUT    60.01s   killed by signal 15 SIGTERM
When I run them manually without the timeout patch, I get these
values:

 qtest-aarch64/test-hmp             OK   168.66s   95 subtests passed
 qtest-aarch64/qom-test             OK   646.37s   94 subtests passed
 qtest-arm/qom-test                 OK   621.64s   89 subtests passed
 qtest-arm/npcm7xx_pwm-test         OK   225.48s   24 subtests passed
 qtest-ppc64/pxe-test               OK    96.95s   2 subtests passed
 qtest-sparc/prom-env-test          OK    95.94s   3 subtests passed
 qtest-sparc/boot-serial-test       OK    92.96s   3 subtests passed

 HTH,
  Thomas




reply via email to

[Prev in Thread] Current Thread [Next in Thread]