qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/6] tests: enable meson test timeouts to improve debuggabili


From: Daniel P . Berrangé
Subject: Re: [PATCH 0/6] tests: enable meson test timeouts to improve debuggability
Date: Mon, 5 Jun 2023 15:14:35 +0100
User-agent: Mutt/2.2.9 (2022-11-12)

On Mon, Jun 05, 2023 at 04:07:46PM +0200, Thomas Huth wrote:
> On 01/06/2023 18.31, Daniel P. Berrangé wrote:
> > Perhaps the most painful of all the GitLab CI failures we see are
> > the enforced job timeouts:
> > 
> >     "ERROR: Job failed: execution took longer than 1h15m0s seconds"
> > 
> >     https://gitlab.com/qemu-project/qemu/-/jobs/4387047648
> > 
> > when that hits the CI log shows what has *already* run, but figuring
> > out what was currently running (or rather stuck) is an horrendously
> > difficult.
> > 
> > The initial meson port disabled the meson test timeouts, in order to
> > limit the scope for introducing side effects from the port that would
> > complicate adoption.
> > 
> > Now that the meson port is basically finished we can take advantage of
> > more of its improved features. It has the ability to set timeouts for
> > test programs, defaulting to 30 seconds, but overridable per test. This
> > is further helped by fact that we changed the iotests integration so
> > that each iotests was a distinct meson test, instead of having one
> > single giant (slow) test.
> > 
> > We already set overrides for a bunch of tests, but they've not been
> > kept up2date since we had timeouts disabled. So this series first
> > updates the timeout overrides such that all tests pass when run in
> > my test gitlab CI pipeline. Then it enables use of meson timeouts.
> > 
> > We might still hit timeouts due to non-deterministic performance of
> > gitlab CI runners. So we'll probably have to increase a few more
> > timeouts in the short term. Fortunately this is going to be massively
> > easier to diagnose. For example this job during my testing:
> > 
> >     https://gitlab.com/berrange/qemu/-/jobs/4392029495
> > 
> > we can immediately see  the problem tests
> > 
> > Summary of Failures:
> >    6/252 qemu:qtest+qtest-i386 / qtest-i386/bios-tables-test                
> > TIMEOUT        120.02s   killed by signal 15 SIGTERM
> >    7/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test          
> > TIMEOUT        120.03s   killed by signal 15 SIGTERM
> >   64/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/qom-test                  
> > TIMEOUT        300.03s   killed by signal 15 SIGTERM
> > 
> > The full meson testlog.txt will show each individual TAP log output,
> > so we can then see exactly which test case we got stuck on.
> > 
> > NB, the artifacts are missing on the job links above, until this
> > patch merges:
> > 
> >     https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg04668.html
> > 
> > NB, this series sets the migration-test timeout to 5 minutes, which
> > is only valid if this series is merged to make the migration test
> > not suck:
> > 
> >    https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg00286.html
> > 
> > without that series, we'll need to set the migration-test timeout to
> > 30 minutes instead.
> > 
> > Daniel P. Berrangé (6):
> >    qtest: bump min meson timeout to 60 seconds
> >    qtest: bump migration-test timeout to 5 minutes
> >    qtest: bump qom-test timeout to 7 minutes
> >    qtest: bump aspeed_smc-test timeout to 2 minutes
> >    qtest: bump bios-table-test timeout to 6 minutes
> >    mtest2make: stop disabling meson test timeouts
> > 
> >   scripts/mtest2make.py   |  3 ++-
> >   tests/qtest/meson.build | 16 ++++++----------
> >   2 files changed, 8 insertions(+), 11 deletions(-)
> 
> FWIW, I now ran this on my rather old laptop with an --enable-debug
> build with "make -j$(nproc) check-qtest" and got these additional
> failures (beside the expected migration-test that still needs its
> final speedup):
> 
>  qtest-aarch64/test-hmp        TIMEOUT   120.07s   killed by signal 15 SIGTERM
>  qtest-aarch64/qom-test        TIMEOUT   420.09s   killed by signal 15 SIGTERM
>  qtest-arm/qom-test            TIMEOUT   420.10s   killed by signal 15 SIGTERM
>  qtest-arm/npcm7xx_pwm-test    TIMEOUT   150.04s   killed by signal 15 SIGTERM
>  qtest-ppc64/pxe-test          TIMEOUT    60.01s   killed by signal 15 SIGTERM
>  qtest-sparc/prom-env-test     TIMEOUT    60.01s   killed by signal 15 SIGTERM
>  qtest-sparc/boot-serial-test  TIMEOUT    60.01s   killed by signal 15 SIGTERM

Did you see any others in the 45-60 second time window, as those would
be candidates for increases too - don't want to have things right below
the 60 second cutoff ?

> When I run them manually without the timeout patch, I get these
> values:
> 
>  qtest-aarch64/test-hmp             OK   168.66s   95 subtests passed
>  qtest-aarch64/qom-test             OK   646.37s   94 subtests passed
>  qtest-arm/qom-test                 OK   621.64s   89 subtests passed
>  qtest-arm/npcm7xx_pwm-test         OK   225.48s   24 subtests passed
>  qtest-ppc64/pxe-test               OK    96.95s   2 subtests passed
>  qtest-sparc/prom-env-test          OK    95.94s   3 subtests passed
>  qtest-sparc/boot-serial-test       OK    92.96s   3 subtests passed
> 
>  HTH,
>   Thomas
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




reply via email to

[Prev in Thread] Current Thread [Next in Thread]