From: Daniel P. Berrangé
Subject: Re: [PATCH 0/6] tests: enable meson test timeouts to improve debuggability
Date: Mon, 5 Jun 2023 15:14:35 +0100
User-agent: Mutt/2.2.9 (2022-11-12)

On Mon, Jun 05, 2023 at 04:07:46PM +0200, Thomas Huth wrote:
> On 01/06/2023 18.31, Daniel P. Berrangé wrote:
> > Perhaps the most painful of all the GitLab CI failures we see are
> > the enforced job timeouts:
> >
> > "ERROR: Job failed: execution took longer than 1h15m0s seconds"
> >
> > https://gitlab.com/qemu-project/qemu/-/jobs/4387047648
> >
> > When that hits, the CI log shows what has *already* run, but figuring
> > out what was currently running (or rather stuck) is horrendously
> > difficult.
> >
> > The initial meson port disabled the meson test timeouts, in order to
> > limit the scope for introducing side effects from the port that would
> > complicate adoption.
> >
> > Now that the meson port is basically finished we can take advantage of
> > more of its improved features. It has the ability to set timeouts for
> > test programs, defaulting to 30 seconds, but overridable per test. This
> > is further helped by the fact that we changed the iotests integration
> > so that each iotest is a distinct meson test, instead of having one
> > single giant (slow) test.
> >
> > We already set overrides for a bunch of tests, but they've not been
> > kept up to date since we had timeouts disabled. So this series first
> > updates the timeout overrides such that all tests pass when run in
> > my test gitlab CI pipeline. Then it enables use of meson timeouts.
> >
> > We might still hit timeouts due to non-deterministic performance of
> > gitlab CI runners. So we'll probably have to increase a few more
> > timeouts in the short term. Fortunately this is going to be massively
> > easier to diagnose. For example this job during my testing:
> >
> > https://gitlab.com/berrange/qemu/-/jobs/4392029495
> >
> > we can immediately see the problem tests:
> >
> > Summary of Failures:
> > 6/252 qemu:qtest+qtest-i386 / qtest-i386/bios-tables-test
> > TIMEOUT 120.02s killed by signal 15 SIGTERM
> > 7/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test
> > TIMEOUT 120.03s killed by signal 15 SIGTERM
> > 64/252 qemu:qtest+qtest-aarch64 / qtest-aarch64/qom-test
> > TIMEOUT 300.03s killed by signal 15 SIGTERM
> >
> > The full meson testlog.txt will show each individual TAP log output,
> > so we can then see exactly which test case we got stuck on.
> >
> > NB, the artifacts are missing on the job links above, until this
> > patch merges:
> >
> > https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg04668.html
> >
> > NB, this series sets the migration-test timeout to 5 minutes, which
> > is only valid if this series is merged to make the migration test
> > not suck:
> >
> > https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg00286.html
> >
> > without that series, we'll need to set the migration-test timeout to
> > 30 minutes instead.
> >
> > Daniel P. Berrangé (6):
> > qtest: bump min meson timeout to 60 seconds
> > qtest: bump migration-test timeout to 5 minutes
> > qtest: bump qom-test timeout to 7 minutes
> > qtest: bump aspeed_smc-test timeout to 2 minutes
> > qtest: bump bios-table-test timeout to 6 minutes
> > mtest2make: stop disabling meson test timeouts
> >
> > scripts/mtest2make.py | 3 ++-
> > tests/qtest/meson.build | 16 ++++++----------
> > 2 files changed, 8 insertions(+), 11 deletions(-)
>
> FWIW, I now ran this on my rather old laptop with an --enable-debug
> build with "make -j$(nproc) check-qtest" and got these additional
> failures (besides the expected migration-test that still needs its
> final speedup):
>
> qtest-aarch64/test-hmp TIMEOUT 120.07s killed by signal 15 SIGTERM
> qtest-aarch64/qom-test TIMEOUT 420.09s killed by signal 15 SIGTERM
> qtest-arm/qom-test TIMEOUT 420.10s killed by signal 15 SIGTERM
> qtest-arm/npcm7xx_pwm-test TIMEOUT 150.04s killed by signal 15 SIGTERM
> qtest-ppc64/pxe-test TIMEOUT 60.01s killed by signal 15 SIGTERM
> qtest-sparc/prom-env-test TIMEOUT 60.01s killed by signal 15 SIGTERM
> qtest-sparc/boot-serial-test TIMEOUT 60.01s killed by signal 15 SIGTERM

Did you see any others in the 45-60 second time window? Those would be
candidates for increases too, since we don't want to have things sitting
right below the 60 second cutoff.

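One rough way to check for such tests from a saved log is sketched below. It is only a minimal sketch, assuming result lines laid out like the ones quoted in this thread ("name STATUS 12.34s ..."). The function name near_cutoff, its default window bounds, and the sample lines are hypothetical and are not real measurements.

```python
import re

# Assumed line layout, taken from the result lines quoted in this thread,
# e.g. "qtest-ppc64/pxe-test OK 96.95s 2 subtests passed".
RESULT_RE = re.compile(r"^\s*(\S+)\s+(OK|TIMEOUT|FAIL)\s+(\d+(?:\.\d+)?)s\b")


def near_cutoff(lines, low=45.0, high=60.0):
    """Return (name, seconds) for passing tests with low <= duration < high."""
    hits = []
    for line in lines:
        m = RESULT_RE.match(line)
        if not m:
            continue  # not a result line in the assumed layout
        name, status, secs = m.group(1), m.group(2), float(m.group(3))
        if status == "OK" and low <= secs < high:
            hits.append((name, secs))
    return hits


# Hypothetical sample input, NOT taken from any real run.
sample = [
    "qtest-x/fast-test OK 3.10s 4 subtests passed",
    "qtest-x/slow-test OK 52.40s 7 subtests passed",
    "qtest-x/stuck-test TIMEOUT 60.01s killed by signal 15 SIGTERM",
]

for name, secs in near_cutoff(sample):
    print(f"{name}: {secs:.2f}s")
```

Only passing tests are reported, since anything already hitting the limit shows up as TIMEOUT in the summary anyway.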
> When I run them manually without the timeout patch, I get these
> values:
>
> qtest-aarch64/test-hmp OK 168.66s 95 subtests passed
> qtest-aarch64/qom-test OK 646.37s 94 subtests passed
> qtest-arm/qom-test OK 621.64s 89 subtests passed
> qtest-arm/npcm7xx_pwm-test OK 225.48s 24 subtests passed
> qtest-ppc64/pxe-test OK 96.95s 2 subtests passed
> qtest-sparc/prom-env-test OK 95.94s 3 subtests passed
> qtest-sparc/boot-serial-test OK 92.96s 3 subtests passed
>
> HTH,
> Thomas
>
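As an illustration of how the manual timings above could feed into new overrides, here is a minimal sketch. The durations are the ones Thomas reported; the 1.5x headroom factor, the rounding to a multiple of 30 seconds, and the helper name suggest_timeout are all assumptions made for the example, not something this series prescribes.

```python
import math

# Durations in seconds, as reported for the --enable-debug run quoted above.
measured = {
    "qtest-aarch64/test-hmp": 168.66,
    "qtest-aarch64/qom-test": 646.37,
    "qtest-arm/qom-test": 621.64,
    "qtest-arm/npcm7xx_pwm-test": 225.48,
    "qtest-ppc64/pxe-test": 96.95,
    "qtest-sparc/prom-env-test": 95.94,
    "qtest-sparc/boot-serial-test": 92.96,
}


def suggest_timeout(seconds, headroom=1.5, step=30):
    """Scale a measured duration by an assumed headroom factor and round
    up to the next multiple of `step` seconds (both values are arbitrary)."""
    return int(math.ceil(seconds * headroom / step) * step)


for name, secs in measured.items():
    print(f"{name}: measured {secs:.2f}s, candidate timeout {suggest_timeout(secs)}s")
```

The factor is a knob to tune rather than a recommendation; slow CI runners may well need more slack than an old laptop does.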
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|