Re: [RFC] QEMU Gating CI


From: Cleber Rosa
Subject: Re: [RFC] QEMU Gating CI
Date: Fri, 6 Dec 2019 14:03:36 -0500
User-agent: Mutt/1.12.1 (2019-06-15)

On Tue, Dec 03, 2019 at 02:07:32PM +0000, Alex Bennée wrote:
> 
> Cleber Rosa <address@hidden> writes:
> 
> > RFC: QEMU Gating CI
> > ===================
> >
> > This RFC attempts to address most of the issues described in
> > "Requirements/GatingCI"[1].  Another relevant write-up is "State of
> > QEMU CI as we enter 4.0"[2].
> >
> > The general approach is to minimize the infrastructure maintenance
> > and development burden, leveraging as much as possible "other people's"
> > infrastructure and code.  GitLab's CI/CD platform is the most relevant
> > component dealt with here.
> >
> > Problem Statement
> > -----------------
> >
> > The following is copied verbatim from Peter Maydell's write up[1]:
> >
> > "A gating CI is a prerequisite to having a multi-maintainer model of
> > merging. By having a common set of tests that are run prior to a merge
> > you do not rely on who is currently doing merging duties having access
> > to the current set of test machines."
> >
> > This is a very simplified view of the problem that I'd like to break
> > down even further into the following key points:
> >
> >  * Common set of tests
> >  * Pre-merge ("prior to a merge")
> >  * Access to the current set of test machines
> >  * Multi-maintainer model
> >
> > Common set of tests
> > ~~~~~~~~~~~~~~~~~~~
> >
> > Before we delve any further, let's make it clear that a "common set of
> > tests" is really a "dynamic common set of tests".  My point is that a
> > set of tests in QEMU may include or exclude different tests depending
> > on the environment.
> >
> > The exact tests that will be executed may differ depending on the
> > environment, including:
> >
> >  * Hardware
> >  * Operating system
> >  * Build configuration
> >  * Environment variables
> >
> > In the "State of QEMU CI as we enter 4.0" Alex Bennée listed some of
> > those "common set of tests":
> >
> >  * check
> 
> Check encompasses a subset of the other checks - currently:
> 
>  - check-unit
>  - check-qtest
>  - check-block
>
> What generally stops other groups of tests from being included is
> whether they are solid on all the various hw/os/config/env setups you
> describe.  For example check-tcg currently fails gloriously on non-x86
> with docker enabled as it tries to get all the cross compiler images
> working.
>

Right.

> >  * check-tcg
> >  * check-softfloat
> >  * check-block
> >  * check-acceptance
> >
> > While Peter mentions that most of his checks are limited to:
> >
> >  * check
> >  * check-tcg
> >
> > Our current inability to quickly identify a faulty test from test
> > execution results (and especially in remote environments), and act upon
> > it (say, quickly disable it on a given host platform), makes me believe
> > that it's fair to start a gating CI implementation that uses this
> > rather coarse granularity.
> >
> > Another benefit is a close or even a 1:1 relationship between a common
> > test set and an entry in the CI configuration.  For instance, the
> > "check" common test set would map to a "make check" command in a
> > "script:" YAML entry.
> >
> > To exemplify my point, if one specific test run as part of "check-tcg"
> > is found to be faulty on a specific job (say on a specific OS), the
> > entire "check-tcg" test set may be disabled as a CI-level maintenance
> > action.
> 
> In this example, this would eliminate practically all emulation testing
> apart from the very minimal boot codes that get spun up by the various
> qtest migration tests. And of course the longer a group of tests is
> disabled the larger the window for additional regressions to get in.
> 
> It may be a reasonable approach but it's not without consequence.
>

Thanks for the insights.  I agree that there are tradeoffs, I admit
that I'm speculating a lot, and there's just too much to be learned here.

My main point in proposing this very crude rule, though, is to try to
address the operational difficulties when such a Gating CI grows
beyond a single *machine* and *job* maintainer.  A practical example
may help.  Let's consider the following jobs are initially (in phase
1) active and *gating*:

 * test-qtest-ubuntu19.04-x86_64
 * test-qtest-ubuntu19.04-aarch64
 * test-qtest-ubuntu19.04-s390x

Then, because the model has proven successful, a new job that has
already been running for a while with successful results, but with no
influence on gating, is added to the gating group.  This job is being
run on a machine that is managed by a different maintainer:

 * test-qtest-centos8.0-ppc64le

After some time, the test-qtest-centos8.0-ppc64le job starts to fail,
with seemingly no relation to recently merged code.  From a CI
management perspective, disabling the job completely is reasonable if:

 * the machine seems to be faulty
 * ppc64le machine maintainer is unresponsive
 * there's no mechanism to disable a portion of the job (such as a
   specific test)
 * a bug has been found but there's no short-term fix

This doesn't mean that a lot of tests will be eliminated for good.
Unless the machine is faulty, it's expected that the tests will
continue to run, just not with gating powers.  Also, it's expected that
the same (or similar) tests will be running on other machines/jobs.

IMO, it can actually be the opposite: "skip test y on platform x"
conditions hidden in test code can survive a lot longer than a
disabled job/machine whose frustrated (and engaged) maintainer is
trying to get it back to a "green" status, and then to a reliable
status for long enough that it can be considered a gating job again.
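
To make the demotion mechanics concrete, this is roughly all the change
I'd expect in the job definition, assuming GitLab's standard
"allow_failure" keyword (the tags and script here are illustrative):

   test-qtest-centos8.0-ppc64le:
     allow_failure: true   # job still runs, but failures no longer gate
     tags:
       - centos8.0
       - ppc64le
     script:
       - make check-qtest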

> > Of course a follow up action to deal with the specific test
> > is required, probably in the form of a Launchpad bug and patches
> > dealing with the issue, but without necessarily a CI related angle to
> > it.
> >
> > If/when test result presentation and control mechanisms evolve, we may
> > feel confident enough to go into finer granularity.  For instance, a
> > mechanism for disabling nothing but "tests/migration-test" on a given
> > environment would be possible and desirable at a CI management
> > level.
> 
> The migration tests have found regressions although the problem has
> generally been they were intermittent failures and hard to reproduce
> locally. The last one took a few weeks of grinding to reproduce and get
> patches together.
>

Right.  So I believe we are in sync on the nature of the problem,
that is, that some tests would benefit from being individually pulled
from specific jobs until a permanent solution can be applied to them.

At the same time, if we can't do that (see the conditions that may
render us unable to do it), it would be fair to remove a CI job from
being gating.

> > Pre-merge
> > ~~~~~~~~~
> >
> > The natural way to have pre-merge CI jobs in GitLab is to send "Merge
> > Requests"[3] (abbreviated as "MR" from now on).  In most projects, an
> > MR comes from individual contributors, usually the authors of the
> > changes themselves.  It's my understanding that the current maintainer
> > model employed in QEMU will *not* change at this time, meaning that
> > code contributions and reviews will continue to happen on the mailing
> > list.  A maintainer, then, having collected a number of patches, would
> > submit an MR either in addition to or in substitution for the Pull
> > Requests sent to the mailing list.
> >
> > "Pipelines for Merged Results"[4] is a very important feature to
> > support the multi-maintainer model, and looks in practice, similar to
> > Peter's "staging" branch approach, with an "automatic refresh" of the
> > target branch.  It can give a maintainer extra confidence that a MR
> > will play nicely with the updated status of the target branch.  It's
> > my understanding that it should be the "key to the gates".  A minor
> > note is that conflicts are still possible in a multi-maintainer model
> > if there are more than one person doing the merges.
> >
> > A worthy point is that the GitLab web UI is not the only way to create
> > a Merge Request; a rich set of APIs is also available[5].  This is
> > interesting for many reasons, and maybe some of Peter's
> > "apply-pullreq"[6] actions (such as the checks for bad UTF8 or bogus
> > qemu-devel email addresses) could be made earlier, as part of a
> > "send-mergereq"-like script, bringing conformance checks earlier in
> > the merge process, at the MR creation stage.
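
For reference, creating an MR through that API boils down to a single
authenticated POST request; a sketch with made-up project ID, branch
names and token:

   curl --request POST \
        --header "PRIVATE-TOKEN: <api-token>" \
        --data "source_branch=staging" \
        --data "target_branch=master" \
        --data "title=Pull request for 2019-12-06" \
        "https://gitlab.com/api/v4/projects/<project-id>/merge_requests"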
> >
> > Note: It's possible to have CI job definitions that are specific to
> > MRs, allowing generic non-MR jobs to be kept on the default
> > configuration.  This can be used so individual contributors continue
> > to leverage some of the "free" (shared) runners made available on
> > gitlab.com.
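
A minimal sketch of what such an MR-specific job definition could look
like, using the "only" keyword (job name and test set are made up):

   # runs only on pipelines created for Merge Requests; jobs without
   # this "only" entry keep running on regular branch pushes, including
   # on the gitlab.com shared runners
   test-qtest-rhel8.0-x86_64:
     only:
       - merge_requests
     script:
       - make check-qtest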
> >
> > Multi-maintainer model
> > ~~~~~~~~~~~~~~~~~~~~~~
> >
> > The previous section already introduced some of the proposed workflow
> > that can enable such a multi-maintainer model.  With a Gating CI
> > system, though, it will be natural to have a smaller "Mean time
> > between (CI) failures", simply because of the expected increased
> > number of systems and checks.  A lot of countermeasures have to be
> > employed to keep that MTBF in check.
> >
> > For one, it's imperative that the maintainers for such systems and
> > jobs are clearly defined and readily accessible.  Either the same
> > MAINTAINERS file or a more suitable variation of such data should be
> > defined before activating the *gating* rules.  This would allow
> > requests for attention to be routed to the responsible maintainer.
> >
> > In case of unresponsive maintainers, or any other condition that
> > renders and keeps one or more CI jobs failing for a given previously
> > established amount of time, the job can be demoted with an
> > "allow_failure" configuration[7].  Once such a change is committed, the
> > path to promotion would be just the same as for a newly added job
> > definition.
> >
> > Note: In a future phase we can evaluate the creation of rules that
> > look at changed paths in a MR (similar to "F:" entries on MAINTAINERS)
> > and the execution of specific CI jobs, which would be the
> > responsibility of a given maintainer[8].
> >
> > Access to the current set of test machines
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > When compared to the various CI systems and services already being
> > employed in the QEMU project, this is the most striking difference in
> > the proposed model.  Instead of relying on shared/public/free
> > resources, this proposal also deals with privately owned and
> > operated machines.
> >
> > Even though the QEMU project operates with great cooperation, it's
> > crucial to define clear boundaries when it comes to machine access.
> > Restricted access to machines is important because:
> >
> >  * The results of jobs are often directly tied to the setup and
> >    status of machines.  Even "soft" changes such as removing or updating
> >    packages can introduce failures in jobs (this is greatly minimized
> >    but not completely eliminated when using containers or VMs).
> >    Updating firmware or changing its settings are also examples of
> >    changes that may affect the outcome of jobs.
> >
> >  * If maintainers are to be held accountable for the status of the jobs
> >    defined to run on specific machines, they must be sure of the status
> >    of those machines.
> >
> >  * Machines need regular monitoring and will receive required
> >    maintenance actions which can cause job regressions.
> >
> > Thus, there needs to be one clear way for machines to be *used* for
> > running jobs sent by different maintainers, while still prohibiting
> > any other privileged action that can cause permanent change to the
> > machine.  The GitLab agent (gitlab-runner) is designed to do just
> > that, and defining what will be executed in a job (in a given system)
> > should be all that's generally allowed.  The job definition itself
> > will of course be subject to code review before a maintainer decides
> > to send an MR containing such new or updated job definitions.
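
On the machine maintainer side, this should boil down to registering a
"gitlab-runner" with the appropriate tags; something along these lines,
with URL, token, description and tags made up for illustration:

   gitlab-runner register \
       --non-interactive \
       --url "https://gitlab.com/" \
       --registration-token "<project-registration-token>" \
       --executor shell \
       --description "qemu-ci-rhel8.0-x86_64" \
       --tag-list "rhel8.0,x86_64"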
> >
> > Still related to machine maintenance, it's highly desirable for jobs
> > tied to specific host machines to be introduced alongside
> > documentation and/or scripts that can replicate the machine setup.  If
> > the machine setup steps can be easily and reliably reproduced, then:
> >
> >  * Other people may help to debug issues and regressions if they
> >    happen to have the same hardware available
> >
> >  * Other people may provide more machines to run the same types of
> >    jobs
> >
> >  * If a machine maintainer goes MIA, it'd be easier to find another
> >    maintainer
> >
> > GitLab Jobs and Pipelines
> > -------------------------
> >
> > GitLab CI is built around two major concepts: jobs and pipelines.  The
> > current GitLab CI configuration in QEMU uses jobs only (or, putting it
> > another way, all jobs in a single pipeline stage).  Consider the
> > following job definition[9]:
> >
> >    build-tci:
> >     script:
> >     - TARGETS="aarch64 alpha arm hppa m68k microblaze moxie ppc64 s390x x86_64"
> >     - ./configure --enable-tcg-interpreter
> >          --target-list="$(for tg in $TARGETS; do echo -n ${tg}'-softmmu '; done)"
> >     - make -j2
> >     - make tests/boot-serial-test tests/cdrom-test tests/pxe-test
> >     - for tg in $TARGETS ; do
> >         export QTEST_QEMU_BINARY="${tg}-softmmu/qemu-system-${tg}" ;
> >         ./tests/boot-serial-test || exit 1 ;
> >         ./tests/cdrom-test || exit 1 ;
> >       done
> >     - QTEST_QEMU_BINARY="x86_64-softmmu/qemu-system-x86_64" ./tests/pxe-test
> >     - QTEST_QEMU_BINARY="s390x-softmmu/qemu-system-s390x" ./tests/pxe-test -m slow
> >
> > All the lines under "script" are performed sequentially.  It should be
> > clear that there's the possibility of breaking this down into multiple
> > stages, so that a build happens first, and then "common set of tests"
> > run in parallel.  Using the example above, it would look something
> > like:
> >
> >    +---------------+------------------------+
> >    |  BUILD STAGE  |        TEST STAGE      |
> >    +---------------+------------------------+
> >    |   +-------+   |  +------------------+  |
> >    |   | build |   |  | boot-serial-test |  |
> >    |   +-------+   |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | cdrom-test       |  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | x86_64-pxe-test  |  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | s390x-pxe-test   |  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    +---------------+------------------------+
> >
> > Of course it would be silly to break down that job into smaller jobs that
> > would run individual tests like "boot-serial-test" or "cdrom-test".  Still,
> > the pipeline approach is valid because:
> >
> >  * Common set of tests would run in parallel, giving a quicker result
> >    turnaround
> 
> check-unit is a good candidate for parallel tests. The others depend -
> I've recently turned most make check's back to -j 1 on travis because
> it's a real pain to see what test has hung when other tests keep
> running.
>

Agreed.  Running tests in parallel while keeping/presenting traceable
results is a real problem, especially in a remote environment such as
a CI.

FYI, slightly off-topic and Avocado specific: Avocado keeps each test
result in a separate directory and log file, which helps with that.
I'm bridging the parallel test runner (introduced a few releases back)
to the existing result/reporting infrastructure.  My goal is to have
that running the acceptance tests shortly.

> >
> >  * It's easier to determine the possible nature of the problem with
> >    just the basic CI job status
> >
> >  * Different maintainers could be defined for different "common sets of
> >    tests", and again by leveraging the basic CI job status, automation
> >    for directed notification can be implemented
> >
> > In the following example, "check-block" maintainers could be left
> > undisturbed with failures in the "check-acceptance" job:
> >
> >    +---------------+------------------------+
> >    |  BUILD STAGE  |        TEST STAGE      |
> >    +---------------+------------------------+
> >    |   +-------+   |  +------------------+  |
> >    |   | build |   |  | check-block      |  |
> >    |   +-------+   |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | check-acceptance |  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    +---------------+------------------------+
> >
> > The same logic applies to test sets for different targets.  For
> > instance, combining the two previous examples, there could be different
> > maintainers defined for the different jobs on the test stage:
> >
> >    +---------------+------------------------+
> >    |  BUILD STAGE  |        TEST STAGE      |
> >    +---------------+------------------------+
> >    |   +-------+   |  +------------------+  |
> >    |   | build |   |  | x86_64-block     |  |
> >    |   +-------+   |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | x86_64-acceptance|  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | s390x-block      |  |
> >    |               |  +------------------+  |
> >    |               |                        |
> >    |               |  +------------------+  |
> >    |               |  | s390x-acceptance |  |
> >    |               |  +------------------+  |
> >    +---------------+------------------------+
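
In GitLab CI configuration terms, the pipeline in the last diagram
would be expressed with something like the following (a sketch only,
abbreviated to the x86_64 jobs, and with artifact handling left out):

   stages:
     - build
     - test

   build:
     stage: build
     script:
       - ./configure
       - make -j2

   x86_64-block:
     stage: test
     script:
       - make check-block

   x86_64-acceptance:
     stage: test
     script:
       - make check-acceptance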
> >
> > Current limitations for a multi-stage pipeline
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > Because it's assumed that each job will happen in an isolated and
> > independent execution environment, jobs must explicitly define the
> > resources that will be shared between stages.  GitLab will make sure
> > the same source code revision will be available on all jobs
> > automatically.  Additionally, GitLab supports the concept of artifacts.
> > By defining artifacts in the "build" stage, jobs in the "test" stage
> > can expect to have a copy of those artifacts automatically.
> >
> > In theory, there's nothing that prevents an entire QEMU build
> > directory from being treated as an artifact.  In practice, there are
> > predefined limits on GitLab that prevent that from being possible,
> > resulting in errors such as:
> >
> >    Uploading artifacts...
> >    build: found 3164 matching files                   
> >    ERROR: Uploading artifacts to coordinator... too large archive
> >           id=xxxxxxx responseStatus=413 Request Entity Too Large
> >           status=413 Request Entity Too Large token=yyyyyyyyy
> >    FATAL: too large                                   
> >    ERROR: Job failed: exit code 1
> >
> > As far as I can tell, this is an instance-defined limit that's clearly
> > influenced by storage costs.  I see a few possible solutions to this
> > limitation:
> >
> >  1) Provide our own "artifact" like solution that uses our own storage
> >     solution
> >
> >  2) Reduce or eliminate the dependency on a complete build tree
> >
> > The first solution can go against the general trend of not having to
> > maintain CI infrastructure.  It could be made simpler by using cloud
> > storage, but there would still be some interaction with another
> > external infrastructure component.
> >
> > I find the second solution preferable, given that most tests depend
> > on having one or a few binaries available.  I've run multi-stage
> > pipelines with some of those binaries (qemu-img,
> > $target-softmmu/qemu-system-$target) defined as artifacts and they
> > behaved as expected.  But, this could require some intrusive changes
> > to the current "make"-based test invocation.
> 
> It would be nice if the make check could be run with a make install'ed
> set of binaries. I'm not sure how much hackery would be required to get
> that to work nicely. Does specifying QEMU and QEMU_IMG prevent make
> trying to re-build everything in situ?
>

At this point, I don't know how hard it'd be.  I'll certainly give it
a try.  Thomas has provided some extra info in another response to
this thread too.
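
To illustrate, the artifact definition for such a pipeline could be
something along these lines (an x86_64-only sketch with a made-up job
name; paths would vary per build configuration):

   build-x86_64:
     stage: build
     script:
       - ./configure --target-list=x86_64-softmmu
       - make -j2
     artifacts:
       # just a handful of binaries, instead of the whole build tree,
       # to stay under the instance's artifact size limit
       paths:
         - qemu-img
         - x86_64-softmmu/qemu-system-x86_64

Jobs in the "test" stage then get a copy of those files in their
workspace automatically.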

> >
> > Job Naming convention
> > ---------------------
> >
> > Based only on the very simple example jobs above, it should already be
> > clear that there's a lot of potential for confusion and chaos.  For
> > instance, by looking at the "build" job definition or results, it's
> > very hard to tell what it's really about.  A bit more can be inferred
> > from the "x86_64-block" job name.
> >
> > Still, the problem we have to address here is not only about the
> > amount of information easily obtained from a job name, but also about
> > allowing for very similar job definitions within a global namespace.
> > For instance, if we add an Operating System component to the mix, we
> > need an extra qualifier for unique job names.
> >
> > Some of the possible components in a job definition are:
> >
> >  * Stage
> >  * Build profile
> >  * Test set (a shorter name for what was described in the "Common set
> >    of tests" section)
> >  * Host architecture
> >  * Target architecture
> >  * Host Operating System identification (name and version)
> >  * Execution mode/environment (bare metal, container, VM, etc)
> >
> > Stage
> > ~~~~~
> >
> > The stage of a job (which maps roughly to its purpose) should be
> > clearly defined.  A job that builds QEMU should start with "build" and
> > a job that tests QEMU should start with "test".
> >
> > IMO, in a second phase, once multi-stage pipelines are taken for
> > granted, we could evaluate dropping this component altogether from the
> > naming convention, and relying purely on the stage classification.
> >
> > Build profile
> > ~~~~~~~~~~~~~
> >
> > Different build profiles already abound in QEMU's various CI
> > configuration files.  It's hard to put a naming convention here,
> > except that it should represent the most distinguishable
> > characteristics of the build configuration.  For instance, we can find
> > a "build-disabled" job in the current ".gitlab-ci.yml" file that is
> > aptly named, as it forcefully disables a lot of build options.
> >
> > Test set
> > ~~~~~~~~
> >
> > As mentioned in the "Common set of tests" section, I believe that the
> > make target name can be used to identify the test set that will be
> > executed in a job.  That is, if a job is to be run at the "test"
> > stage, and will run "make check", its name should start with
> > "test-check".
> >
> > QEMU Targets
> > ~~~~~~~~~~~~
> >
> > Because a given job could, and usually does, involve multiple targets, I
> > honestly can not think of how to add this to the naming convention.
> > I'll ignore it for now, and consider the targets to be defined in the
> > build profile.
> 
> I like to think of three groups:
> 
>   Core SoftMMU - the major KVM architectures
>   The rest of SoftMMU - all our random emulation targets
>   linux-user
>

OK, makes sense.  It'd be nice to know if others share the same general
idea.  I'll check how pervasive this general definition is in the
documentation too.

> >
> > Host Architecture
> > ~~~~~~~~~~~~~~~~~
> >
> > The host architecture naming convention should be an easy pick, given
> > that QEMU itself employs an architecture naming convention for its targets.
> >
> > Host OS
> > ~~~~~~~
> >
> > The suggestion I have for the host OS name is to follow the
> > libosinfo[10] convention as closely as possible.  libosinfo's "Short
> > ID" should be well suited here.  Examples include: "openbsd4.2",
> > "opensuse42.3", "rhel8.0", "ubuntu9.10" and "win2k12r2".
> >
> > Execution Environment
> > ~~~~~~~~~~~~~~~~~~~~~
> >
> > Distinguishing between running tests in a bare-metal versus a nested
> > VM environment is quite significant to a number of people.
> >
> > Still, I think it could probably be optional for the initial
> > implementation phase, like the naming convention for the QEMU Targets.
> >
> > Example 1
> > ~~~~~~~~~
> >
> > Defining a job that will build QEMU with common debug options, on
> > a RHEL 8.0 system on a x86_64 host:
> >
> >    build-debug-rhel8.0-x86_64:
> >      script:
> >        - ./configure --enable-debug
> >        - make
> >
> > Example 2
> > ~~~~~~~~~
> >
> > Defining a job that will run the "qtest" test set on a NetBSD 8.1
> > system on an aarch64 host:
> >
> >    test-qtest-netbsd8.1-aarch64:
> >      script:
> >        - make check-qtest
> >
> > Job and Machine Scheduling
> > --------------------------
> >
> > While the naming convention gives some information to human beings,
> > and hopefully allows for some order and avoids collisions in the
> > global job namespace, it's not enough to define where those jobs
> > should run.
> >
> > Tags[11] are the available mechanism to tie jobs to specific machines
> > running the GitLab CI agent, "gitlab-runner".  Unfortunately, some
> > duplication seems unavoidable, in the sense that some of the naming
> > components listed above are machine specific, and will then also need
> > to be given as tags.
> >
> > Note: it may be a good idea to be extra verbose with tags, by having a
> > qualifier prefix.  The justification is that tags also live in a
> > global namespace, and in theory, at a given point, tags of different
> > "categories", say a CPU name and an Operating System name, may collide.
> > Or, it may just be me being paranoid.
> >
> > Example 1
> > ~~~~~~~~~
> >
> >    build-debug-rhel8.0-x86_64:
> >      tags:
> >        - rhel8.0
> >        - x86_64
> >      script:
> >        - ./configure --enable-debug
> >        - make
> >
> > Example 2
> > ~~~~~~~~~
> >
> >    test-qtest-netbsd8.1-aarch64:
> >      tags:
> >        - netbsd8.1
> >        - aarch64
> >      script:
> >        - make check-qtest
> 
> Where are all these going to go? Are we overloading the existing
> gitlab.yml or are we going to have a new set of configs for the GatingCI
> and keep gitlab.yml as the current subset that people run on their own
> accounts?
>

These will have to go into the existing ".gitlab-ci.yml" file, because
current GitLab has no support for multiple pipelines, and no support
for multiple "gitlab.yml" files.  That's one of the reasons why I took
the time to describe the proposal, because normalizing the current
file to receive extra jobs is, if not necessary, highly desirable.

Thanks for the feedback and insights,
- Cleber.
