Re: [RFC] QEMU Gating CI


From: Alex Bennée
Subject: Re: [RFC] QEMU Gating CI
Date: Tue, 03 Dec 2019 14:07:32 +0000
User-agent: mu4e 1.3.5; emacs 27.0.50

Cleber Rosa <address@hidden> writes:

> RFC: QEMU Gating CI
> ===================
>
> This RFC attempts to address most of the issues described in
> "Requirements/GatingCI"[1].  Another relevant write-up is the "State of
> QEMU CI as we enter 4.0"[2].
>
> The general approach is one to minimize the infrastructure maintenance
> and development burden, leveraging as much as possible "other people's"
> infrastructure and code.  GitLab's CI/CD platform is the most relevant
> component dealt with here.
>
> Problem Statement
> -----------------
>
> The following is copied verbatim from Peter Maydell's write up[1]:
>
> "A gating CI is a prerequisite to having a multi-maintainer model of
> merging. By having a common set of tests that are run prior to a merge
> you do not rely on who is currently doing merging duties having access
> to the current set of test machines."
>
> This is a very simplified view of the problem, which I'd like to break
> down further into the following key points:
>
>  * Common set of tests
>  * Pre-merge ("prior to a merge")
>  * Access to the current set of test machines
>  * Multi-maintainer model
>
> Common set of tests
> ~~~~~~~~~~~~~~~~~~~
>
> Before we delve any further, let's make it clear that a "common set of
> tests" is really a "dynamic common set of tests".  My point is that a
> set of tests in QEMU may include or exclude different tests depending
> on the environment.
>
> The exact tests that will be executed may differ depending on the
> environment, including:
>
>  * Hardware
>  * Operating system
>  * Build configuration
>  * Environment variables
>
> In the "State of QEMU CI as we enter 4.0" Alex Bennée listed some of
> those "common set of tests":
>
>  * check

Check encompasses a subset of the other checks - currently:

 - check-unit
 - check-qtest
 - check-block

The thing that stops other groups of tests from being included is
generally whether they are solid on all the various hw/os/config/env
setups you describe. For example, check-tcg currently fails gloriously
on non-x86 with docker enabled as it tries to get all the cross-compiler
images working.

>  * check-tcg
>  * check-softfloat
>  * check-block
>  * check-acceptance
>
> While Peter mentions that most of his checks are limited to:
>
>  * check
>  * check-tcg
>
> Our current inability to quickly identify a faulty test from test
> execution results (and especially in remote environments), and act upon
> it (say quickly disable it on a given host platform), makes me believe
> that it's fair to start a gating CI implementation that uses this
> rather coarse granularity.
>
> Another benefit is a close or even a 1:1 relationship between a common
> test set and an entry in the CI configuration.  For instance, the
> "check" common test set would map to a "make check" command in a
> "script:" YAML entry.
>
> To exemplify my point, if one specific test that runs as part of
> "check-tcg" is found to be faulty on a specific job (say on a specific
> OS), the entire "check-tcg" test set may be disabled as a CI-level
> maintenance action.

This would, in this example, eliminate practically all emulation testing
apart from the very minimal boot-codes that get spun up by the various
qtest migration tests. And of course the longer a group of tests is
disabled the larger the window for additional regressions to get in.

It may be a reasonable approach but it's not without consequence.

> Of course a follow-up action to deal with the specific test
> is required, probably in the form of a Launchpad bug and patches
> dealing with the issue, but without necessarily a CI-related angle to
> it.
>
> If/when test result presentation and control mechanisms evolve, we may
> feel confident enough to move to a finer granularity.  For instance, a
> mechanism for disabling nothing but "tests/migration-test" on a given
> environment would be possible and desirable from a CI management
> level.

The migration tests have found regressions, although the problem has
generally been that they were intermittent failures, hard to reproduce
locally. The last one took a few weeks of grinding to reproduce and get
patches together.

> Pre-merge
> ~~~~~~~~~
>
> The natural way to have pre-merge CI jobs in GitLab is to send "Merge
> Requests"[3] (abbreviated as "MR" from now on).  In most projects, a
> MR comes from individual contributors, usually the authors of the
> changes themselves.  It's my understanding that the current maintainer
> model employed in QEMU will *not* change at this time, meaning that
> code contributions and reviews will continue to happen on the mailing
> list.  A maintainer then, having collected a number of patches, would
> submit an MR either in addition to, or in substitution for, the Pull
> Requests sent to the mailing list.
>
> "Pipelines for Merged Results"[4] is a very important feature to
> support the multi-maintainer model, and looks in practice, similar to
> Peter's "staging" branch approach, with an "automatic refresh" of the
> target branch.  It can give a maintainer extra confidence that a MR
> will play nicely with the updated status of the target branch.  It's
> my understanding that it should be the "key to the gates".  A minor
> note is that conflicts are still possible in a multi-maintainer model
> if there are more than one person doing the merges.
>
> A worthy point is that the GitLab web UI is not the only way to create
> a Merge Request: a rich set of APIs is also available[5].  This is
> interesting for many reasons, and maybe some of Peter's
> "apply-pullreq"[6] actions (such as the checks for bad UTF-8 or bogus
> qemu-devel email addresses) could be performed earlier, as part of a
> "send-mergereq"-like script, bringing conformance checks to an earlier
> point in the merge process, at the MR creation stage.
>
> Note: It's possible to have CI job definitions that are specific to
> MRs, allowing generic non-MR jobs to be kept in the default
> configuration.  This can be used so individual contributors continue
> to leverage some of the "free" (shared) runners made available on
> gitlab.com.
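>
> As a rough sketch of what an MR-only job could look like (job name
> purely illustrative), GitLab's "only: merge_requests" directive
> restricts a job to merge request pipelines:
>
>    build-gating:
>      only:
>        - merge_requests
>      script:
>        - ./configure
>        - make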
>
> Multi-maintainer model
> ~~~~~~~~~~~~~~~~~~~~~~
>
> The previous section already introduced some of the proposed workflow
> that can enable such a multi-maintainer model.  With a Gating CI
> system, though, it will be natural to have a smaller "Mean time
> between (CI) failures", simply because of the expected increased
> number of systems and checks.  A lot of countermeasures have to be
> employed to keep that MTBF in check.
>
> For one, it's imperative that the maintainers for such systems and
> jobs are clearly defined and readily accessible.  Either the same
> MAINTAINERS file or a more suitable variation of such data should be
> defined before activating the *gating* rules.  This would allow
> requests to be routed to the attention of the responsible maintainer.
>
> In case of unresponsive maintainers, or any other condition that
> renders and keeps one or more CI jobs failing for a previously
> established amount of time, the job can be demoted with an
> "allow_failure" configuration[7].  Once such a change is committed,
> the path to promotion would be just the same as for a newly added job
> definition.
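>
> As a sketch, such a demotion would amount to a one-line change in the
> job definition (job name illustrative):
>
>    test-check:
>      allow_failure: true
>      script:
>        - make check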
>
> Note: In a future phase we can evaluate the creation of rules that
> look at the changed paths in an MR (similar to "F:" entries in
> MAINTAINERS) and trigger the execution of specific CI jobs, which
> would be the responsibility of a given maintainer[8].
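>
> A rough sketch of what such a rule could eventually look like (job
> name and paths purely illustrative):
>
>    test-block:
>      only:
>        refs:
>          - merge_requests
>        changes:
>          - block/**/*
>          - tests/qemu-iotests/*
>      script:
>        - make check-block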
>
> Access to the current set of test machines
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> When compared to the various CI systems and services already being
> employed in the QEMU project, this is the most striking difference in
> the proposed model.  Instead of relying on shared/public/free
> resources, this proposal also deals with privately owned and
> operated machines.
>
> Even though the QEMU project operates with great cooperation, it's
> crucial to define clear boundaries when it comes to machine access.
> Restricted access to machines is important because:
>
>  * The results of jobs are often directly tied to the setup and
>    status of machines.  Even "soft" changes such as removing or updating
>    packages can introduce failures in jobs (this is greatly minimized
>    but not completely eliminated when using containers or VMs).
>    Updating firmware or changing its settings are also examples of
>    changes that may alter the outcome of jobs.
>
>  * If maintainers are to be held accountable for the status of the jobs
>    defined to run on specific machines, they must be sure of the status
>    of the machines.
>
>  * Machines need regular monitoring and will receive required
>    maintenance actions, which can cause job regressions.
>
> Thus, there needs to be one clear way for machines to be *used* for
> running jobs sent by different maintainers, while still prohibiting
> any other privileged action that can cause permanent changes to the
> machine.  The GitLab agent (gitlab-runner) is designed to do just
> that, and defining what will be executed in a job (on a given system)
> should be all that's generally allowed.  The job definition itself
> will, of course, be subject to code review before a maintainer decides
> to send an MR containing such new or updated job definitions.
>
> Still related to machine maintenance, it's highly desirable for jobs
> tied to specific host machines to be introduced alongside
> documentation and/or scripts that can replicate the machine setup.  If
> the machine setup steps can be easily and reliably reproduced, then:
>
>  * Other people may help to debug issues and regressions if they
>    happen to have the same hardware available
>
>  * Other people may provide more machines to run the same types of
>    jobs
>
>  * If a machine maintainer goes MIA, it'd be easier to find another
>    maintainer
>
> GitLab Jobs and Pipelines
> -------------------------
>
> GitLab CI is built around two major concepts: jobs and pipelines.  The
> current GitLab CI configuration in QEMU uses jobs only (or putting it
> another way, all jobs in a single pipeline stage).  Consider the
> following job definition[9]:
>
>    build-tci:
>     script:
>     - TARGETS="aarch64 alpha arm hppa m68k microblaze moxie ppc64 s390x x86_64"
>     - ./configure --enable-tcg-interpreter
>          --target-list="$(for tg in $TARGETS; do echo -n ${tg}'-softmmu '; done)"
>     - make -j2
>     - make tests/boot-serial-test tests/cdrom-test tests/pxe-test
>     - for tg in $TARGETS ; do
>         export QTEST_QEMU_BINARY="${tg}-softmmu/qemu-system-${tg}" ;
>         ./tests/boot-serial-test || exit 1 ;
>         ./tests/cdrom-test || exit 1 ;
>       done
>     - QTEST_QEMU_BINARY="x86_64-softmmu/qemu-system-x86_64" ./tests/pxe-test
>     - QTEST_QEMU_BINARY="s390x-softmmu/qemu-system-s390x" ./tests/pxe-test -m slow
>
> All the lines under "script" are performed sequentially.  It should be
> clear that there's the possibility of breaking this down into multiple
> stages, so that a build happens first, and then "common set of tests"
> run in parallel.  Using the example above, it would look something
> like:
>
>    +---------------+------------------------+
>    |  BUILD STAGE  |        TEST STAGE      |
>    +---------------+------------------------+
>    |   +-------+   |  +------------------+  |
>    |   | build |   |  | boot-serial-test |  |
>    |   +-------+   |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | cdrom-test       |  |
>    |               |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | x86_64-pxe-test  |  |
>    |               |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | s390x-pxe-test   |  |
>    |               |  +------------------+  |
>    |               |                        |
>    +---------------+------------------------+
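>
> Expressed in ".gitlab-ci.yml" terms, the split could be sketched
> roughly as follows (job names illustrative; how build output is shared
> between the stages is discussed in the "Current limitations" section
> below):
>
>    stages:
>      - build
>      - test
>
>    build:
>      stage: build
>      script:
>        - ./configure --enable-tcg-interpreter
>        - make -j2
>        - make tests/boot-serial-test
>
>    boot-serial-test:
>      stage: test
>      script:
>        - QTEST_QEMU_BINARY="x86_64-softmmu/qemu-system-x86_64" ./tests/boot-serial-test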
>
> Of course it would be silly to break down that job into smaller jobs that
> would run individual tests like "boot-serial-test" or "cdrom-test".  Still,
> the pipeline approach is valid because:
>
>  * Common set of tests would run in parallel, giving a quicker result
>    turnaround

check-unit is a good candidate for parallel tests. The others depend -
I've recently turned most "make check"s back to -j 1 on Travis because
it's a real pain to see which test has hung when other tests keep
running.

>
>  * It's easier to determine the possible nature of the problem with
>    just the basic CI job status
>
>  * Different maintainers could be defined for different "common set of
>    tests", and again by leveraging the basic CI job status, automation
>    for directed notification can be implemented
>
> In the following example, "check-block" maintainers could be left
> undisturbed with failures in the "check-acceptance" job:
>
>    +---------------+------------------------+
>    |  BUILD STAGE  |        TEST STAGE      |
>    +---------------+------------------------+
>    |   +-------+   |  +------------------+  |
>    |   | build |   |  | check-block      |  |
>    |   +-------+   |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | check-acceptance |  |
>    |               |  +------------------+  |
>    |               |                        |
>    +---------------+------------------------+
>
> The same logic applies to test sets for different targets.  For
> instance, combining the two previous examples, there could be different
> maintainers defined for the different jobs in the test stage:
>
>    +---------------+------------------------+
>    |  BUILD STAGE  |        TEST STAGE      |
>    +---------------+------------------------+
>    |   +-------+   |  +------------------+  |
>    |   | build |   |  | x86_64-block     |  |
>    |   +-------+   |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | x86_64-acceptance|  |
>    |               |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | s390x-block      |  |
>    |               |  +------------------+  |
>    |               |                        |
>    |               |  +------------------+  |
>    |               |  | s390x-acceptance |  |
>    |               |  +------------------+  |
>    +---------------+------------------------+
>
> Current limitations for a multi-stage pipeline
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Because it's assumed that each job will happen in an isolated and
> independent execution environment, jobs must explicitly define the
> resources that will be shared between stages.  GitLab will make sure
> the same source code revision will be available on all jobs
> automatically.  Additionally, GitLab supports the concept of artifacts.
> By defining artifacts in the "build" stage, jobs in the "test" stage
> can expect to have a copy of those artifacts automatically.
>
> In theory, there's nothing that prevents an entire QEMU build
> directory from being treated as an artifact.  In practice, there are
> predefined limits on GitLab that prevent that from being possible,
> resulting in errors such as:
>
>    Uploading artifacts...
>    build: found 3164 matching files                   
>    ERROR: Uploading artifacts to coordinator... too large archive
>           id=xxxxxxx responseStatus=413 Request Entity Too Large
>           status=413 Request Entity Too Large token=yyyyyyyyy
>    FATAL: too large                                   
>    ERROR: Job failed: exit code 1
>
> As far as I can tell, this is an instance-defined limit that's clearly
> influenced by storage costs.  I see a few possible solutions to this
> limitation:
>
>  1) Provide our own "artifact"-like solution that uses our own storage
>     solution
>
>  2) Reduce or eliminate the dependency on a complete build tree
>
> The first solution can go against the general trend of not having to
> maintain CI infrastructure.  It could be made simpler by using cloud
> storage, but there would still be some interaction with another
> external infrastructure component.
>
> I find the second solution preferable, given that most tests depend
> on having one or a few binaries available.  I've run multi-stage
> pipelines with some of those binaries (qemu-img,
> $target-softmmu/qemu-system-$target) defined as artifacts and they
> behaved as expected.  However, this could require some intrusive
> changes to the current "make"-based test invocation.
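>
> For reference, a rough sketch of how those few binaries could be
> declared as artifacts (job name illustrative, paths matching the
> build layout used in the examples above):
>
>    build:
>      stage: build
>      script:
>        - ./configure --target-list=x86_64-softmmu
>        - make -j2
>      artifacts:
>        paths:
>          - qemu-img
>          - x86_64-softmmu/qemu-system-x86_64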

It would be nice if "make check" could be run against a "make install"ed
set of binaries. I'm not sure how much hackery would be required to get
that to work nicely. Does specifying QEMU and QEMU_IMG prevent make from
trying to re-build everything in situ?

>
> Job Naming convention
> ---------------------
>
> Based only on the very simple example jobs above, it should already be
> clear that there's a lot of potential for confusion and chaos.  For
> instance, by looking at the "build" job definition or results, it's
> very hard to tell what it's really about.  A bit more can be inferred
> from the "x86_64-block" job name.
>
> Still, the problem we have to address here is not only about the
> amount of information easily obtained from a job name, but allowing
> for very similar job definitions within a global namespace.  For
> instance, if we add an Operating Systems component to the mix, we need
> an extra qualifier for unique job names.
>
> Some of the possible components in a job definition are:
>
>  * Stage
>  * Build profile
>  * Test set (a shorter name for what was described in the "Common set
>    of tests" section)
>  * Host architecture
>  * Target architecture
>  * Host Operating System identification (name and version)
>  * Execution mode/environment (bare metal, container, VM, etc)
>
> Stage
> ~~~~~
>
> The stage of a job (which maps roughly to its purpose) should be
> clearly defined.  A job that builds QEMU should start with "build" and
> a job that tests QEMU should start with "test".
>
> IMO, in a second phase, once multi-stage pipelines are taken for
> granted, we could evaluate dropping this component altogether from the
> naming convention, and relying purely on the stage classification.
>
> Build profile
> ~~~~~~~~~~~~~
>
> Different build profiles already abound in QEMU's various CI
> configuration files.  It's hard to put a naming convention here,
> except that it should represent the most distinguishable
> characteristics of the build configuration.  For instance, we can find
> a "build-disabled" job in the current ".gitlab-ci.yml" file that is
> aptly named, as it forcefully disables a lot of build options.
>
> Test set
> ~~~~~~~~
>
> As mentioned in the "Common set of tests" section, I believe that the
> make target name can be used to identify the test set that will be
> executed in a job.  That is, if a job is to be run at the "test"
> stage, and will run "make check", its name should start with
> "test-check".
>
> QEMU Targets
> ~~~~~~~~~~~~
>
> Because a given job could, and usually does, involve multiple targets,
> I honestly cannot think of how to add this to the naming convention.
> I'll ignore it for now, and consider that the targets are defined in
> the build profile.

I like to think of three groups:

  Core SoftMMU - the major KVM architectures
  The rest of SoftMMU - all our random emulation targets
  linux-user

>
> Host Architecture
> ~~~~~~~~~~~~~~~~~
>
> The host architecture naming convention should be an easy pick, given
> that QEMU itself employs an architecture naming convention for its
> targets.
>
> Host OS
> ~~~~~~~
>
> The suggestion I have for the host OS name is to follow the
> libosinfo[10] convention as closely as possible.  libosinfo's "Short
> ID" should be well suitable here.  Examples include: "openbsd4.2",
> "opensuse42.3", "rhel8.0", "ubuntu9.10" and "win2k12r2".
>
> Execution Environment
> ~~~~~~~~~~~~~~~~~~~~~
>
> Distinguishing between running tests in a bare-metal versus a nested
> VM environment is quite significant to a number of people.
>
> Still, I think it could probably be optional for the initial
> implementation phase, like the naming convention for the QEMU Targets.
>
> Example 1
> ~~~~~~~~~
>
> Defining a job that will build QEMU with common debug options, on
> a RHEL 8.0 system on an x86_64 host:
>
>    build-debug-rhel8.0-x86_64:
>      script:
>        - ./configure --enable-debug
>        - make
>
> Example 2
> ~~~~~~~~~
>
> Defining a job that will run the "qtest" test set on a NetBSD 8.1
> system on an aarch64 host:
>
>    test-qtest-netbsd8.1-aarch64:
>      script:
>        - make check-qtest
>
> Job and Machine Scheduling
> --------------------------
>
> While the naming convention gives some information to human beings,
> and hopefully allows for some order and avoids collisions in the
> global job namespace, it's not enough to define where those jobs
> should run.
>
> Tags[11] are the available mechanism to tie jobs to specific machines
> running the GitLab CI agent, "gitlab-runner".  Unfortunately, some
> duplication seems unavoidable, in the sense that some of the naming
> components listed above are machine specific, and will then also need
> to be given as tags.
>
> Note: it may be a good idea to be extra verbose with tags, by having a
> qualifier prefix.  The justification is that tags also live in a
> global namespace, and in theory, at a given point, tags of different
> "categories", say a CPU name and Operating System name may collide.
> Or, it may just be me being paranoid.
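>
> For illustration, such qualified tags (the prefix scheme here is
> purely hypothetical) could look like:
>
>    test-qtest-netbsd8.1-aarch64:
>      tags:
>        - os_netbsd8.1
>        - arch_aarch64
>      script:
>        - make check-qtest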
>
> Example 1
> ~~~~~~~~~
>
>    build-debug-rhel8.0-x86_64:
>      tags:
>        - rhel8.0
>        - x86_64
>      script:
>        - ./configure --enable-debug
>        - make
>
> Example 2
> ~~~~~~~~~
>
>    test-qtest-netbsd8.1-aarch64:
>      tags:
>        - netbsd8.1
>        - aarch64
>      script:
>        - make check-qtest

Where are all these going to go? Are we overloading the existing
.gitlab-ci.yml or are we going to have a new set of configs for the
gating CI and keep .gitlab-ci.yml as the current subset that people run
on their own accounts?

>
> Operating System definition versus Container Images
> ---------------------------------------------------
>
> In the previous section and examples, we're assuming that tests will
> run on machines that have registered "gitlab-runner" agents with
> matching tags.  The tags given at gitlab-runner registration time
> would of course match the same naming convention defined earlier.
>
> So, if one is registering a "gitlab-runner" instance on an x86_64
> machine, running RHEL 8.0, the tags "rhel8.0" and "x86_64" would be
> given (possibly among others).
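>
> As a sketch, such a registration could look like the following (URL
> and token are placeholders; "--tag-list" is gitlab-runner's
> comma-separated tag option):
>
>    $ sudo gitlab-runner register \
>        --non-interactive \
>        --url https://gitlab.com/ \
>        --registration-token TOKEN \
>        --executor shell \
>        --tag-list rhel8.0,x86_64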
>
> Nevertheless, most deployment scenarios will probably rely on jobs
> being executed by gitlab-runner's container executor (currently
> Docker-only).  This means that tags given to a job *may* drop the tag
> associated with the host operating system selection, and instead
> provide the ".gitlab-ci.yml" configuration directive that determines
> the container image to be used.
>
> Most jobs would probably *not* require a matching host operating
> system and container image, but there should still be the capability
> to make it a requirement.  For instance, jobs containing tests that
> require the KVM accelerator on specific scenarios may require a
> matching host Operating System.
>
> Note: What was mentioned in the "Execution Environment" section of the
> naming convention is also closely related to this requirement; that
> is, one may require a job to run under a container, in a VM, or on
> bare metal.
>
> Example 1
> ~~~~~~~~~
>
> Build QEMU on a "rhel8.0" image hosted under the "qemuci" organization
> and require the runner to support container execution:
>
>    build-debug-rhel8.0-x86_64:
>      tags:
>        - x86_64
>        - container
>      image: qemuci/rhel8.0
>      script:
>        - ./configure --enable-debug
>        - make
>
> Example 2
> ~~~~~~~~~
>
> Run QEMU's "check" test set on a "rhel8.0" image hosted under the
> "qemuci" organization, and require the runner to support container
> execution and to be on a matching host:
>
>    test-check-rhel8.0-x86_64:
>      tags:
>        - x86_64
>        - rhel8.0
>        - container
>      image: qemuci/rhel8.0
>      script:
>        - make check
>
> Next
> ----
>
> Because this document is already too long and that can be distracting,
> I decided to defer many other implementation-level details to a second
> RFC, alongside some code.
>
> Some complementary topics that I have prepared include:
>
>  * Container image creation, hosting and management
>  * Advanced pipeline definitions
>    - Job dependencies
>    - Artifacts
>    - Results
>  * GitLab CI for Individual Contributors
>  * GitLab runner:
>    - Official and Custom Binaries
>    - Executors
>    - Security implications
>    - Helper container images for non-supported architectures
>  * Checklists for:
>    - Preparing and documenting machine setup
>    - Proposing new runners and jobs
>    - Runner and job promotions and demotions
>
> Of course, any other topics that spur from this discussion will also
> be added to the following threads.
>
> References:
> -----------
>  [1] https://wiki.qemu.org/Requirements/GatingCI
>  [2] https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg04909.html
>  [3] https://docs.gitlab.com/ee/gitlab-basics/add-merge-request.html
>  [4] https://docs.gitlab.com/ee/ci/merge_request_pipelines/pipelines_for_merged_results/index.html
>  [5] https://docs.gitlab.com/ee/api/merge_requests.html#create-mr-pipeline
>  [6] https://git.linaro.org/people/peter.maydell/misc-scripts.git/tree/apply-pullreq
>  [7] https://docs.gitlab.com/ee/ci/yaml/README.html#allow_failure
>  [8] https://docs.gitlab.com/ee/ci/yaml/README.html#using-onlychanges-with-pipelines-for-merge-requests
>  [9] https://github.com/qemu/qemu/blob/fb2246882a2c8d7f084ebe0617e97ac78467d156/.gitlab-ci.yml#L70
>  [10] https://libosinfo.org/
>  [11] https://docs.gitlab.com/ee/ci/runners/README.html#using-tags


-- 
Alex Bennée


