qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Performance CI - real-world example


From: Lukáš Doktor
Subject: Performance CI - real-world example
Date: Mon, 4 Oct 2021 09:41:35 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.1.0

Hello guys,

on KVM forum I presented the intention to establish a regular performance CI 
and now I have the opportunity to demonstrate a clear hit. Long story short, 
the commit message that resulted from bisection explains the situation so no 
big surprise here and normally I would not spam you with clear issues like this 
one and simply re-train the model. Anyway let's pretend the commit message 
would not be that informative and I'd need a confirmation that the different 
performance results are expected and intentional. Some questions that come to 
my mind:

* Which form of a report would you prefer, an email or GL issue?
* Should I CC to the commit author?
* Shell I include the reviewer as well?
* What files/reports should I include?
* Is the report.html understandable?
* Should I reorganize the body of the email somehow? Cut some pieces or expand?
* Should I create issues for improvements or only regressions?
* Do you event want perf CI?
* Would you like the perf-ci status in gitlab? Example of integration (with 
always passing regular CI per each push and soft-failure-based perf-ci ran only 
on some builds: https://gitlab.com/ldoktor/qemu/-/pipelines )
* Would you like to use latest-released RHEL (at this point 8.4) or 
latest-released RHEL + latest-kernel (from fedora's koji as that's stable 
enough while close to upstream)

Example of a simple email:

---
Subject: [PERF-CI] fio-nbd write-* 1300% performance improvement
To: qemu-devel,Nir Soffer

A very significant improvement in fio-nbd write had been detected by my 
pipeline, could you please verify it is expected?

commit 09615257058a0ae87b837bb041f56f7312d9ead8
Author: Nir Soffer <nirsof@gmail.com>
Date:   Fri Aug 13 23:55:19 2021 +0300

    qemu-nbd: Change default cache mode to writeback
    
    Both qemu and qemu-img use writeback cache mode by default, which is
    already documented in qemu(1). qemu-nbd uses writethrough cache mode by
    default, and the default cache mode is not documented.
    
    According to the qemu-nbd(8):
    
       --cache=CACHE
              The  cache  mode  to be used with the file.  See the
              documentation of the emulator's -drive cache=... option for
              allowed values.
    
    qemu(1) says:
    
        The default mode is cache=writeback.
    
    So users have no reason to assume that qemu-nbd is using writethough
    cache mode. The only hint is the painfully slow writing when using the
    defaults.
    
    Looking in git history, it seems that qemu used writethrough in the past
    to support broken guests that did not flush data properly, or could not
    flush due to limitations in qemu. But qemu-nbd clients can use
    NBD_CMD_FLUSH to flush data, so using writethrough does not help anyone.
    
    Change the default cache mode to writback, and document the default and
    available values properly in the online help and manual.
    
    With this change converting image via qemu-nbd is 3.5 times faster.
    
        $ qemu-img create dst.img 50g
        $ qemu-nbd -t -f raw -k /tmp/nbd.sock dst.img
    
    Before this change:
    
        $ hyperfine -r3 "./qemu-img convert -p -f raw -O raw -T none -W 
fedora34.img nbd+unix:///?socket=/tmp/nbd.sock"
        Benchmark #1: ./qemu-img convert -p -f raw -O raw -T none -W 
fedora34.img nbd+unix:///?socket=/tmp/nbd.sock
          Time (mean ± σ):     83.639 s ±  5.970 s    [User: 2.733 s, System: 
6.112 s]
          Range (min … max):   76.749 s … 87.245 s    3 runs
    
    After this change:
    
        $ hyperfine -r3 "./qemu-img convert -p -f raw -O raw -T none -W 
fedora34.img nbd+unix:///?socket=/tmp/nbd.sock"
        Benchmark #1: ./qemu-img convert -p -f raw -O raw -T none -W 
fedora34.img nbd+unix:///?socket=/tmp/nbd.sock
          Time (mean ± σ):     23.522 s ±  0.433 s    [User: 2.083 s, System: 
5.475 s]
          Range (min … max):   23.234 s … 24.019 s    3 runs
    
    Users can avoid the issue by using --cache=writeback[1] but the defaults
    should give good performance for the common use case.
    
    [1] https://bugzilla.redhat.com/1990656
    
    Signed-off-by: Nir Soffer <nsoffer@redhat.com>
    Message-Id: <20210813205519.50518-1-nsoffer@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    CC: qemu-stable@nongnu.org
    Signed-off-by: Eric Blake <eblake@redhat.com>

 docs/tools/qemu-nbd.rst | 6 ++++--
 qemu-nbd.c              | 6 ++++--
 2 files changed, 8 insertions(+), 4 deletions(-)


Bisection log:
git bisect start
# good: [6b54a31bf7b403672a798b6443b1930ae6c74dea] Merge remote-tracking branch 
'remotes/jsnow-gitlab/tags/python-pull-request' into staging
git bisect good 6b54a31bf7b403672a798b6443b1930ae6c74dea
# bad: [bb4aa8f59e18412cff0d69f14aee7abba153161a] Merge remote-tracking branch 
'remotes/pmaydell/tags/pull-target-arm-20210930' into staging
git bisect bad bb4aa8f59e18412cff0d69f14aee7abba153161a
# bad: [fce8f7735fcea23056ff41be55e73eacbca31b5e] Merge remote-tracking branch 
'remotes/dg-gitlab/tags/ppc-for-6.2-20210930' into staging
git bisect bad fce8f7735fcea23056ff41be55e73eacbca31b5e
# good: [ff8cdbbd7e4bb7a7422c080c004b899214a08b3a] MAINTAINERS: Add information 
for OpenPIC
git bisect good ff8cdbbd7e4bb7a7422c080c004b899214a08b3a
# good: [ba0fa56bc06e563de68d2a2bf3ddb0cfea1be4f9] Merge remote-tracking branch 
'remotes/vivier/tags/q800-for-6.2-pull-request' into staging
git bisect good ba0fa56bc06e563de68d2a2bf3ddb0cfea1be4f9
# bad: [0c8022876f2183f93e23a7314862140c94ee62e7] block: use int64_t instead of 
int in driver discard handlers
git bisect bad 0c8022876f2183f93e23a7314862140c94ee62e7
# skip: [e75abedab7e10043ed81b1cbc3033409aff0159e] block: use int64_t instead 
of uint64_t in driver write handlers
git bisect skip e75abedab7e10043ed81b1cbc3033409aff0159e
# bad: [485350497b7a9be3477d453fe0fdf380b8535303] block: use int64_t instead of 
uint64_t in copy_range driver handlers
git bisect bad 485350497b7a9be3477d453fe0fdf380b8535303
# bad: [558902cc3dc61231930001b82dcd95d20d58b417] qcow2: check request on 
vmstate save/load path
git bisect bad 558902cc3dc61231930001b82dcd95d20d58b417
# bad: [09615257058a0ae87b837bb041f56f7312d9ead8] qemu-nbd: Change default 
cache mode to writeback
git bisect bad 09615257058a0ae87b837bb041f56f7312d9ead8
# first bad commit: [09615257058a0ae87b837bb041f56f7312d9ead8] qemu-nbd: Change 
default cache mode to writeback


Please find the bisection report with raw numbers attached.

Regards,
Lukáš Doktor

<<< text/html; charset=UTF-8; name="report.html": Unrecognized >>>

Attachment: OpenPGP_0x26B362E47FCF22C1.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]