qemu-block

[PATCH experiment 00/16] C++20 coroutine backend


From: Paolo Bonzini
Subject: [PATCH experiment 00/16] C++20 coroutine backend
Date: Mon, 14 Mar 2022 10:31:47 +0100

It turns out that going from a prototype C++ implementation of the QEMU
API to something that could build tests/unit/test-coroutine was just a
few hours' work; and once it compiled, only one line had to be changed
for every test to pass.

Most of the differences between C and C++ already show up here:

- keywords such as "new" (or "class", which I haven't encountered yet)

- _Generic must be replaced by templates and/or overloading (QemuCoLockable
is implemented completely differently from QemuLockable; in fact I spent
most of the time on that)

- PRI* macros must be separated with a space from string literals that
precede them

- conversions from void* must be explicit (g_new takes care of that most
of the time, but not for opaque pointers passed to coroutines).
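To make the list above concrete, here is a small sketch of each porting issue as it looks once a file is compiled as C++. None of this is QEMU code: apply_delta, format_u64, job_id, HypoMutex, HypoSpin and lock_impl are hypothetical stand-ins, and plain overloading is used only to illustrate the _Generic replacement, not to describe how QemuCoLockable is actually written.

```cpp
#include <cassert>
#include <cinttypes>   // PRIu64
#include <cstdint>
#include <cstdio>
#include <string>

// "new" is a keyword in C++, so a C parameter named "new" must be renamed.
static int64_t apply_delta(int64_t old_val, int64_t new_val)
{
    return new_val - old_val;
}

// C++11 lexes "%"PRIu64 (no space) as a literal with a user-defined suffix;
// GCC diagnoses it under -Wliteral-suffix.  With the space it is a macro.
static std::string format_u64(uint64_t v)
{
    char buf[32];
    std::snprintf(buf, sizeof(buf), "value=%" PRIu64, v);
    return buf;
}

// void* no longer converts implicitly to other pointer types.
struct Job { int id; };
static int job_id(void *opaque)
{
    Job *job = static_cast<Job *>(opaque);  // implicit in C, required in C++
    return job->id;
}

// _Generic-style dispatch on the argument type becomes overloading.
struct HypoMutex { int locked = 0; };
struct HypoSpin  { int locked = 0; };
static const char *lock_impl(HypoMutex *m) { m->locked = 1; return "mutex"; }
static const char *lock_impl(HypoSpin *s)  { s->locked = 1; return "spin"; }
```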

There are 300 lines of hard-core C++ in the backend and in
coroutine.h.  I tried to comment it as much as possible (this
time I didn't include a big commit message on stackless coroutines
in general) but it still requires some knowledge of the basic C++
coroutine concepts of resumable types, promise types and awaiter types.
https://www.youtube.com/watch?v=ZTqHjjm86Bw is an excellent introduction
and it's where I learnt most of what was needed.

However, there are no ramifications for actual coroutine code, except
for the template syntax "CoroutineFn<return_type>" for the function and
the mandatory co_await/co_return keywords... both of which are an
improvement, really: the fact that a single function cannot run both
inside and outside coroutines is now checked by the compiler, because
qemu_coroutine_create accepts a function that returns CoroutineFn<void>.
Therefore I had to disable some more code in util/ and qapi/ that used
qemu_in_coroutine() or coroutine_fn.

Here is the performance comparison of the three backends:

                   ucontext           stackless C       stackless C++
/perf/lifecycle    0.068 s            0.025 s           0.065 s
/perf/nesting      55 s               4.7 s             1.7 s
/perf/yield        6.0 s              1.3 s             1.3 s
/perf/cost         8 Mops/s (125 ns)  35 ns             10 Mops/s (99 ns)
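The /perf/cost row mixes throughput and per-operation latency. The two are reciprocal: one operation every t ns is 10^9 / t operations per second, or 1000 / t million. So 125 ns corresponds to 8 Mops/s, 99 ns to about 10 Mops/s, and 35 ns to about 28.6 Mops/s. The helper names below are purely illustrative.

```cpp
// Million operations per second for a given cost per operation in ns.
constexpr double mops_per_sec(double ns_per_op) { return 1000.0 / ns_per_op; }

// Cost per operation in ns for a given throughput in Mops/s.
constexpr double ns_per_op(double mops) { return 1000.0 / mops; }
```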

One important difference is that C++ coroutines allocate frames on the
heap, and that explains why performance is better in /perf/nesting,
which has to do many large memory allocations for the stack in the other
two backends (and also a makecontext/swapcontext in the ucontext case).
C++ coroutines hardly benefit from the coroutine pool; OTOH that also
means the coroutine pool could be removed if we went this way.

I haven't checked why /perf/lifecycle (and therefore /perf/cost; they
are roughly the same test) is so much slower than the handwritten C code.
It's still comparable with the ucontext backend though.

Overall this was ~twice the amount of work of the C experiment, but
that's because the two are very different ways to achieve the same goal:

- the design work was substantially smaller in the C experiment, where
all the backend does is allocate stack frames and do a loop that invokes
a function pointer.  Here the backend has to map between the C++ concepts
and the QEMU API.  In the C case, most of the work was really in the
manual conversion which I had to do one function at a time.

- the remaining work is also completely different: a source-to-source
translator (and only build system work in QEMU) for the C experiment;
making ~100 files compile in C++ for this one (and relatively little
work as far as coroutines are concerned).
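As a point of comparison, the C experiment's backend ("allocate stack frames and do a loop that invokes a function pointer") can be sketched as follows. This is an illustration under my own assumptions, written in C++ for consistency. It is not the code produced by the source-to-source translator, and Frame, Step, sum_step and run_frames are hypothetical names. A recursive function is converted by hand into a step function with an explicit resume state.

```cpp
#include <cassert>
#include <memory>
#include <vector>

enum class Step { Call, Return };

struct Frame;
using FrameStack = std::vector<std::unique_ptr<Frame>>;
using StepFn = Step (*)(Frame *self, FrameStack &stack);

// One activation record of a converted function.
struct Frame {
    StepFn fn;
    int n;                 // argument
    int state = 0;         // where to continue on the next invocation
    int child_result = 0;  // return value of the last callee
    int result = 0;        // this frame's return value
};

// Hand conversion of: int sum(int n) { return n ? n + sum(n - 1) : 0; }
static Step sum_step(Frame *f, FrameStack &stack)
{
    switch (f->state) {
    case 0:
        if (f->n == 0) {
            f->result = 0;
            return Step::Return;
        }
        f->state = 1;
        stack.push_back(std::make_unique<Frame>(Frame{sum_step, f->n - 1}));
        return Step::Call;
    default:
        f->result = f->n + f->child_result;
        return Step::Return;
    }
}

// The backend: a loop that invokes the top frame's function pointer.
static int run_frames(StepFn entry, int arg)
{
    FrameStack stack;
    stack.push_back(std::make_unique<Frame>(Frame{entry, arg}));
    int last_result = 0;
    while (!stack.empty()) {
        Frame *top = stack.back().get();
        top->child_result = last_result;
        if (top->fn(top, stack) == Step::Return) {
            last_result = top->result;
            stack.pop_back();
        }
    }
    return last_result;
}
```

The native call stack never grows, however deep the converted recursion goes. A real backend would also handle yielding by returning from the loop with frames still on the stack.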

This was compiled with GCC 11 only.  Coroutine support was added in
GCC 10, released in 2020, which IIRC is much newer than the oldest
release we support.

Paolo

Paolo Bonzini (16):
  coroutine: add missing coroutine_fn annotations for CoRwlock functions
  coroutine: qemu_coroutine_get_aio_context is not a coroutine_fn
  coroutine: small code cleanup in qemu_co_rwlock_wrlock
  coroutine: introduce QemuCoLockable
  port atomic.h to C++
  use g_new0 instead of g_malloc0
  start porting compiler.h to C++
  tracetool: add extern "C" around generated headers
  start adding extern "C" markers
  add space between literal and string macro
  bump to C++20
  remove "new" keyword from trace-events
  disable some code
  util: introduce C++ stackless coroutine backend
  port QemuCoLockable to C++ coroutines
  port test-coroutine to C++ coroutines

 configure                                     |  48 +-
 include/block/aio.h                           |   5 +
 include/fpu/softfloat-types.h                 |   4 +
 include/qemu/atomic.h                         |   5 +
 include/qemu/bitops.h                         |   3 +
 include/qemu/bswap.h                          |  10 +-
 include/qemu/co-lockable.h                    |  93 ++++
 include/qemu/compiler.h                       |   4 +
 include/qemu/coroutine.h                      | 466 +++++++++++++-----
 include/qemu/coroutine_int.h                  |   8 +
 include/qemu/host-utils.h                     |   4 +
 include/qemu/lockable.h                       |  13 +-
 include/qemu/notify.h                         |   4 +
 include/qemu/osdep.h                          |   1 +
 include/qemu/qsp.h                            |   4 +
 include/qemu/thread.h                         |   4 +
 include/qemu/timer.h                          |   6 +-
 include/qemu/typedefs.h                       |   1 +
 meson.build                                   |   2 +-
 qapi/qmp-dispatch.c                           |   2 +
 scripts/tracetool/format/h.py                 |   8 +-
 tests/unit/meson.build                        |   8 +-
 .../{test-coroutine.c => test-coroutine.cc}   | 138 +++---
 util/async.c                                  |   2 +
 util/coroutine-stackless.cc                   | 145 ++++++
 util/meson.build                              |  14 +-
 ...oroutine-lock.c => qemu-coroutine-lock.cc} |  78 +--
 ...outine-sleep.c => qemu-coroutine-sleep.cc} |  10 +-
 util/{qemu-coroutine.c => qemu-coroutine.cc}  |  18 +-
 util/thread-pool.c                            |   2 +
 util/trace-events                             |  40 +-
 31 files changed, 805 insertions(+), 345 deletions(-)
 create mode 100644 include/qemu/co-lockable.h
 rename tests/unit/{test-coroutine.c => test-coroutine.cc} (81%)
 create mode 100644 util/coroutine-stackless.cc
 rename util/{qemu-coroutine-lock.c => qemu-coroutine-lock.cc} (86%)
 rename util/{qemu-coroutine-sleep.c => qemu-coroutine-sleep.cc} (89%)
 rename util/{qemu-coroutine.c => qemu-coroutine.cc} (93%)

-- 
2.35.1



