From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [regression] dataplane: throughput -40% by commit 580b6b2aa2
Date: Mon, 30 Jun 2014 10:08:50 +0200
User-agent: Mutt/1.5.23 (2014-03-12)
On Sat, Jun 28, 2014 at 05:58:58PM +0800, Ming Lei wrote:
> On Sat, Jun 28, 2014 at 5:51 AM, Paolo Bonzini <address@hidden> wrote:
> > Il 27/06/2014 20:01, Ming Lei ha scritto:
> >
> >> I just implemented plug&unplug based batching, and it is working now,
> >> but throughput still has no obvious improvement.
> >>
> >> The load in the IOThread looks a bit low, so I am wondering if there
> >> is a blocking point caused by the QEMU block layer.
> >
> >
> > What does perf say? Also, you can try using the QEMU trace subsystem and
> > see where the latency goes.
>
> Here are some test results against 8589744aaf07b62 of
> upstream QEMU; the tests were done on my 2-core (4-thread)
> laptop:
>
> 1. With my draft batch patches[1] (only Linux AIO supported now):
> - throughput: +16% compared to upstream QEMU
> - average time spent in handle_notify(): 310us
> - average time between two handle_notify() calls: 1591us
>   (this time reflects the latency of handling host_notifier)
16% is still a worthwhile improvement. I guess batching only benefits
aio=native since the threadpool ought to do better when it receives
requests as soon as possible.
A patch or an RFC would be welcome.
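For reference, averages like the ones quoted above can be recomputed from a trace log. A minimal sketch, assuming the log has been reduced to "<timestamp_us> <event>" lines (for example by post-processing the output of scripts/simpletrace.py; the exact format here is an assumption, and the sample values are made up):

```shell
# Sample trace reduced to "<timestamp_us> <event>" lines (format assumed):
cat > trace.log <<'EOF'
100 handle_notify
1700 handle_notify
3280 handle_notify
EOF
# Mean gap between consecutive handle_notify calls:
awk '$2 == "handle_notify" { if (prev) { sum += $1 - prev; n++ } prev = $1 }
     END { if (n) printf "avg interval: %.0f us\n", sum / n }' trace.log
```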
> 2. The same tests on the 2.0.0 release (which uses the custom Linux AIO):
> - average time spent in handle_notify(): 68us
> - average time between two handle_notify() calls: 269us
>   (this time reflects the latency of handling host_notifier)
>
> From the above tests, it looks like the root cause is late notify
> handling: the QEMU block layer is 4 times slower than the custom
> Linux AIO code previously used by dataplane.
Try:
$ perf record -e 'syscalls:*' --tid <iothread-tid>
^C
$ perf script   # shows the trace log
The difference between syscalls in QEMU 2.0 and qemu.git/master could
reveal the problem.
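To make that comparison quicker, the "perf script" output can be summarized by tracepoint name. A sketch, assuming the usual perf script line layout where the tracepoint name appears as a "syscalls:..." token (the sample lines below are made up):

```shell
# Sample "perf script" output lines (layout assumed):
cat > syscalls.txt <<'EOF'
qemu-system-x86 1234 [000] 100.000001: syscalls:sys_enter_io_submit: ...
qemu-system-x86 1234 [000] 100.000420: syscalls:sys_exit_io_submit: ...
qemu-system-x86 1234 [000] 100.000900: syscalls:sys_enter_io_submit: ...
EOF
# Count tracepoints by name, most frequent first:
grep -oE 'syscalls:[a-z0-9_]+' syscalls.txt | sort | uniq -c | sort -rn
```

Running this once against 2.0 and once against qemu.git/master makes any extra or missing syscalls stand out immediately.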
Using perf you can also trace ioeventfd signalling in the host kernel
and compare it against QEMU's handle_notify entry/return. It may be
easiest to use the ftrace trace backend in QEMU, which writes through
the ftrace marker so the QEMU trace is unified with the host kernel
trace (./configure --enable-trace-backend=ftrace, and see the ftrace
section in QEMU's docs/tracing.txt).
This way you can see whether the ioeventfd signal -> handle_notify()
entry latency has increased or whether something else is going on.
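Once the unified trace is reduced to "<timestamp_us> <event>" lines, the signal-to-handler latency can be computed per pair. A sketch with made-up event names and values (neither is QEMU's actual trace output; adjust to whatever the real log contains):

```shell
# Sample unified trace reduced to "<timestamp_us> <event>" lines
# (event names and values are assumptions for illustration):
cat > unified.log <<'EOF'
100 ioeventfd_signal
350 handle_notify
1000 ioeventfd_signal
1100 handle_notify
EOF
# For each ioeventfd signal, print the delay until the next handle_notify:
awk '$2 == "ioeventfd_signal" { t = $1 }
     $2 == "handle_notify" && t { printf "latency: %d us\n", $1 - t; t = 0 }' unified.log
```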
Stefan