Re: [Qemu-devel] dataplane performance on s390
From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] dataplane performance on s390
Date: Thu, 19 Jun 2014 18:39:28 +0800
User-agent: Mutt/1.5.23 (2014-03-12)

On Tue, Jun 10, 2014 at 09:40:38AM +0800, Fam Zheng wrote:
> On Mon, 06/09 15:43, Karl Rister wrote:
> > Hi All
> >
> > I was asked by our development team to do a performance sniff test of the
> > latest dataplane code on s390 and compare it against qemu.git. Here is a
> > brief description of the configuration, the testing done, and then the
> > results.
> >
> > Configuration:
> >
> > Host: 26 CPU LPAR, 64GB, 8 zFCP adapters
> > Guest: 4 VCPU, 1GB, 128 virtio block devices
> >
> > Each virtio block device maps to a dm-multipath device in the host with 8
> > paths. Multipath is configured with the service-time policy. All block
> > devices are configured to use the deadline IO scheduler.
> >
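For context, the multipath policy and I/O scheduler settings described above
are typically applied along these lines (the device name is a placeholder;
this is an illustration, not the actual host configuration):

    # /etc/multipath.conf: select the service-time path selector
    defaults {
        path_selector "service-time 0"
    }

    # switch a block device to the deadline I/O scheduler
    echo deadline > /sys/block/<device>/queue/scheduler
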
> > Test:
> >
> > FIO is used to run 4 scenarios: sequential read, sequential write, random
> > read, and random write. Sequential scenarios use a 128KB request size and
> > random scenarios use an 8KB request size. Each scenario is run with an
> > increasing number of jobs, from 1 to 128 (powers of 2). Each job is bound
> > to an individual file on an ext3 file system on a virtio device and uses
> > O_DIRECT, libaio, and iodepth=1. Each test is run three times for 2 minutes
> > each; the first iteration (a warmup) is thrown out and the next two
> > iterations are averaged together.
> >
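A fio job approximating the workload described above might look like the
following; the size, path, and job count are illustrative stand-ins, since
the actual job files were not posted:

    ; sequential read case; use rw=write/randread/randwrite and bs=8k
    ; for the other scenarios
    [seqread]
    rw=read
    bs=128k
    ioengine=libaio
    direct=1
    iodepth=1
    size=1g
    runtime=120
    time_based
    ; numjobs was swept from 1 to 128 in the runs above
    numjobs=4
    ; placeholder path; the real runs used one file per virtio device
    directory=/mnt/vdisk1
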
> > Results:
> >
> > Baseline: qemu.git 93f94f9018229f146ed6bbe9e5ff72d67e4bd7ab
> >
> > Dataplane: bdrv_set_aio_context 0ab50cde71aa27f39b8a3ea4766ff82671adb2a4
>
> Hi Karl,
>
> Thanks for the results.
>
> The throughput differences look minimal; where is the bandwidth saturated in
> these tests? And why use iodepth=1, not more?
>
> Thanks,
> Fam
>
> >
> > Sequential Read:
> >
> > Overall a slight throughput regression with a noticeable reduction in CPU
> > efficiency.
> >
> > 1 Job: Throughput regressed -1.4%, CPU improved -0.83%
> > 2 Job: Throughput regressed -2.5%, CPU regressed +2.81%
> > 4 Job: Throughput regressed -2.2%, CPU regressed +12.22%
> > 8 Job: Throughput regressed -0.7%, CPU regressed +9.77%
> > 16 Job: Throughput regressed -3.4%, CPU regressed +7.04%
> > 32 Job: Throughput regressed -1.8%, CPU regressed +12.03%
> > 64 Job: Throughput regressed -0.1%, CPU regressed +10.60%
> > 128 Job: Throughput increased +0.3%, CPU regressed +10.70%
> >
> > Sequential Write:
> >
> > Mostly regressed throughput, although it gets better as job count increases
> > and even has some gains at higher job counts. CPU efficiency is regressed.
> >
> > 1 Job: Throughput regressed -1.9%, CPU regressed +0.90%
> > 2 Job: Throughput regressed -2.0%, CPU regressed +1.07%
> > 4 Job: Throughput regressed -2.4%, CPU regressed +8.68%
> > 8 Job: Throughput regressed -2.0%, CPU regressed +4.23%
> > 16 Job: Throughput regressed -5.0%, CPU regressed +10.53%
> > 32 Job: Throughput improved +7.6%, CPU regressed +7.37%
> > 64 Job: Throughput regressed -0.6%, CPU regressed +7.29%
> > 128 Job: Throughput improved +8.3%, CPU regressed +6.68%
> >
> > Random Read:
> >
> > Again, mostly throughput regressions except for the largest job counts. CPU
> > efficiency is regressed at all data points.
> >
> > 1 Job: Throughput regressed -3.0%, CPU regressed +0.14%
> > 2 Job: Throughput regressed -3.6%, CPU regressed +6.86%
> > 4 Job: Throughput regressed -5.1%, CPU regressed +11.11%
> > 8 Job: Throughput regressed -8.6%, CPU regressed +12.32%
> > 16 Job: Throughput regressed -5.7%, CPU regressed +12.99%
> > 32 Job: Throughput regressed -7.4%, CPU regressed +7.62%
> > 64 Job: Throughput improved +10.0%, CPU regressed +10.83%
> > 128 Job: Throughput improved +10.7%, CPU regressed +10.85%
> >
> > Random Write:
> >
> > Throughput and CPU regressed at all but one data point.
> >
> > 1 Job: Throughput regressed -2.3%, CPU improved -1.50%
> > 2 Job: Throughput regressed -2.2%, CPU regressed +0.16%
> > 4 Job: Throughput regressed -1.0%, CPU regressed +8.36%
> > 8 Job: Throughput regressed -8.6%, CPU regressed +12.47%
> > 16 Job: Throughput regressed -3.1%, CPU regressed +12.40%
> > 32 Job: Throughput regressed -0.2%, CPU regressed +11.59%
> > 64 Job: Throughput regressed -1.9%, CPU regressed +12.65%
> > 128 Job: Throughput improved +5.6%, CPU regressed +11.68%
> >
> >
> > * CPU consumption is an efficiency calculation of usage per MB of
> > throughput.
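
To make that metric concrete (assuming it means host CPU utilization divided
by throughput, with purely illustrative numbers):

    40% CPU at 200 MB/s  ->  0.40 / 200 = 0.0020 CPU per MB/s
    44% CPU at 200 MB/s  ->  0.44 / 200 = 0.0022 CPU per MB/s  (~ +10%)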

Thanks for sharing! This is actually not too bad considering that the
bdrv_set_aio_context() code uses the QEMU block layer while the older
qemu.git code uses a custom Linux AIO code path.

The CPU efficiency regression is interesting. Do you have any profiling
data that shows where the hot spots are?
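
A common way to collect that kind of data on the host is plain perf
sampling while the benchmark runs; the duration below is arbitrary and the
commands are only a sketch, not a claim about what tooling this particular
s390 host has installed:

    # system-wide sampling with call graphs during a 60-second window
    perf record -a -g -- sleep 60
    perf report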

Thanks,
Stefan