qemu-devel

Re: QEMU 5.0 virtio-blk performance regression with high queue depths


From: Denis V. Lunev
Subject: Re: QEMU 5.0 virtio-blk performance regression with high queue depths
Date: Wed, 16 Sep 2020 19:43:30 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 9/16/20 5:07 PM, Denis V. Lunev wrote:
> On 9/16/20 4:32 PM, Stefan Hajnoczi wrote:
>> On Thu, Aug 27, 2020 at 3:24 PM Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>> Hi Denis,
>>> A performance regression was found after the virtio-blk queue-size
>>> property was increased from 128 to 256 in QEMU 5.0 in commit
>>> c9b7d9ec21dfca716f0bb3b68dee75660d86629c ("virtio: increase virtqueue
>>> size for virtio-scsi and virtio-blk"). I wanted to let you know in case
>>> you have ideas or see something similar.
>> Ping, have you noticed performance regressions after switching to
>> virtio-blk queue-size 256?
> oops, I have missed the original letter.
>
> Denis Plotnikov has left the team for now.
>
>
>>> Throughput and IOPS of the following fio benchmarks dropped by 30-40%:
>>>
>>>   # mkfs.xfs /dev/vdb
>>>   # mount /dev/vdb /mnt
>>>   # fio --rw=%s --bs=%s --iodepth=64 --runtime=1m --direct=1 \
>>>     --filename=/mnt/%s --name=job1 --ioengine=libaio --thread \
>>>     --group_reporting --numjobs=16 --size=512MB --time_based \
>>>     --output=/tmp/fio_result &> /dev/null
>>>     - rw: read write
>>>     - bs: 4k 64k
>>>
>>> Note that there are 16 threads submitting 64 requests each, i.e. up to
>>> 1024 requests in flight! The guest block device queue depth will be
>>> maxed out and the virtqueue should be full most of the time.
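>>>
>>> For reference, a fully expanded invocation for one of the parameter
>>> combinations (rw=read, bs=4k; the test file name below is only an
>>> illustrative placeholder) would look like:
>>>
>>>   # fio --rw=read --bs=4k --iodepth=64 --runtime=1m --direct=1 \
>>>     --filename=/mnt/testfile --name=job1 --ioengine=libaio --thread \
>>>     --group_reporting --numjobs=16 --size=512MB --time_based \
>>>     --output=/tmp/fio_result &> /dev/null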
>>>
>>> Have you seen regressions after virtio-blk queue-size was increased in
>>> QEMU 5.0?
>>>
>>> Here are the details of the host storage:
>>>
>>>   # mkfs.xfs /dev/sdb # 60GB SSD drive
>>>   # mount /dev/sdb /mnt/test
>>>   # qemu-img create -f qcow2 /mnt/test/storage2.qcow2 40G
>>>
>>> The guest command-line is:
>>>
>>>   # MALLOC_PERTURB_=1 numactl \
>>>     -m 1  /usr/libexec/qemu-kvm \
>>>     -S  \
>>>     -name 'avocado-vt-vm1'  \
>>>     -sandbox on  \
>>>     -machine q35 \
>>>     -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
>>>     -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
>>>     -nodefaults \
>>>     -device VGA,bus=pcie.0,addr=0x2 \
>>>     -m 4096  \
>>>     -smp 2,maxcpus=2,cores=1,threads=1,dies=1,sockets=2  \
>>>     -cpu 'IvyBridge',+kvm_pv_unhalt \
>>>     -chardev socket,server,id=qmp_id_qmpmonitor1,nowait,path=/var/tmp/avocado_bapfdqao/monitor-qmpmonitor1-20200721-014154-5HJGMjxW \
>>>     -mon chardev=qmp_id_qmpmonitor1,mode=control \
>>>     -chardev socket,server,id=qmp_id_catch_monitor,nowait,path=/var/tmp/avocado_bapfdqao/monitor-catch_monitor-20200721-014154-5HJGMjxW \
>>>     -mon chardev=qmp_id_catch_monitor,mode=control \
>>>     -device pvpanic,ioport=0x505,id=id31BN83 \
>>>     -chardev socket,server,id=chardev_serial0,nowait,path=/var/tmp/avocado_bapfdqao/serial-serial0-20200721-014154-5HJGMjxW \
>>>     -device isa-serial,id=serial0,chardev=chardev_serial0  \
>>>     -chardev socket,id=seabioslog_id_20200721-014154-5HJGMjxW,path=/var/tmp/avocado_bapfdqao/seabios-20200721-014154-5HJGMjxW,server,nowait \
>>>     -device isa-debugcon,chardev=seabioslog_id_20200721-014154-5HJGMjxW,iobase=0x402 \
>>>     -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
>>>     -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
>>>     -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
>>>     -blockdev node-name=file_image1,driver=file,aio=threads,filename=rootfs.qcow2,cache.direct=on,cache.no-flush=off \
>>>     -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
>>>     -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
>>>     -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pcie-root-port-2,addr=0x0 \
>>>     -blockdev node-name=file_disk1,driver=file,aio=threads,filename=/mnt/test/storage2.qcow2,cache.direct=on,cache.no-flush=off \
>>>     -blockdev node-name=drive_disk1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_disk1 \
>>>     -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
>>>     -device virtio-blk-pci,id=disk1,drive=drive_disk1,bootindex=1,write-cache=on,bus=pcie-root-port-3,addr=0x0 \
>>>     -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
>>>     -device virtio-net-pci,mac=9a:37:37:37:37:4e,id=idBMd7vy,netdev=idLb51aS,bus=pcie-root-port-4,addr=0x0 \
>>>     -netdev tap,id=idLb51aS,fd=14  \
>>>     -vnc :0  \
>>>     -rtc base=utc,clock=host,driftfix=slew  \
>>>     -boot menu=off,order=cdn,once=c,strict=off \
>>>     -enable-kvm \
>>>     -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=6
> I will run a check today.
>
> Talking about our performance measurements, we have not
> seen ANY performance degradation, let alone 30-40%.
> This looks quite strange to me.
>
> Though there is one quite important difference: we are always
> using O_DIRECT and the 'native' AIO engine.
>
> Den

I have put my hands on this and it looks like you are right. There is
a difference. It is not as significant for me as in your case, but I observe
a stable difference of around 10% between the 128 and 256 queue sizes.
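
For anyone reproducing the comparison, switching between the two sizes
boils down to the queue-size property of the virtio-blk device (a sketch
based on the command line quoted above, everything else unchanged):

  -device virtio-blk-pci,id=disk1,drive=drive_disk1,write-cache=on,queue-size=128,bus=pcie-root-port-3,addr=0x0 \

versus queue-size=256, which is the default since the commit referenced
above (c9b7d9ec21dfca716f0bb3b68dee75660d86629c).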

I have checked with:
- QEMU 5.1
- Fedora 31 in the guest
- qcow2 (64k and 1M cluster sizes, see the commands below) and a raw image on the host
- nocache and both the threads and native AIO modes
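
To be explicit about what that means on the command line (a sketch,
assuming 64k/1M above refer to the qcow2 cluster size; the image paths
are just placeholders):

  # qemu-img create -f qcow2 -o cluster_size=64k /mnt/test/test-64k.qcow2 40G
  # qemu-img create -f qcow2 -o cluster_size=1M  /mnt/test/test-1M.qcow2  40G

The AIO mode is then selected on the file blockdev, aio=threads vs
aio=native, with cache.direct=on for the nocache/O_DIRECT case, e.g.:

  -blockdev node-name=file_disk1,driver=file,aio=native,filename=/mnt/test/test-64k.qcow2,cache.direct=on,cache.no-flush=off \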

The test was run on a ThinkPad X1 Carbon Gen 6 laptop.

For reference, I have seen at most around 330k IOPS for reads,
which looks awesome, with the native engine, and around 220k
IOPS with threads.

Den


