qemu-block


From: Stefan Hajnoczi
Subject: Re: [Qemu-block] [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error
Date: Mon, 11 Feb 2019 11:17:25 +0800
User-agent: Mutt/1.10.1 (2018-07-13)

On Wed, Feb 06, 2019 at 04:47:19PM +0000, Fernando Casas Schössow wrote:
> I could also repro the same with virtio-scsi on the same guest a couple of 
> hours later:
> 
> 2019-02-06 07:10:37.672+0000: starting up libvirt version: 4.10.0, qemu 
> version: 3.1.0, kernel: 4.19.18-0-vanilla, hostname: vmsvr01.homenet.local
> LC_ALL=C 
> PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
>  HOME=/root USER=root QEMU_AUDIO_DRV=spice /home/fernando/qemu-system-x86_64 
> -name guest=DOCKER01,debug-threads=on -S -object 
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-32-DOCKER01/master-key.aes
>  -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu 
> IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on
>  -drive 
> file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on
>  -drive 
> file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 
> -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 
> 4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev 
> socket,id=charmonitor,fd=46,server,nowait -mon 
> chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew 
> -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global 
> PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device 
> ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device 
> ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4
>  -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 
> -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device 
> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive 
> file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads
>  -device 
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on
>  -netdev tap,fd=48,id=hostnet0,vhost=on,vhostfd=50 -device 
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3
>  -chardev pty,id=charserial0 -device 
> isa-serial,chardev=charserial0,id=serial0 -chardev 
> socket,id=charchannel0,fd=51,server,nowait -device 
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
>  -chardev spicevmc,id=charchannel1,name=vdagent -device 
> virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
>  -spice port=5904,addr=127.0.0.1,disable-ticketing,seamless-migration=on 
> -device 
> qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
>  -chardev spicevmc,id=charredir0,name=usbredir -device 
> usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev 
> spicevmc,id=charredir1,name=usbredir -device 
> usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device 
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object 
> rng-random,id=objrng0,filename=/dev/random -device 
> virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox 
> on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg 
> timestamp=on
> 2019-02-06 07:10:37.672+0000: Domain id=32 is tainted: high-privileges
> char device redirected to /dev/pts/5 (label charserial0)
> vdev 0x5585456ef6b0 ("virtio-scsi")
> vq 0x5585456f90a0 (idx 2)
> inuse 128 vring.num 128
> 2019-02-06T13:00:46.942424Z qemu-system-x86_64: Virtqueue size exceeded
> 
> 
> I'm open to any tests or suggestions that can move the investigation forward 
> and find the cause of this issue.

Thanks for collecting the data!

The fact that both virtio-blk and virtio-scsi failed suggests it's not a
virtqueue element leak in the virtio-blk or virtio-scsi device emulation
code.
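The leak theory rests on simple accounting. Each popped element increments `inuse`, each completed element decrements it, and the error fires once `inuse` reaches `vring.num`. The following is a minimal sketch of that accounting. It assumes a toy queue in which a device model forgets to complete some requests. The struct and function names are hypothetical and only loosely mirror `VirtQueue` in hw/virtio/virtio.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for QEMU's per-virtqueue accounting. */
struct toy_vq {
    unsigned int num;   /* ring size, like vq->vring.num */
    unsigned int inuse; /* popped but not yet completed, like vq->inuse */
};

/* Returns false when the "Virtqueue size exceeded" condition is hit. */
static bool toy_pop(struct toy_vq *vq)
{
    if (vq->inuse >= vq->num) {
        return false;
    }
    vq->inuse++;
    return true;
}

/* Completing a request gives its slot back (roughly what push/flush does). */
static void toy_push(struct toy_vq *vq)
{
    assert(vq->inuse > 0);
    vq->inuse--;
}

/* Run `requests` round trips. Every `leak_every`-th request is popped but
 * never completed (0 = never leak). Returns the number of successful pops
 * before the queue saturated, or `requests` if it never saturated. */
static unsigned int toy_run(unsigned int num, unsigned int requests,
                            unsigned int leak_every)
{
    struct toy_vq vq = { .num = num, .inuse = 0 };

    for (unsigned int i = 1; i <= requests; i++) {
        if (!toy_pop(&vq)) {
            return i - 1;
        }
        if (leak_every == 0 || i % leak_every != 0) {
            toy_push(&vq);
        }
    }
    return requests;
}
```

If one request in every 10 leaks, `toy_run(128, 10000, 10)` saturates after 1280 pops and ends in the same `inuse 128 vring.num 128` state that the log shows. To explain both failures, a leak of this kind would have to exist separately in virtio-blk and in virtio-scsi.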

The hung task error messages from inside the guest are a consequence of
QEMU hitting the "Virtqueue size exceeded" error.  QEMU refuses to
process further requests after the error, causing tasks inside the guest
to get stuck on I/O.
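The "refuses to process further requests" behaviour works like a sticky error flag. The following is a minimal sketch of it. It assumes, as a simplification of QEMU's `virtio_error()` path, that the error sets a device-wide flag and that every later pop checks this flag. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy device: a sticky error flag plus a count of requests
 * the guest has made available that the device has not taken yet. */
struct toy_dev {
    bool broken;
    unsigned int pending;
};

/* Report the error once and stop all further processing. */
static void toy_dev_error(struct toy_dev *dev, const char *msg)
{
    fprintf(stderr, "toy device error: %s\n", msg);
    dev->broken = true;
}

/* Take one request. Returns false if nothing can be processed. */
static bool toy_dev_pop(struct toy_dev *dev)
{
    if (dev->broken || dev->pending == 0) {
        return false;
    }
    dev->pending--;
    return true;
}
```

After `toy_dev_error()` runs, the remaining requests are never taken. From the guest's side, the I/O has been submitted but never completes, so the tasks waiting on it hang.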

I don't have a good theory regarding the root cause.  Two ideas:
1. The guest is corrupting the vring or submitting more requests than
   will fit into the ring.  Somewhat unlikely because it happens with
   both Windows and Linux guests.
2. QEMU's virtqueue code is buggy, possibly in the memory region cache
   that is used for fast guest RAM accesses.
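The values printed by the expanded debug patch in this message can be compared against the two ideas above. The following is a rough sketch of one possible reading. It assumes free-running 16-bit avail indices, as in the split virtqueue layout. The enum, the function names and the interpretation itself are illustrative and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hints. This is one reading of the debug output,
 * not a conclusion stated in this thread. */
enum vq_hint {
    HINT_CACHE_AHEAD_OF_RAM, /* cached avail idx is ahead of guest RAM */
    HINT_RING_OVERCOMMITTED, /* more requests outstanding than ring slots */
    HINT_INCONCLUSIVE,       /* neither check fires; inspect the core */
};

/* avail->idx and last_avail_idx are free-running 16-bit counters, so
 * "how far a is ahead of b" must be computed modulo 2^16. */
static uint16_t idx_ahead(uint16_t a, uint16_t b)
{
    return (uint16_t)(a - b);
}

/* Inputs map to values the debug patch prints:
 *   num          -> vring.num
 *   inuse        -> inuse
 *   last_avail   -> last_avail_idx
 *   shadow_avail -> avail_idx (QEMU's cached shadow_avail_idx)
 *   ram_avail    -> avail_idx (cache bypassed) */
static enum vq_hint classify(unsigned int num, unsigned int inuse,
                             uint16_t last_avail, uint16_t shadow_avail,
                             uint16_t ram_avail)
{
    uint16_t shadow_pending = idx_ahead(shadow_avail, last_avail);
    uint16_t ram_pending = idx_ahead(ram_avail, last_avail);

    /* A guest only ever advances avail->idx, so guest RAM should never be
     * behind QEMU's cached copy. If it is, suspect QEMU (idea 2). */
    if (ram_pending < shadow_pending) {
        return HINT_CACHE_AHEAD_OF_RAM;
    }
    /* Popped entries plus waiting entries cannot exceed the ring size
     * unless the guest overfilled the ring (idea 1) or inuse is wrong. */
    if (inuse + ram_pending > num) {
        return HINT_RING_OVERCOMMITTED;
    }
    return HINT_INCONCLUSIVE;
}
```

Two example readings:

- `last_avail_idx 100`, a cached `avail_idx 101` and a cache-bypassed value of `100` would point at QEMU's cached view.
- The same values with a cache-bypassed value of `101`, and `inuse` already at `vring.num`, would point at the guest or at a miscounted `inuse`.

A concurrent guest update can only make the cache-bypassed value larger, so legitimate guest activity does not trigger the first check.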

Here is an expanded version of the debug patch which might help identify
which of these scenarios is likely.  Sorry, it requires running the
guest again!

This time let's make QEMU dump core so both QEMU state and guest RAM are
captured for further debugging.  That way it will be possible to extract
more information using gdb without rerunning.

Stefan
---
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index a1ff647a66..28d89fcbcb 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
         return NULL;
     }
     rcu_read_lock();
+    uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
     if (virtio_queue_empty_rcu(vq)) {
         goto done;
     }
@@ -879,6 +880,13 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;

     if (vq->inuse >= vq->vring.num) {
+        fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
+        fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
+        fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
+        fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx);
+        fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx)));
+        fprintf(stderr, "used_idx %u\n", vq->used_idx);
+        abort(); /* <--- core dump! */
         virtio_error(vdev, "Virtqueue size exceeded");
         goto done;
     }
