[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size ex
From: |
Paolo Bonzini |
Subject: |
Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error |
Date: |
Wed, 20 Feb 2019 18:53:54 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 |
On 20/02/19 17:58, Stefan Hajnoczi wrote:
> On Mon, Feb 18, 2019 at 07:21:25AM +0000, Fernando Casas Schössow wrote:
>> It took a few days but last night the problem was reproduced.
>> This is the information from the log:
>>
>> vdev 0x55f261d940f0 ("virtio-blk")
>> vq 0x55f261d9ee40 (idx 0)
>> inuse 128 vring.num 128
>> old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874
>> avail 0x3d87a800 avail_idx (cache bypassed) 58625
>
> Hi Paolo,
> Are you aware of any recent MemoryRegionCache issues? The avail_idx
> value 58874 was read via the cache while a non-cached read produces
> 58625!
>
> I suspect that 58625 is correct since the vring is already full and the
> driver wouldn't bump avail_idx any further until requests complete.
>
> Fernando also hits this issue with virtio-scsi so it's not a
> virtio_blk.ko driver bug or a virtio-blk device emulation issue.
No, I am not aware of any issues.
How can I get the core dump (and the corresponding executable to get the
symbols)? Alternatively, it should be enough to print the
vq->vring.caches->avail.mrs from the debugger.
Also, one possibility is to add in vring_avail_idx an assertion like
assert(vq->shadow_availa_idx == virtio_lduw_phys(vdev,
vq->vring.avail + offsetof(VRingAvail, idx)));
and try to catch the error earlier.
Paolo
> A QEMU core dump is available for debugging.
>
> Here is the patch that produced this debug output:
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index a1ff647a66..28d89fcbcb 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
> return NULL;
> }
> rcu_read_lock();
> + uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
> if (virtio_queue_empty_rcu(vq)) {
> goto done;
> }
> @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
> max = vq->vring.num;
>
> if (vq->inuse >= vq->vring.num) {
> + fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
> + fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq -
> vdev->vq));
> + fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
> + fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx
> %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx);
> + fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed)
> %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail +
> offsetof(VRingAvail, idx)));
> + fprintf(stderr, "used_idx %u\n", vq->used_idx);
> + abort(); /* <--- core dump! */
> virtio_error(vdev, "Virtqueue size exceeded");
> goto done;
> }
>
> Stefan
>
>> used_idx 58497
>> 2019-02-18 03:20:08.605+0000: shutting down, reason=crashed
>>
>> The dump file, including guest memory, was generated successfully (after
>> gzip the file is around 492MB).
>> I switched the guest now to virtio-scsi to get the information and dump with
>> this setup as well.
>>
>> How should we proceed?
>>
>> Thanks.
>>
>> On lun, feb 11, 2019 at 4:17 AM, Stefan Hajnoczi <address@hidden> wrote:
>> Thanks for collecting the data! The fact that both virtio-blk and
>> virtio-scsi failed suggests it's not a virtqueue element leak in the
>> virtio-blk or virtio-scsi device emulation code. The hung task error
>> messages from inside the guest are a consequence of QEMU hitting the
>> "Virtqueue size exceeded" error. QEMU refuses to process further requests
>> after the error, causing tasks inside the guest to get stuck on I/O. I don't
>> have a good theory regarding the root cause. Two ideas: 1. The guest is
>> corrupting the vring or submitting more requests than will fit into the
>> ring. Somewhat unlikely because it happens with both Windows and Linux
>> guests. 2. QEMU's virtqueue code is buggy, maybe the memory region cache
>> which is used for fast guest RAM accesses. Here is an expanded version of
>> the debug patch which might help identify which of these scenarios is
>> likely. Sorry, it requires running the guest again! This time let's make
>> QEMU dump core so both QEMU state and guest RAM are captured for further
>> debugging. That way it will be possible to extract more information using
>> gdb without rerunning. Stefan --- diff --git a/hw/virtio/virtio.c
>> b/hw/virtio/virtio.c index a1ff647a66..28d89fcbcb 100644 ---
>> a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -866,6 +866,7 @@ void
>> *virtqueue_pop(VirtQueue *vq, size_t sz) return NULL; } rcu_read_lock(); +
>> uint16_t old_shadow_avail_idx = vq->shadow_avail_idx; if
>> (virtio_queue_empty_rcu(vq)) { goto done; } @@ -879,6 +880,12 @@ void
>> *virtqueue_pop(VirtQueue *vq, size_t sz) max = vq->vring.num; if (vq->inuse
>> >= vq->vring.num) { + fprintf(stderr, "vdev %p (\"%s\")\n", vdev,
>> vdev->name); + fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq -
>> vdev->vq)); + fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse,
>> vq->vring.num); + fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u
>> avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx,
>> vq->shadow_avail_idx); + fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx
>> (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev,
>> vq->vring.avail + offsetof(VRingAvail, idx))); + fprintf(stderr, "used_idx
>> %u\n", vq->used_idx); + abort(); /* <--- core dump! */ virtio_error(vdev,
>> "Virtqueue size exceeded"); goto done; }
>>
>>
signature.asc
Description: OpenPGP digital signature
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, (continued)
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Fernando Casas Schössow, 2019/02/01
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Stefan Hajnoczi, 2019/02/04
- Message not available
- Message not available
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Fernando Casas Schössow, 2019/02/06
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Stefan Hajnoczi, 2019/02/10
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Fernando Casas Schössow, 2019/02/11
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Fernando Casas Schössow, 2019/02/18
- Message not available
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Fernando Casas Schössow, 2019/02/19
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Stefan Hajnoczi, 2019/02/20
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error,
Paolo Bonzini <=
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Fernando Casas Schössow, 2019/02/20
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Stefan Hajnoczi, 2019/02/21
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Fernando Casas Schössow, 2019/02/21
- Message not available
- Message not available
- Message not available
- Message not available
- Message not available
- Message not available
- Message not available
- Message not available
- Message not available
- Message not available
- Message not available
- Message not available
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Stefan Hajnoczi, 2019/02/22
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Paolo Bonzini, 2019/02/22
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Fernando Casas Schössow, 2019/02/22
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Paolo Bonzini, 2019/02/22
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Fernando Casas Schössow, 2019/02/22
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Dr. David Alan Gilbert, 2019/02/22
- Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error, Paolo Bonzini, 2019/02/22