From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [Bug 1558175] Re: virtio: vm killed (Guest moved used index)
Date: Fri, 18 Mar 2016 09:45:26 +0000
User-agent: Mutt/1.5.24 (2015-08-30)

On Thu, Mar 17, 2016 at 03:56:42PM -0000, Laszlo Ersek (Red Hat) wrote:
> Stefan, I too had the same immediate idea upon seeing this bug report.
> But, after I skimmed the DPDK code briefly, I think it does reset the
> virtio-net device correctly, before it tries to use it.
>
> Instead, at least based on the extensive log that Julien pasted, I
> believe the following happens: when the first instance of testpmd is
> killed ungracefully, it gets no chance at resetting the virtio-net
> device at shutdown. The vtpci_reset() call in virtio_dev_close() is
> likely never reached. This leaves the virtio queues alive, as far as
> QEMU is concerned, but in the guest, the memory that used to cover them
> goes away.
>
> So when the second instance of testpmd is started, and a bunch of memory
> is allocated and written to, I think testpmd scribbles over the
> "leftover" live virtio queues that QEMU / KVM are still watching. The
> hypervisor is allowed to notice changes to the virtqueues without
> explicit guest notifications (hence the elaborate barrier stuff in the
> Linux kernel drivers, for example). I suspect things blow up before the
> second testpmd process even thinks about using virtio-net. (It is hard
> to confirm from the log that Julien pasted, because he snipped exactly
> the part that leads up to the failure.)
>
> This failure mode (if my hunch is correct) is special to DPDK, I think.
> In a normal guest kernel scenario, the memory that covers the virtqueues
> is managed by the kernel, and you can't just kill the kernel. You might
> be able to unload the virtio-net driver module, but for that one has to
> tear down the corresponding ethX interfaces first, and I'm quite sure
> the virtio-net devices will be re-set then.
>
> We've seen the exact same problem with iPXE (in UEFI guests) as well,
> when iPXE would transfer control to the kernel or another payload; but
> iPXE got fixed: it now disconnects the virtio-net NIC (and other NICs
> too) in the ExitBootServices() callback. (I'm not perfectly happy with
> that fix for unrelated reasons, but it definitely covers this issue.)
>
> OVMF too resets virtio devices in the ExitBootServices() callbacks of
> its virtio drivers. So this failure mode seems to be special to DPDK,
> where you can kill the testpmd process and deprive it from the chance to
> clean up the virtqueues (by resetting the device).

QEMU can and should help by making this a non-fatal error: treat the
device as broken when an invalid state is reached and stop processing
virtqueues until it is reset. Fatal errors in QEMU device emulation are
a bad thing.
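
As a rough sketch of the "mark the device broken" idea, and not actual
QEMU code: every name below (sketch_dev, sketch_num_heads, and so on) is
made up for illustration, and the check shown is just a generic sanity
test that an index has not moved further than the ring size.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified device model; not QEMU's real structures. */
struct sketch_vq {
    uint16_t num;            /* ring size */
    uint16_t last_avail_idx; /* next avail entry the device will consume */
};

struct sketch_dev {
    bool broken;             /* set on invalid ring state, cleared by reset */
    struct sketch_vq vq;
};

/* Record the error and keep the VM running instead of calling exit(). */
static void sketch_mark_broken(struct sketch_dev *dev, const char *why)
{
    fprintf(stderr, "virtio sketch: %s; device needs reset\n", why);
    dev->broken = true;
}

/*
 * Return the number of newly available buffers, or -1 if the device is
 * broken or the guest-supplied index is impossible for this ring size.
 */
static int sketch_num_heads(struct sketch_dev *dev, uint16_t guest_avail_idx)
{
    uint16_t num_heads;

    if (dev->broken) {
        return -1;           /* ignore the rings until the guest resets */
    }

    /* 16-bit subtraction so that index wrap-around is handled. */
    num_heads = (uint16_t)(guest_avail_idx - dev->vq.last_avail_idx);
    if (num_heads > dev->vq.num) {
        sketch_mark_broken(dev, "index moved further than ring size");
        return -1;
    }
    return num_heads;
}

/* A guest-initiated reset makes the device usable again. */
static void sketch_reset(struct sketch_dev *dev)
{
    dev->broken = false;
    dev->vq.last_avail_idx = 0;
}
```

With a 256-entry ring and last_avail_idx at 10, an index of 14 yields 4
buffers, while an index of 5000 marks the device broken; every later call
returns -1 until sketch_reset() is called.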

However, it's still a guest code bug because a driver must not abandon
an active device. Depending on the contents of the rings it could cause
spurious I/O leading to data corruption.

So this needs to be fixed in DPDK or the application.
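
One possible shape for the application-side fix, as a sketch only: catch
the termination signals that can be caught and reset the device before
the process goes away. Everything here is hypothetical (sketch_nic,
sketch_nic_reset, sketch_main_loop); in DPDK the real reset would be the
vtpci_reset() call in virtio_dev_close() mentioned above.

```c
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's device handle. */
struct sketch_nic {
    bool active;             /* true while the device may touch the rings */
};

static struct sketch_nic g_nic;
static volatile sig_atomic_t g_stop;

/* Stand-in for the real device reset done by the driver. */
static void sketch_nic_reset(struct sketch_nic *nic)
{
    /* A real driver would write 0 to the device status register here. */
    nic->active = false;
}

/* Only set a flag in the handler; the reset happens in normal context. */
static void sketch_on_signal(int signo)
{
    (void)signo;
    g_stop = 1;
}

/* Safe to call more than once: resets only if the device is still live. */
static void sketch_cleanup(void)
{
    if (g_nic.active) {
        sketch_nic_reset(&g_nic);
    }
}

/* Install handlers for SIGINT/SIGTERM and an exit hook as a fallback. */
static int sketch_install_cleanup(void)
{
    if (signal(SIGINT, sketch_on_signal) == SIG_ERR ||
        signal(SIGTERM, sketch_on_signal) == SIG_ERR) {
        return -1;
    }
    return atexit(sketch_cleanup) == 0 ? 0 : -1;
}

/* Poll until asked to stop, then reset the device before returning. */
static void sketch_main_loop(void (*poll_once)(void))
{
    while (!g_stop) {
        poll_once();
    }
    sketch_cleanup();
}
```

This cannot help when the process is killed with SIGKILL, since that
signal cannot be caught, so it only narrows the window and does not close
it.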

Stefan