Subject: Re: [Qemu-devel] [PATCH 00/18] virtio-blk: Support "VIRTIO_CONFIG_S_NEEDS_RESET"
From: Michael S. Tsirkin
Date: Tue, 21 Apr 2015 11:59:30 +0200

On Tue, Apr 21, 2015 at 05:16:53PM +0800, Fam Zheng wrote:
> On Tue, 04/21 11:08, Cornelia Huck wrote:
> > On Tue, 21 Apr 2015 16:38:31 +0800
> > Fam Zheng <address@hidden> wrote:
> >
> > > On Tue, 04/21 10:04, Cornelia Huck wrote:
> > > > On Tue, 21 Apr 2015 15:44:02 +0800
> > > > Fam Zheng <address@hidden> wrote:
> > > >
> > > > > On Mon, 04/20 17:13, Cornelia Huck wrote:
> > > > > > On Fri, 17 Apr 2015 15:59:15 +0800
> > > > > > Fam Zheng <address@hidden> wrote:
> > > > > >
> > > > > > > Currently, virtio code chooses to kill QEMU if the guest passes any
> > > > > > > invalid data with vring. That has drawbacks such as losing unsaved
> > > > > > > data (e.g. when the guest user is writing a very long email), or a
> > > > > > > possible denial of service in a nested VM use case where a virtio
> > > > > > > device is passed through.
> > > > > > >
> > > > > > > virtio-1 has introduced a new status bit "NEEDS RESET" which could
> > > > > > > be used to improve this by communicating the error state between
> > > > > > > virtio devices and drivers. The device notifies the guest upon
> > > > > > > setting the bit; the guest driver should then detect this bit and
> > > > > > > report to userspace, or recover the device by resetting it.
> > > > > > >
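A minimal sketch of what the device-side handling could look like (the
helper name and the "broken" field are illustrative here, not necessarily
what the series implements):

static void virtio_set_needs_reset(VirtIODevice *vdev)
{
    /* NEEDS_RESET only means something to virtio-1 aware drivers */
    if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
        vdev->status |= VIRTIO_CONFIG_S_NEEDS_RESET;
        virtio_notify_config(vdev);   /* config change interrupt */
    }
    vdev->broken = true;              /* stop servicing the vrings */
}
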
> > > > > > > This series makes the necessary changes in the virtio core code,
> > > > > > > based on which virtio-blk is converted. Other devices keep the
> > > > > > > existing behavior for now by passing in "error_abort"; they will be
> > > > > > > converted in follow-up series. The Linux driver part will also be
> > > > > > > worked on.
> > > > > > >
> > > > > > > One concern with this behavior change is that it's now harder to
> > > > > > > notice the actual driver bug that caused the error, as the guest
> > > > > > > continues to run. To address that, we could probably add a new
> > > > > > > error action option to virtio devices, similar to "werror"/"rerror"
> > > > > > > in the block layer, so the VM could be paused and management would
> > > > > > > get a QMP event, like pvpanic. This work can be done on top.
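Purely as an illustration of the kind of option meant here (the names are
made up):

typedef enum VirtIOErrorAction {
    VIRTIO_ERROR_ACTION_NEEDS_RESET, /* flag the error, let the guest run on */
    VIRTIO_ERROR_ACTION_STOP,        /* pause the VM and emit a QMP event */
    VIRTIO_ERROR_ACTION_ABORT,       /* today's behavior: kill QEMU */
} VirtIOErrorAction;
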
> > > > > >
> > > > > > In principle, this looks nice; I'm not sure however how this affects
> > > > > > non-virtio-1 devices.
> > > > > >
> > > > > > If a device is operating in virtio-1 mode, everything is clearly
> > > > > > specified: The guest is notified and if it is aware of the
> > > > > > NEEDS_RESET bit, it can react accordingly.
> > > > > >
> > > > > > But what about legacy devices? Even if they are notified, they don't
> > > > > > know to check for NEEDS_RESET - and I'm not sure if the undefined
> > > > > > behaviour after NEEDS_RESET might lead to bigger trouble than killing
> > > > > > off the guest.
> > > > > >
> > > > >
> > > > > The device should become unresponsive to VQ output until the guest
> > > > > issues a reset through bus commands. Do you have an example of "big
> > > > > trouble" in mind?
> > > >
> > > > I'm not sure what's supposed to happen if NEEDS_RESET is set but not
> > > > everything is fenced off. The guest may see that queues have become
> > > > unresponsive, but if we don't stop ioeventfds and fence off
> > > > notifications, it may easily get into an undefined state internally.
> > >
> > > Yeah, disabling ioeventfds and notifications is a good idea.
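Both the ioeventfd handler and the MMIO/PIO notify path end up in a common
kick-handling function, so a single check there could fence the queues; as a
rough sketch (the "broken" flag is again hypothetical):

static void virtio_queue_notify_vq(VirtQueue *vq)
{
    if (vq->vdev->broken) {
        return;                       /* ignore guest kicks after an error */
    }
    if (vq->vring.desc && vq->handle_output) {
        vq->handle_output(vq->vdev, vq);
    }
}
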
> > >
> > > > And if it is connected to other guests via networking, having it limp
> > > > on may be worse than just killing it off. (Which parts of the data have
> > > > been cleanly written to disk and which haven't?
> > >
> > > Well, we don't know that even without this series, do we?
> >
> > We know it hasn't, as the guest is dead :)
> >
> > >
> > > > How is it going to get
> > > > out of that pickle if it has no good idea of what is wrong?
> > >
> > > If it's virtio-1 compatible, it can reset the device or mark the device
> > > unusable; either way the guest gets a chance to save the work.
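On the driver side that could look roughly like this (sketch only; the
virtio core callbacks are real, the recovery policy is the open question):

static void virtblk_config_changed(struct virtio_device *vdev)
{
    u8 status = vdev->config->get_status(vdev);

    if (status & VIRTIO_CONFIG_S_NEEDS_RESET) {
        dev_warn(&vdev->dev, "device needs reset\n");
        /* either reset and re-initialize the device here, or mark the
         * disk as failed so userspace gets a chance to save its work */
    }
}
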
> >
> > My problem is not with virtio-1 devices; although data certainly can't
> > be written if the device has become unusable.
> >
> > >
> > > If it's not, it's merely an unresponsive device, and the guest user can
> > > reboot or shut down.
> >
> > But how does any management software know? If I'm logged into a system
> > and I notice that saving my data doesn't complete, I can trigger an
> > action (although reboot/shutdown may not work anymore if too many
> > threads are waiting on writeback), but how can an automation system
> > know? It is probably more useful for those setups to have a hard stop
> > if recovery is not possible - and for legacy systems, that means
> > killing the guest afaics.
> >
> > >
> > > >
> > > > If I have to debug a non-working guest, I prefer a crashed one with a
> > > > clean state over one that has continued running after the error
> > > > occurred.
> > >
> > > For debugging purposes, crashing is definitely fine (even better :), but
> > > only because we won't have critical applications in the guest.
> >
> > I would argue even for critical applications. They should have a second
> > guest as backup :)
> >
> > > It makes sense for users to avoid the overkill "exit(1)"s in QEMU. They
> > > don't even generate a core file.
> >
> > Let's keep dying, but use abort? Would that help?
> >
> > > And even if they do, it would be much more painful to recover an unsaved
> > > libreoffice document from a memory core.
> >
> > See my reply above.
> >
> > My concern is mainly about legacy setups that aren't used interactively.
> >
>
> How about pausing the guest and generating a QMP event?
>
> Fam
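
A rough sketch of that flow, in the spirit of what pvpanic does (the event
itself is hypothetical and would need a QAPI schema addition first):

static void virtio_report_error_and_stop(VirtIODevice *vdev)
{
    /* hypothetical QMP event carrying the device name */
    qapi_event_send_x_virtio_error(vdev->name);
    /* pause the guest so management can inspect or restart it */
    vm_stop(RUN_STATE_INTERNAL_ERROR);
}
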
This can be an option; the default should be backwards-compatible, though.
--
MST