[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2] virtio-pci: disable vring processing when bus-mastering i
From: |
Michael S. Tsirkin |
Subject: |
Re: [PATCH v2] virtio-pci: disable vring processing when bus-mastering is disabled |
Date: |
Thu, 28 Nov 2019 12:03:01 -0500 |
On Thu, Nov 28, 2019 at 05:48:00PM +0100, Halil Pasic wrote:
> On Tue, 19 Nov 2019 18:50:03 -0600
> Michael Roth <address@hidden> wrote:
>
> [..]
> > I.e. the calling code is only scheduling a one-shot BH for
> > virtio_blk_data_plane_stop_bh, but somehow we end up trying to process
> > an additional virtqueue entry before we get there. This is likely due
> > to the following check in virtio_queue_host_notifier_aio_poll:
> >
> > static bool virtio_queue_host_notifier_aio_poll(void *opaque)
> > {
> > EventNotifier *n = opaque;
> > VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
> > bool progress;
> >
> > if (!vq->vring.desc || virtio_queue_empty(vq)) {
> > return false;
> > }
> >
> > progress = virtio_queue_notify_aio_vq(vq);
> >
> > namely the call to virtio_queue_empty(). In this case, since no new
> > requests have actually been issued, shadow_avail_idx == last_avail_idx,
> > so we actually try to access the vring via vring_avail_idx() to get
> > the latest non-shadowed idx:
> >
> > int virtio_queue_empty(VirtQueue *vq)
> > {
> > bool empty;
> > ...
> >
> > if (vq->shadow_avail_idx != vq->last_avail_idx) {
> > return 0;
> > }
> >
> > rcu_read_lock();
> > empty = vring_avail_idx(vq) == vq->last_avail_idx;
> > rcu_read_unlock();
> > return empty;
> >
> > but since the IOMMU region has been disabled we get a bogus value (0
> > usually), which causes virtio_queue_empty() to falsely report that
> > there are entries to be processed, which causes errors such as:
> >
> > "virtio: zero sized buffers are not allowed"
> >
> > or
> >
> > "virtio-blk missing headers"
> >
> > and puts the device in an error state.
> >
>
> I've seen something similar on s390x with virtio-ccw-blk under
> protected virtualization, that made me wonder about how virtio-blk in
> particular but also virtio in general handles shutdown and reset.
>
> This makes me wonder if bus-mastering disabled is the only scenario when
> a something like vdev->disabled should be used.
>
> In particular I have the following mechanism in mind
>
> qemu_system_reset() --> ... --> qemu_devices_reset() --> ... -->
> --> virtio_[transport]_reset() --> ... --> virtio_bus_stop_ioeventfd()
> --> virtio_blk_data_plane_stop()
>
> which in turn triggesrs the following cascade:
> virtio_blk_data_plane_stop_bh -->
> virtio_queue_aio_set_host_notifier_handler() -->
> --> virtio_queue_host_notifier_aio_read()
> which however calls
> virtio_queue_notify_aio_vq() if the notifier tests as
> positive.
>
> Since we still have vq->handle_aio_output that means we may
> call virtqueue_pop() during the reset procedure.
>
> This was a problem for us, because (due to a bug) the shared pages that
> constitute the virtio ring weren't shared any more. And thus we got
> the infamous
> virtio_error(vdev, "virtio: zero sized buffers are not allowed").
>
> Now the bug is no more, and we can tolerate that somewhat late access
> to the virtio ring.
>
> But it keeps nagging me, is it really OK for the device to access the
> virtio ring during reset? My intuition tells me that the device should
> not look for new requests after it has been told to reset.
Well it's after it was told to reset but it's not after
it completed reset. So I think it's fine ...
> Opinions? (Michael, Connie)
>
> Regards,
> Halil
>
> > This patch works around the issue by introducing virtio_set_disabled(),
> > which sets a 'disabled' flag to bypass checks like virtio_queue_empty()
> > when bus-mastering is disabled. Since we'd check this flag at all the
> > same sites as vdev->broken, we replace those checks with an inline
> > function which checks for either vdev->broken or vdev->disabled.
> >
> > The 'disabled' flag is only migrated when set, which should be fairly
> > rare, but to maintain migration compatibility we disable it's use for
> > older machine types. Users requiring the use of the flag in conjunction
> > with older machine types can set it explicitly as a virtio-device
> > option.
> >