[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] virtio: add VIRTQUEUE_ERROR QAPI event
From: |
Vladimir Sementsov-Ogievskiy |
Subject: |
Re: [PATCH] virtio: add VIRTQUEUE_ERROR QAPI event |
Date: |
Tue, 20 Sep 2022 18:10:08 +0300 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1 |
On 9/20/22 17:47, Markus Armbruster wrote:
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:
For now we only log the vhost device error, when virtqueue is actually
stopped. Let's add a QAPI event, which makes possible:
- collect statistics of such errors
- make immediate actions: take coredums or do some other debugging
Core dumps, I presume.
Is QMP the right tool for the job? Or could a trace point do?
Management tool already can collect QMP events. So, if we want to forward some
QMP events to other subsystems (to immediately inform support team, or to
update some statistics) it's simple to realize for QMP events. But I'm not sure
how to do it for trace-events.. Scanning trace logs is not convenient.
Another benefit of QMP events is that they are objects, with fields, which is
better for machine processing than textual trace-events.
The event could be reused for some other virtqueue problems (not only
for vhost devices) in future. For this it gets a generic name and
structure.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
hw/virtio/vhost.c | 12 +++++++++---
qapi/qdev.json | 25 +++++++++++++++++++++++++
2 files changed, 34 insertions(+), 3 deletions(-)
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index f758f177bb..caa81f2ace 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -15,6 +15,7 @@
#include "qemu/osdep.h"
#include "qapi/error.h"
+#include "qapi/qapi-events-qdev.h"
#include "hw/virtio/vhost.h"
#include "qemu/atomic.h"
#include "qemu/range.h"
Only tangentially related to this patch, but here goes anyway:
/* enabled until disconnected backend stabilizes */
#define _VHOST_DEBUG 1
This is from 2016. Has it stabilized?
Hmm, I don't know:) It works for us)
But anyway, I agree that error/debug reporting here needs an update. I don't
think that dropping the messages is good. Some should be converted to
errp-reporting, some to warnings or assertions..
#ifdef _VHOST_DEBUG
#define VHOST_OPS_DEBUG(retval, fmt, ...) \
do { \
error_report(fmt ": %s (%d)", ## __VA_ARGS__, \
strerror(-retval), -retval); \
error_report() is for errors the user can do something about, not for
debug messages.
} while (0)
#else
#define VHOST_OPS_DEBUG(retval, fmt, ...) \
do { } while (0)
#endif
@@ -1287,11 +1288,16 @@ static void
vhost_virtqueue_error_notifier(EventNotifier *n)
struct vhost_virtqueue *vq = container_of(n, struct vhost_virtqueue,
error_notifier);
struct vhost_dev *dev = vq->dev;
- int index = vq - dev->vqs;
if (event_notifier_test_and_clear(n) && dev->vdev) {
- VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d",
- dev->vq_index + index);
+ int ind = vq - dev->vqs + dev->vq_index;
+ DeviceState *ds = &dev->vdev->parent_obj;
+
+ VHOST_OPS_DEBUG(-EINVAL, "vhost vring error in virtqueue %d", ind);
+ qapi_event_send_virtqueue_error(!!ds->id, ds->id, ds->canonical_path,
+ ind, VIRTQUEUE_ERROR_VHOST_VRING_ERR,
+ "vhost reported failure through vring "
+ "error fd");
Do we still need VHOST_OPS_DEBUG() here?
I think, this should be decided separately from this patch. Here I keep current
behavior and add new event.
}
}
diff --git a/qapi/qdev.json b/qapi/qdev.json
index 2708fb4e99..b7c2669c2c 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -158,3 +158,28 @@
##
{ 'event': 'DEVICE_UNPLUG_GUEST_ERROR',
'data': { '*device': 'str', 'path': 'str' } }
+
+##
+# @VirtqueueError:
+#
+# Since: 7.2
+##
+{ 'enum': 'VirtqueueError',
+ 'data': [ 'vhost-vring-err' ] }
+
+##
+# @VIRTQUEUE_ERROR:
+#
+# Emitted when a device virtqueue fails in runtime.
+#
+# @device: the device's ID if it has one
+# @path: the device's QOM path
+# @virtqueue: virtqueue index
+# @error: error identifier
+# @description: human readable description
+#
+# Since: 7.2
+##
+{ 'event': 'VIRTQUEUE_ERROR',
+ 'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int',
+ 'error': 'VirtqueueError', 'description': 'str'} }
Can the guest trigger the event?
Yes, but as I understand, only once per virtqueue.
If yes, it needs to be rate-limited.
This may be needed if VIRTQUEUE_ERROR will be shared with other errors. Still
adding it now will not hurt I think.
--
Best regards,
Vladimir