qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] virtio: add VIRTQUEUE_ERROR QAPI event


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH] virtio: add VIRTQUEUE_ERROR QAPI event
Date: Tue, 20 Sep 2022 18:10:08 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1

On 9/20/22 17:47, Markus Armbruster wrote:
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:

For now we only log the vhost device error, when virtqueue is actually
stopped. Let's add a QAPI event, which makes possible:

  - collect statistics of such errors
  - make immediate actions: take coredums or do some other debugging

Core dumps, I presume.

Is QMP the right tool for the job?  Or could a trace point do?

Management tool already can collect QMP events. So, if we want to forward some 
QMP events to other subsystems (to immediately inform support team, or to 
update some statistics) it's simple to realize for QMP events. But I'm not sure 
how to do it for trace-events.. Scanning trace logs is not convenient.

Another benefit of QMP events is that they are objects, with fields, which is 
better for machine processing than textual trace-events.


The event could be reused for some other virtqueue problems (not only
for vhost devices) in future. For this it gets a generic name and
structure.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
  hw/virtio/vhost.c | 12 +++++++++---
  qapi/qdev.json    | 25 +++++++++++++++++++++++++
  2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index f758f177bb..caa81f2ace 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -15,6 +15,7 @@
#include "qemu/osdep.h"
  #include "qapi/error.h"
+#include "qapi/qapi-events-qdev.h"
  #include "hw/virtio/vhost.h"
  #include "qemu/atomic.h"
  #include "qemu/range.h"

Only tangentially related to this patch, but here goes anyway:

    /* enabled until disconnected backend stabilizes */
    #define _VHOST_DEBUG 1

This is from 2016.  Has it stabilized?

Hmm, I don't know:) It works for us)

But anyway, I agree that error/debug reporting here needs an update. I don't 
think that dropping the messages is good. Some should be converted to 
errp-reporting, some to warnings or assertions..


    #ifdef _VHOST_DEBUG
    #define VHOST_OPS_DEBUG(retval, fmt, ...) \
        do { \
            error_report(fmt ": %s (%d)", ## __VA_ARGS__, \
                         strerror(-retval), -retval); \

error_report() is for errors the user can do something about, not for
debug messages.

        } while (0)
    #else
    #define VHOST_OPS_DEBUG(retval, fmt, ...) \
        do { } while (0)
    #endif

@@ -1287,11 +1288,16 @@ static void 
vhost_virtqueue_error_notifier(EventNotifier *n)
      struct vhost_virtqueue *vq = container_of(n, struct vhost_virtqueue,
                                                error_notifier);
      struct vhost_dev *dev = vq->dev;
-    int index = vq - dev->vqs;
if (event_notifier_test_and_clear(n) && dev->vdev) {
-        VHOST_OPS_DEBUG(-EINVAL,  "vhost vring error in virtqueue %d",
-                        dev->vq_index + index);
+        int ind = vq - dev->vqs + dev->vq_index;
+        DeviceState *ds = &dev->vdev->parent_obj;
+
+        VHOST_OPS_DEBUG(-EINVAL,  "vhost vring error in virtqueue %d", ind);
+        qapi_event_send_virtqueue_error(!!ds->id, ds->id, ds->canonical_path,
+                                        ind, VIRTQUEUE_ERROR_VHOST_VRING_ERR,
+                                        "vhost reported failure through vring "
+                                        "error fd");

Do we still need VHOST_OPS_DEBUG() here?

I think, this should be decided separately from this patch. Here I keep current 
behavior and add new event.


      }
  }
diff --git a/qapi/qdev.json b/qapi/qdev.json
index 2708fb4e99..b7c2669c2c 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -158,3 +158,28 @@
  ##
  { 'event': 'DEVICE_UNPLUG_GUEST_ERROR',
    'data': { '*device': 'str', 'path': 'str' } }
+
+##
+# @VirtqueueError:
+#
+# Since: 7.2
+##
+{ 'enum': 'VirtqueueError',
+  'data': [ 'vhost-vring-err' ] }
+
+##
+# @VIRTQUEUE_ERROR:
+#
+# Emitted when a device virtqueue fails in runtime.
+#
+# @device: the device's ID if it has one
+# @path: the device's QOM path
+# @virtqueue: virtqueue index
+# @error: error identifier
+# @description: human readable description
+#
+# Since: 7.2
+##
+{ 'event': 'VIRTQUEUE_ERROR',
+ 'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int',
+            'error': 'VirtqueueError', 'description': 'str'} }

Can the guest trigger the event?

Yes, but as I understand, only once per virtqueue.


If yes, it needs to be rate-limited.


This may be needed if VIRTQUEUE_ERROR will be shared with other errors. Still 
adding it now will not hurt I think.

--
Best regards,
Vladimir



reply via email to

[Prev in Thread] Current Thread [Next in Thread]