qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support


From: Eugenio Perez Martin
Subject: Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
Date: Mon, 25 Mar 2024 21:33:26 +0100

On Mon, Mar 25, 2024 at 5:52 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
>
>
> On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:
> > On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.palmer@oracle.com> 
> > wrote:
> >>
> >> The goal of these patches is to add support to a variety of virtio and
> >> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
> >> indicates that all buffers are used by the device in the same order in
> >> which they were made available by the driver.
> >>
> >> These patches attempt to implement a generalized, non-device-specific
> >> solution to support this feature.
> >>
> >> The core feature behind this solution is a buffer mechanism in the form
> >> of GLib's GHashTable. The decision behind using a hash table was to
> >> leverage their ability for quick lookup, insertion, and removal
> >> operations. Given that our keys are simply numbers of an ordered
> >> sequence, a hash table seemed like the best choice for a buffer
> >> mechanism.
> >>
> >> ---------------------
> >>
> >> The strategy behind this implementation is as follows:
> >>
> >> We know that buffers that are popped from the available ring and enqueued
> >> for further processing will always done in the same order in which they
> >> were made available by the driver. Given this, we can note their order
> >> by assigning the resulting VirtQueueElement a key. This key is a number
> >> in a sequence that represents the order in which they were popped from
> >> the available ring, relative to the other VirtQueueElements.
> >>
> >> For example, given 3 "elements" that were popped from the available
> >> ring, we assign a key value to them which represents their order (elem0
> >> is popped first, then elem1, then lastly elem2):
> >>
> >>       elem2   --  elem1   --  elem0   ---> Enqueue for processing
> >>      (key: 2)    (key: 1)    (key: 0)
> >>
> >> Then these elements are enqueued for further processing by the host.
> >>
> >> While most devices will return these completed elements in the same
> >> order in which they were enqueued, some devices may not (e.g.
> >> virtio-blk). To guarantee that these elements are put on the used ring
> >> in the same order in which they were enqueued, we can use a buffering
> >> mechanism that keeps track of the next expected sequence number of an
> >> element.
> >>
> >> In other words, if the completed element does not have a key value that
> >> matches the next expected sequence number, then we know this element is
> >> not in-order and we must stash it away in a hash table until an order
> >> can be made. The element's key value is used as the key for placing it
> >> in the hash table.
> >>
> >> If the completed element has a key value that matches the next expected
> >> sequence number, then we know this element is in-order and we can push
> >> it on the used ring. Then we increment the next expected sequence number
> >> and check if the hash table contains an element at this key location.
> >>
> >> If so, we retrieve this element, push it to the used ring, delete the
> >> key-value pair from the hash table, increment the next expected sequence
> >> number, and check the hash table again for an element at this new key
> >> location. This process is repeated until we're unable to find an element
> >> in the hash table to continue the order.
> >>
> >> So, for example, say the 3 elements we enqueued were completed in the
> >> following order: elem1, elem2, elem0. The next expected sequence number
> >> is 0:
> >>
> >>      exp-seq-num = 0:
> >>
> >>       elem1   --> elem1.key == exp-seq-num ? --> No, stash it
> >>      (key: 1)                                         |
> >>                                                       |
> >>                                                       v
> >>                                                 ================
> >>                                                 |key: 1 - elem1|
> >>                                                 ================
> >>      ---------------------
> >>      exp-seq-num = 0:
> >>
> >>       elem2   --> elem2.key == exp-seq-num ? --> No, stash it
> >>      (key: 2)                                         |
> >>                                                       |
> >>                                                       v
> >>                                                 ================
> >>                                                 |key: 1 - elem1|
> >>                                                 |--------------|
> >>                                                 |key: 2 - elem2|
> >>                                                 ================
> >>      ---------------------
> >>      exp-seq-num = 0:
> >>
> >>       elem0   --> elem0.key == exp-seq-num ? --> Yes, push to used ring
> >>      (key: 0)
> >>
> >>      exp-seq-num = 1:
> >>
> >>      lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
> >>                                               remove elem from table
> >>                                                       |
> >>                                                       v
> >>                                                 ================
> >>                                                 |key: 2 - elem2|
> >>                                                 ================
> >>
> >>      exp-seq-num = 2:
> >>
> >>      lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
> >>                                               remove elem from table
> >>                                                       |
> >>                                                       v
> >>                                                 ================
> >>                                                 |   *empty*    |
> >>                                                 ================
> >>
> >>      exp-seq-num = 3:
> >>
> >>      lookup(table, exp-seq-num) != NULL ? --> No, done
> >>      ---------------------
> >>
> >
> > I think to use a hashtable to handle this has an important drawback:
> > it hurts performance on the devices that are using right in-order
> > because of hash calculus, to benefit devices that are using it badly
> > by using descriptors out of order. We should use data structs that are
> > as free as possible for the first, and we don't care to worse the
> > experience of the devices that enable in_order and they shouldn't.
> >
>
> Right, because if descriptors are coming in in-order, we still search
> the (empty) hash table.
>
> Hmm... what if we introduced a flag to see if we actually should bother
> searching the hash table? That way we avoid the cost of searching when
> we really don't need to.
>
> > So I suggest reusing vq->used_elems array vq. At each used descriptor
> > written in the used ring, you know the next head is elem->index +
> > elem->ndescs, so you can check if that element has been filled or not.
> > If used, it needs to be flushed too. If not used, just return.
> >
> > Of course virtqueue_flush also needs to take this into account.
> >
> > What do you think, does it make sense to you?
> >
>
> I'm having a bit of trouble understanding the suggestion here. Would you
> mind elaborating a bit more for me on this?
>
> For example, say elem0, elem1, and elem2 were enqueued in-order (elem0
> being first, elem2 last) and then elem2 finishes first, elem1 second,
> and elem0 third. Given that these elements finish out-of-order, how
> would you handle these out-of-order elements using your suggestion?
>

virtqueue_fill is called first with elem2. So vq->used_elems[2 %
vq->num] is filled with the needed information of the descriptor:
index, len and ndescs. idx function parameter is ignored.

Optionally, virtqueue_push is called. It checks if
vq->used_elems[vq->used_idx] is valid. valid can be elem->in_num +
elem->out_num > 0, and reset them on every used ring write. If it is
not valid, this is a no-op. Currently, it is not valid.

Same process for elem1.

virtqueue_fill is the same for elem0. But now virtqueue_flush gets
interesting, as it detects vq->used_elems[0] is used. It scans for the
first not-used element, and it finds it is vq->used_elems[3]. So it
needs to write an used elem with id = 2 and the corresponding length.

Maybe it is interesting to implement ways to improve the look for the
last used descriptor, but if any I'd go for a bitmap and always on top
of the basis series.

The algorithm has not been tested, so maybe I've missed something.

Thanks!

> Thanks :)
>
> > Thanks!
> >
> >
> >> Jonah Palmer (8):
> >>    virtio: Define InOrderVQElement
> >>    virtio: Create/destroy/reset VirtQueue In-Order hash table
> >>    virtio: Define order variables
> >>    virtio: Implement in-order handling for virtio devices
> >>    virtio-net: in-order handling
> >>    vhost-svq: in-order handling
> >>    vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
> >>    virtio: Add VIRTIO_F_IN_ORDER property definition
> >>
> >>   hw/block/vhost-user-blk.c          |   1 +
> >>   hw/net/vhost_net.c                 |   2 +
> >>   hw/net/virtio-net.c                |   6 +-
> >>   hw/scsi/vhost-scsi.c               |   1 +
> >>   hw/scsi/vhost-user-scsi.c          |   1 +
> >>   hw/virtio/vhost-shadow-virtqueue.c |  15 ++++-
> >>   hw/virtio/vhost-user-fs.c          |   1 +
> >>   hw/virtio/vhost-user-vsock.c       |   1 +
> >>   hw/virtio/virtio.c                 | 103 ++++++++++++++++++++++++++++-
> >>   include/hw/virtio/virtio.h         |  20 +++++-
> >>   net/vhost-vdpa.c                   |   1 +
> >>   11 files changed, 145 insertions(+), 7 deletions(-)
> >>
> >> --
> >> 2.39.3
> >>
> >
>




reply via email to

[Prev in Thread] Current Thread [Next in Thread]