From: Eugenio Perez Martin
Subject: Re: [RFC PATCH v4 18/20] vhost: Add VhostIOVATree
Date: Thu, 21 Oct 2021 09:03:15 +0200

On Thu, Oct 21, 2021 at 4:34 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Oct 20, 2021 at 8:07 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Wed, Oct 20, 2021 at 11:01 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Wed, Oct 20, 2021 at 3:54 PM Eugenio Perez Martin
> > > <eperezma@redhat.com> wrote:
> > > >
> > > > On Tue, Oct 19, 2021 at 11:23 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Tue, Oct 19, 2021 at 4:32 PM Jason Wang <jasowang@redhat.com> 
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > > On 2021/10/1 3:06 PM, Eugenio Pérez wrote:
> > > > > > > This tree is able to look up a translated address from an
> > > > > > > IOVA address.
> > > > > > >
> > > > > > > At first glance it is similar to util/iova-tree. However, SVQ
> > > > > > > working on devices with limited IOVA space needs more
> > > > > > > capabilities, like allocating IOVA chunks or performing reverse
> > > > > > > translations (qemu addresses to iova).
> > > > > >
> > > > > >
> > > > > > I don't see any reverse translation used in the shadow code.
> > > > > > Or is there anything I missed?
> > > > >
> > > > > Ok, it looks to me that it is used in the iova allocator. But I
> > > > > think it's better to decouple it into an independent allocator
> > > > > instead of the vhost iova tree.
> > > > >
> > > >
> > > > Reverse translation is used every time a buffer is made available,
> > > > since buffer contents are not copied, only the descriptors, to the
> > > > SVQ vring.
> > >
> > > I may be missing something, but I didn't see the code. Qemu knows
> > > the VA of the virtqueue, and the VA of the VQ is stored in the
> > > VirtQueueElem?
> > >
> >
> > It's used in patch 20/20; could that be the misunderstanding? The
> > function calling it is vhost_svq_translate_addr.
> >
> > Qemu knows the VA of the buffer, but it must offer a valid SVQ IOVA
> > to the device. That is the translation I mean.
>
> Ok, I get you. So if I understand correctly, what you did is:
>
> 1) allocate IOVA during region_add
> 2) perform VA->IOVA reverse lookup in handle_kick
>
> This should be fine, but here are some suggestions:
>
> 1) remove the assert(map) in vhost_svq_translate_addr() since the
> guest can add e.g. a BAR address

Wouldn't VirtQueue block them in virtqueue_pop / the address_space_read_*
functions? I'm fine with removing it, but I would say it adds value
against coding errors.
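
If we do drop the assert, the failure path could simply refuse to make
the buffer available. A minimal, self-contained sketch of the idea (the
DMAMap layout and the helper names below are simplified stand-ins, not
the actual QEMU/SVQ code):

/* Simplified stand-ins for the structures discussed here; not QEMU code. */
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct DMAMap {
    uint64_t iova;            /* SVQ IOVA exposed to the device */
    uint64_t translated_addr; /* qemu vaddr backing the range */
    uint64_t size;
} DMAMap;

/* Reverse (vaddr -> map) lookup; the real code searches a tree. */
static const DMAMap *find_map_by_vaddr(const DMAMap *maps, size_t n,
                                       uint64_t vaddr)
{
    for (size_t i = 0; i < n; i++) {
        if (vaddr >= maps[i].translated_addr &&
            vaddr < maps[i].translated_addr + maps[i].size) {
            return &maps[i];
        }
    }
    return NULL;
}

/*
 * Translate a qemu vaddr to an SVQ IOVA.  Instead of assert(map), report
 * failure, so an address that was never mapped (e.g. a BAR) cannot abort
 * qemu because of a coding or guest error.
 */
static bool svq_translate_addr(const DMAMap *maps, size_t n,
                               uint64_t vaddr, uint64_t *iova)
{
    const DMAMap *map = find_map_by_vaddr(maps, n, vaddr);

    if (!map) {
        fprintf(stderr, "No IOVA map for vaddr 0x%" PRIx64 "\n", vaddr);
        return false;
    }
    *iova = map->iova + (vaddr - map->translated_addr);
    return true;
}

int main(void)
{
    const DMAMap maps[] = {
        { .iova = 0x1000, .translated_addr = 0x7f0000000000ULL, .size = 0x4000 },
    };
    uint64_t iova;

    if (svq_translate_addr(maps, 1, 0x7f0000001234ULL, &iova)) {
        printf("translated to IOVA 0x%" PRIx64 "\n", iova);
    }
    /* Unmapped address: fails instead of asserting. */
    svq_translate_addr(maps, 1, 0xdeadbeefULL, &iova);
    return 0;
}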

> 2) we probably need a better name than vhost_iova_tree_alloc(), maybe
> "vhost_iova_tree_map_alloc()"
>

Ok, I will change it for the next version.

> There's actually another method.
>
> 1) don't do IOVA/map allocation in region_add()
> 2) do the allocation in handle_kick(), then we know the IOVA so no
> reverse lookup
>
> The advantage is that this can work for the case of vIOMMU. And they
> should perform the same:
>
> 1) your method avoids the iova allocation per sg
> 2) my method avoids the reverse lookup per sg
>

It's somewhat doable, but at this moment we would be replacing a tree
search with a linear insertion.

I would say that the guest's IOVA -> qemu vaddr part works with no
change for vIOMMU, since VirtQueue's virtqueue_pop already gives us the
vaddr even in the case of vIOMMU. The only change I would add for that
case is the SVQ -> device map/unmap part, so the device cannot access
random addresses, only the exposed ones. I'm assuming that part is
O(1).

This way, we already have a tree with all the possible guest addresses,
and we only need to look up its SVQ iova -> vaddr translation. This is
an O(log(N)) operation, and read only, so it's easily parallelizable
when we run each SVQ in its own thread (if needed). The only thing left
is to expose that on an iommu miss (O(1)) and unmap it when processing
used buffers (also O(1)). The dominating operation remains VirtQueue's
own lookup for guest IOVA -> GPA, which I'm assuming is already well
optimized and will benefit from future optimizations, since qemu's
memory system is frequently used.
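
As a toy illustration of that read-only SVQ iova -> vaddr step (the
real code would use util/iova-tree; a sorted array plus binary search
here only shows the O(log(N)) access pattern, and that nothing is
allocated on the lookup path):

#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct SVQMap {
    uint64_t iova;   /* start of the SVQ IOVA chunk */
    uint64_t size;
    uint64_t vaddr;  /* qemu vaddr backing the chunk */
} SVQMap;

/* 'maps' must be sorted by iova and non-overlapping. */
static uint64_t svq_iova_to_vaddr(const SVQMap *maps, size_t n, uint64_t iova)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (iova < maps[mid].iova) {
            hi = mid;
        } else if (iova >= maps[mid].iova + maps[mid].size) {
            lo = mid + 1;
        } else {
            return maps[mid].vaddr + (iova - maps[mid].iova);
        }
    }
    return 0;   /* iommu miss: not an exposed address */
}

int main(void)
{
    const SVQMap maps[] = {
        { .iova = 0x1000, .size = 0x1000, .vaddr = 0x7f0000000000ULL },
        { .iova = 0x4000, .size = 0x2000, .vaddr = 0x7f0000100000ULL },
    };

    printf("0x4800 -> 0x%" PRIx64 "\n", svq_iova_to_vaddr(maps, 2, 0x4800));
    printf("0x9000 -> 0x%" PRIx64 "\n", svq_iova_to_vaddr(maps, 2, 0x9000));
    return 0;
}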

To optimize your use case we would need to add a custom allocator
(smarter than the one currently used) to SVQ. I've been looking for
ways to reuse glibc's allocator or similar in our own arenas, but with
no luck. It will be code that SVQ needs to maintain by and for itself
anyway.
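
For illustration only, such an allocator could start as simple as a
first-fit search over the already-allocated, sorted chunks inside the
device's usable range (everything below is hypothetical, not SVQ code):

#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct Range {
    uint64_t start;
    uint64_t size;
} Range;

/*
 * First-fit search: 'used' must be sorted by start, non-overlapping, and
 * fully contained in [iova_first, iova_last].
 */
static bool iova_alloc_first_fit(const Range *used, size_t n,
                                 uint64_t iova_first, uint64_t iova_last,
                                 uint64_t size, uint64_t *out)
{
    uint64_t hole_start = iova_first;

    for (size_t i = 0; i < n; i++) {
        if (used[i].start - hole_start >= size) {
            *out = hole_start;
            return true;
        }
        hole_start = used[i].start + used[i].size;
    }
    /* Tail hole after the last allocation. */
    if (iova_last - hole_start + 1 >= size) {
        *out = hole_start;
        return true;
    }
    return false;   /* range exhausted */
}

int main(void)
{
    const Range used[] = {
        { .start = 0x1000, .size = 0x2000 },   /* [0x1000, 0x3000) */
        { .start = 0x6000, .size = 0x1000 },   /* [0x6000, 0x7000) */
    };
    uint64_t iova;

    if (iova_alloc_first_fit(used, 2, 0x1000, 0xffff, 0x3000, &iova)) {
        printf("allocated IOVA 0x%" PRIx64 "\n", iova);
    }
    return 0;
}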

In either case it should not be hard to switch to your method in the
future if we come up with a faster allocator; it's just a few call
changes.

Would that make sense?

> >
> > > >
> > > > In the next revision I will send, all the limits are copied to the
> > > > vhost iova tree, defined at its creation in vhost_iova_tree_new().
> > > > They are kept outside of util/iova-tree and only passed to the
> > > > latter at allocation time.
> > > >
> > > > Since vhost_iova_tree has its own vhost_iova_tree_alloc(), which
> > > > wraps iova_tree_alloc() [1], the limits could be kept in vhost-vdpa
> > > > and made an argument of vhost_iova_tree_alloc(). But I'm not sure if
> > > > that is what you are proposing or if I'm missing something.
> > >
> > > If the reverse translation is only used in iova allocation, I meant to
> > > split the logic of IOVA allocation itself.
> > >
> >
> > I still don't understand it, sorry :). In SVQ setup we allocate an
> > iova address for every guest GPA address its driver can use. After
> > that there should be no allocation unless memory is hotplugged.
> >
> > So the limits are only needed precisely at allocation time. Not sure
> > if that is what you mean here, but to first allocate and then check
> > whether the result is within the range could lead to false negatives,
> > since there could be a valid free range *inside* the limits while the
> > iova allocator returned us another range that falls outside them. How
> > could we know the cause if the allocator is not using the range
> > itself?
>
> See my above reply. And we can teach the iova allocator to return the
> IOVA in the range that vhost-vDPA supports.
>

Ok, for the next series it will be that way. I'm pretty sure we are
aligned on this part, but the lack of code in this series makes it very
hard to discuss :).
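
To make it a little more concrete in the meantime, the shape I have in
mind is roughly the following (illustrative only: vhost_iova_tree_new()
and vhost_iova_tree_map_alloc() are the names discussed above,
everything else is a placeholder, and a trivial bump allocator stands
in for the real one):

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct VhostIOVATree {
    uint64_t iova_first;   /* limits come from vhost-vdpa at creation */
    uint64_t iova_last;
    uint64_t next_free;    /* placeholder for the real allocator state */
} VhostIOVATree;

static VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first,
                                          uint64_t iova_last)
{
    VhostIOVATree *t = calloc(1, sizeof(*t));

    if (t) {
        t->iova_first = iova_first;
        t->iova_last = iova_last;
        t->next_free = iova_first;
    }
    return t;
}

/*
 * Allocate 'size' bytes of IOVA.  The tree itself enforces the device
 * limits, so callers never get an address outside
 * [iova_first, iova_last].
 */
static bool vhost_iova_tree_map_alloc(VhostIOVATree *t, uint64_t size,
                                      uint64_t *iova)
{
    if (size == 0 || size > t->iova_last - t->next_free + 1) {
        return false;
    }
    *iova = t->next_free;
    t->next_free += size;   /* trivial bump allocation as a stand-in */
    return true;
}

int main(void)
{
    VhostIOVATree *t = vhost_iova_tree_new(0x1000, 0xffff);
    uint64_t iova;

    if (t && vhost_iova_tree_map_alloc(t, 0x2000, &iova)) {
        printf("allocated IOVA 0x%" PRIx64 "\n", iova);
    }
    free(t);
    return 0;
}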

Thanks!

> Thanks
>
> >
> > > >
> > > > Either way, I think it is hard to talk about this specific case
> > > > without code, since this series still does not address the limits.
> > > > Would you prefer me to send another RFC in WIP quality, with *not*
> > > > all comments addressed? I would say that there is not a lot of
> > > > pending work left to send the next one, but it might be easier for
> > > > all of us.
> > >
> > > I'd prefer to try to address them all, otherwise it's not easy to see
> > > what is missing.
> > >
> >
> > Got it, I will do it that way then!
> >
> > Thanks!
> >
> > > Thanks
> > >
> > > >
> > > > Thanks!
> > > >
> > > > [1] This util/iova-tree method will be proposed in the next series,
> > > > and vhost_iova_tree wraps it since it needs to keep both trees in
> > > > sync: iova -> qemu vaddr for iova allocation, and the reverse one to
> > > > translate available buffers.
> > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>



