qemu-devel

Re: Reducing vdpa migration downtime because of memory pin / maps


From: Jason Wang
Subject: Re: Reducing vdpa migration downtime because of memory pin / maps
Date: Fri, 9 Jun 2023 11:18:18 +0800

On Fri, Jun 9, 2023 at 6:39 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
> On 6/7/23 01:08, Eugenio Perez Martin wrote:
> > On Wed, Jun 7, 2023 at 12:43 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >> Sorry for reviving this old thread; I missed the best timing to
> >> follow up on this while I was on vacation. I have been working on
> >> this and found some discrepancies, please see below.
> >>
> >> On 4/5/23 04:37, Eugenio Perez Martin wrote:
> >>> Hi!
> >>>
> >>> As mentioned in the last upstream virtio-networking meeting, one of
> >>> the factors that adds more downtime to migration is the handling of
> >>> the guest memory (pin, map, etc). At this moment this handling is
> >>> bound to the virtio life cycle (DRIVER_OK, RESET). In that sense, the
> >>> destination device waits until all the guest memory / state is
> >>> migrated to start pinning all the memory.
> >>>
> >>> The proposal is to bind it to the char device life cycle (open vs
> >>> close),
> >> Hmmm, really? If it's tied to the char device life cycle, the next
> >> guest / qemu launched on the same vhost-vdpa device node won't work.
> >>
> > Maybe my sentence was not accurate, but I think we're on the same page here.
> >
> > Two qemu instances opening the same char device at the same time are
> > not allowed, and vhost_vdpa_release cleans all the maps. So the next
> > qemu that opens the char device should see a clean device anyway.
>
> I mean the pinning can't be done at the time of char device open, when
> the user address space is not known/bound yet. The earliest point
> possible for pinning would be after the vhost_attach_mm() call from
> SET_OWNER is done. Actually, I think the fact that the counterpart
> vhost_detach_mm() only gets handled in vhost_vdpa_release() at device
> close time is an artifact and a shortcoming of today's vhost protocol -
> the opposite RESET_OWNER call is not made mandatory, hence it is only
> implemented in the vhost-net device today. One qemu instance could well
> exec(3) a new qemu instance to live-upgrade itself while keeping all
> emulated devices and the guest alive. The current vhost design simply
> prohibits this from happening.

I'm not sure I fully understand the issue you mention here. What is
missing, and can iommufd help in any way?

>
>
> >
> >>>    so all the guest memory can be pinned for the whole guest / qemu
> >>> lifecycle.
> >> I think tying pinning to the guest / qemu process life cycle makes
> >> more sense. Essentially this pinning part needs to be decoupled from
> >> the iotlb mapping abstraction layer, and can / should work as a
> >> standalone uAPI, such that QEMU at the destination may launch and pin
> >> all guest memory as needed without having to start the device, while
> >> awaiting any incoming migration request. The problem, though, is that
> >> there's no existing vhost uAPI that could properly serve as the
> >> vehicle for that. SET_OWNER / SET_MEM_TABLE / RESET_OWNER seem a
> >> remote fit. Any objection against introducing a new but clean vhost
> >> uAPI for pinning guest pages, subject to the guest's life cycle?
> >>
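
Purely as an illustration of what such a uAPI could look like (the struct,
ioctl names and numbers below are hypothetical; nothing like them exists
in <linux/vhost.h> today):

/* Hypothetical uAPI sketch - none of this exists today. */
struct vhost_vdpa_pin_request {
    __u64 uaddr;   /* start of the userspace range to pin */
    __u64 size;    /* length of the range in bytes */
};

/*
 * Pin/unpin guest memory independently of DRIVER_OK/RESET; the pins
 * would be tied to the owner mm (i.e. the qemu instance), not to the
 * virtio device life cycle. VHOST_VIRTIO is the existing 0xAF ioctl
 * magic; 0x90/0x91 are made-up numbers.
 */
#define VHOST_VDPA_PIN_PAGES    _IOW(VHOST_VIRTIO, 0x90, struct vhost_vdpa_pin_request)
#define VHOST_VDPA_UNPIN_PAGES  _IOW(VHOST_VIRTIO, 0x91, struct vhost_vdpa_pin_request)
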
> > I think that whether or not to pin memory maps should be a kernel
> > decision, not something driven by qemu.
>
> It's kernel decision for sure. I am with this part.
>
> > I'm not against it if needed, but
> > let me know if the current "clean at close" addresses your concerns.
>
> To better facilitate the QEMU exec (live update) case, I propose we add
> a new vhost uAPI pair for explicit pinning requests - which would live
> with the user mm's, or more precisely the qemu instance's, lifecycle.
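
A rough usage sketch for the warm-up / live-update case, building on the
hypothetical VHOST_VDPA_PIN_PAGES ioctl sketched earlier (again, not an
existing interface; guest_ram / guest_ram_size stand for the already
mmap'ed guest memory):

/*
 * Hypothetical flow: the (new) qemu instance pins guest memory right
 * after taking ownership, long before any virtio state is restored.
 */
static int warm_up_pin(int fd, void *guest_ram, size_t guest_ram_size)
{
    struct vhost_vdpa_pin_request req = {
        .uaddr = (__u64)(uintptr_t)guest_ram,
        .size  = guest_ram_size,
    };

    /*
     * VHOST_SET_OWNER must already have bound this process' mm. Pin
     * now, while still waiting for the incoming migration stream;
     * IOTLB mapping and DRIVER_OK happen later and no longer pay the
     * pinning cost during the downtime window.
     */
    return ioctl(fd, VHOST_VDPA_PIN_PAGES, &req);
}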

Is this something similar to iommufd?

Btw, I'm not sure whether or not it's a good idea to expose pinning to
userspace. For example, vhost allows virtual mapping instead of dma
mapping, which doesn't require pinning at all.

Thanks

>
> >
> >> Another concern is the use_va stuff: originally it is tagged at the
> >> device level and made static at the time of device instantiation,
> >> which is fine. But the ones to come find a new home in per-group or
> >> per-vq level structs. It's hard to tell whether or not pinning is
> >> actually needed for those latter use_va friends, as they are
> >> essentially tied to the virtio life cycle or feature negotiation,
> >> while the guest / Qemu starts way earlier than that. Perhaps just
> >> ignore those sub-device level use_va usages? Presumably !use_va at
> >> the device level is sufficient to infer the need for pinning for the
> >> device?
> >>
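
A minimal kernel-side sketch of that last idea, assuming only the
device-level flag is consulted (struct vdpa_device does carry a
device-wide use_va today; the helper name is made up):

/*
 * Sketch: decide whether a device needs its guest memory pinned based
 * solely on the device-level use_va flag, ignoring per-group / per-vq
 * variants that only matter after feature negotiation.
 */
static bool vhost_vdpa_needs_pinning(struct vhost_vdpa *v)
{
    struct vdpa_device *vdpa = v->vdpa;

    /* use_va devices work on userspace VAs and need no pinning. */
    return !vdpa->use_va;
}
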
> > I don't follow this. But I have the feeling that the subject of my
> > original mail would be way more accurate if I had just said "memory
> > maps".
>
> I think the iotlb layer in vhost-vdpa just provides the abstraction for
> mapping, not pinning. Although in some cases mapping implicitly relies
> on pinning for DMA purposes, it doesn't have to be tied to that in uAPI
> semantics. We can do explicit on-demand pinning for cases such as
> warming up the device at the live migration destination.
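
For reference, the existing IOTLB message only describes an iova -> uaddr
mapping and says nothing about pinning, so whether the kernel pins behind
it is an implementation detail (as defined in <linux/vhost_types.h>,
comments added):

/* The mapping uAPI used by vhost-vdpa. */
struct vhost_iotlb_msg {
    __u64 iova;    /* device IOVA */
    __u64 size;
    __u64 uaddr;   /* userspace VA backing the range */
    __u8  perm;    /* VHOST_ACCESS_RO / WO / RW */
    __u8  type;    /* VHOST_IOTLB_UPDATE, VHOST_IOTLB_INVALIDATE, ... */
};

/*
 * qemu sends these as VHOST_IOTLB_MSG_V2 messages written to the
 * vhost-vdpa fd; nothing in the message itself mandates pinning.
 */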
>
>
> >
> > I still think the way to fix it is to actually delegate that to the
> > kernel vdpa, so it can choose whether a particular ASID needs the pin
> > or not. But let me know if I missed something.
>
> You can disregard this for now. I will discuss that further with you
> guys once the bind_mm and per-group use_va stuff have landed.
>
> Thanks!
> -Siwei
>
>
>
> >
> > Thanks!
> >
> >> Regards,
> >> -Siwei
> >>
> >>
> >>> This has two main problems:
> >>> * At this moment the reset semantics forces the vdpa device to unmap
> >>> all the memory. So this change needs a vhost vdpa feature flag.
> >>> * This may increase the initialization time. Maybe we can delay it if
> >>> qemu is not the destination of a LM. Anyway I think this should be
> >>> done as an optimization on top.
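
A sketch of how qemu could negotiate such a flag with the existing
backend-features ioctls (the flag name and bit value are hypothetical):

/* Hypothetical backend feature bit: maps survive a virtio device reset. */
#define VHOST_BACKEND_F_MAP_PERSIST  0x7ULL   /* made-up bit position */

uint64_t features = 0;

ioctl(fd, VHOST_GET_BACKEND_FEATURES, &features);
if (features & (1ULL << VHOST_BACKEND_F_MAP_PERSIST)) {
    /*
     * Ack the feature so the kernel keeps maps/pins across reset.
     * (A real implementation would ack its full supported set, not
     * just this one bit.)
     */
    uint64_t ack = 1ULL << VHOST_BACKEND_F_MAP_PERSIST;
    ioctl(fd, VHOST_SET_BACKEND_FEATURES, &ack);
}
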
> >>>
> >>> Any ideas or comments in this regard?
> >>>
> >>> Thanks!
> >>>
>



