Re: [PATCH v1] docs/devel: Add VFIO device migration documentation


From: Alex Williamson
Subject: Re: [PATCH v1] docs/devel: Add VFIO device migration documentation
Date: Wed, 4 Nov 2020 05:45:27 -0700

On Wed, 4 Nov 2020 13:25:40 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> On 11/4/2020 1:57 AM, Alex Williamson wrote:
> > On Wed, 4 Nov 2020 01:18:12 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >   
> >> On 10/30/2020 12:35 AM, Alex Williamson wrote:  
> >>> On Thu, 29 Oct 2020 23:11:16 +0530
> >>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >>>      
> >>
> >> <snip>
> >>  
> >>>>>> +System memory dirty pages tracking
> >>>>>> +----------------------------------
> >>>>>> +
> >>>>>> +A ``log_sync`` memory listener callback is added to mark system 
> >>>>>> memory pages  
> >>>>>
> >>>>> s/is added to mark/marks those/
> >>>>>         
> >>>>>> +as dirty which are used for DMA by VFIO device. Dirty pages bitmap is 
> >>>>>> queried  
> >>>>>
> >>>>> s/by/by the/
> >>>>> s/Dirty/The dirty/
> >>>>>         
> >>>>>> +per container. All pages pinned by vendor driver through 
> >>>>>> vfio_pin_pages()  
> >>>>>
> >>>>> s/by/by the/
> >>>>>         
> >>>>>> +external API have to be marked as dirty during migration. When there 
> >>>>>> are CPU
> >>>>>> +writes, CPU dirty page tracking can identify dirtied pages, but any 
> >>>>>> page pinned
> >>>>>> +by vendor driver can also be written by device. There is currently no 
> >>>>>> device  
> >>>>>
> >>>>> s/by/by the/ (x2)
> >>>>>         
> >>>>>> +which has hardware support for dirty page tracking. So all pages 
> >>>>>> which are
> >>>>>> +pinned by vendor driver are considered as dirty.
> >>>>>> +Dirty pages are tracked when device is in stop-and-copy phase because 
> >>>>>> if pages
> >>>>>> +are marked dirty during pre-copy phase and content is transfered from 
> >>>>>> source to
> >>>>>> +destination, there is no way to know newly dirtied pages from the 
> >>>>>> point they
> >>>>>> +were copied earlier until device stops. To avoid repeated copy of 
> >>>>>> same content,
> >>>>>> +pinned pages are marked dirty only during stop-and-copy phase.  
> >>>>
> >>>>     
> >>>>> Let me take a quick stab at rewriting this paragraph (not sure if I
> >>>>> understood it correctly):
> >>>>>
> >>>>> "Dirty pages are tracked when the device is in the stop-and-copy phase.
> >>>>> During the pre-copy phase, it is not possible to distinguish a dirty
> >>>>> page that has been transferred from the source to the destination from
> >>>>> newly dirtied pages, which would lead to repeated copying of the same
> >>>>> content. Therefore, pinned pages are only marked dirty during the
> >>>>> stop-and-copy phase." ?
> >>>>>         
> >>>>
> >>>> I think above rephrase only talks about repeated copying in pre-copy
> >>>> phase. Used "copied earlier until device stops" to indicate both
> >>>> pre-copy and stop-and-copy till device stops.  
> >>>
> >>>
> >>> Now I'm confused, I thought we had abandoned the idea that we can only
> >>> report pinned pages during stop-and-copy.  Doesn't the device need to
> >>> expose its dirty memory footprint during the iterative phase regardless
> >>> of whether that causes repeat copies?  If QEMU iterates and sees that
> >>> all memory is still dirty, it may have transferred more data, but it
> >>> can actually predict if it can achieve its downtime tolerances.  Which
> >>> is more important, less data transfer or predictability?  Thanks,
> >>>      
> >>
> >> Even if QEMU copies and transfers the content of all sys mem pages
> >> during pre-copy (worst case with an IOMMU-backed mdev device whose
> >> vendor driver is not smart enough to pin pages explicitly, so all sys
> >> mem pages are marked dirty), its prediction about downtime tolerance
> >> will still not be correct, because during stop-and-copy all pages need
> >> to be copied again, as the device can write to any of those pinned pages.  
> > 
> > I think you're only reiterating my point.  If QEMU copies all of guest
> > memory during the iterative phase and each time it sees that all memory
> > is dirty, such as if CPUs or devices (including assigned devices) are
> > dirtying pages as fast as it copies them (or continuously marks them
> > dirty), then QEMU can predict that downtime will require copying all
> > pages.   
> 
> But as of now there is no way to know whether the device has dirtied
> pages during the iterative phase.


This claim doesn't make sense; pinned pages are considered persistently
dirtied, both during the iterative phase and while stopped.
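
To illustrate these semantics, here is a minimal, self-contained sketch
(plain C, not QEMU or kernel code; all names below are hypothetical) of a
log_sync-style dirty-bitmap query in which pinned pages are re-reported as
dirty on every sync, both in the iterative phase and at stop-and-copy,
while CPU-dirtied pages are reported once and then cleared:

/*
 * Sketch only: pages pinned by the vendor driver are reported as dirty
 * on *every* bitmap query, because the device may write them at any
 * time and there is no finer-grained tracking available.
 */
#include <stdint.h>
#include <stdio.h>

/* Toy address space: 64 pages, one bit per page. */
static uint64_t pinned_bitmap;   /* pages pinned for DMA by the vendor driver */
static uint64_t cpu_dirty;       /* pages dirtied by CPU writes               */

/* Vendor driver pins a page for DMA; it stays "persistently dirty". */
static void pin_page(unsigned int pfn)
{
    pinned_bitmap |= 1ULL << pfn;
}

/* CPU dirty logging records a CPU write to a page. */
static void cpu_write(unsigned int pfn)
{
    cpu_dirty |= 1ULL << pfn;
}

/*
 * log_sync-style query: CPU-dirtied pages are consumed, but pinned
 * pages are always re-reported.
 */
static uint64_t query_dirty_bitmap(void)
{
    uint64_t dirty = cpu_dirty | pinned_bitmap;

    cpu_dirty = 0;      /* the CPU log is cleared once read ... */
    return dirty;       /* ... pinned pages never go clean.     */
}

int main(void)
{
    pin_page(3);
    pin_page(4);
    cpu_write(10);

    /* Pages 3 and 4 show up in both syncs; page 10 only in the first. */
    printf("sync 1: %#llx\n", (unsigned long long)query_dirty_bitmap());
    printf("sync 2: %#llx\n", (unsigned long long)query_dirty_bitmap());
    return 0;
}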

 
> > If instead devices don't mark dirty pages until the VM is
> > stopped, then QEMU might iterate through memory copy and predict a short
> > downtime because not much memory is dirty, only to be surprised that
> > all of memory is suddenly dirty.  At that point it's too late, the VM
> > is already stopped, the predicted short downtime takes far longer than
> > expected.  This is exactly why we made the kernel interface mark pinned
> > pages persistently dirty when it was proposed that we only report
> > pinned pages once.  Thanks,
> >   
> 
> Since there is no way to know whether the device dirtied pages during
> the iterative phase, QEMU should query pinned pages in the stop-and-copy
> phase.


As above, I don't believe this is true.


> Whenever there is hardware support or some software mechanism to report
> pages dirtied by the device, we will add a capability bit to the
> migration capabilities, and based on that bit the qemu/user space app
> should decide whether to query dirty pages in the iterative phase.


Yes, we could advertise support for fine-grained dirty page tracking,
but I completely disagree that we should consider pinned pages clean
until suddenly exposing them as dirty once the VM is stopped.  Thanks,

Alex
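
As a back-of-the-envelope sketch of the predictability argument above
(plain C; the numbers are arbitrary and not taken from any real setup):
if pinned pages are hidden until the VM stops, the downtime estimate made
during the iterative phase can be off by orders of magnitude:

#include <stdio.h>

int main(void)
{
    const double link_mbit_per_s = 10000.0;  /* 10 Gbit/s migration link      */
    const double cpu_dirty_mb    = 200.0;    /* dirty memory QEMU can observe */
    const double pinned_mb       = 16384.0;  /* 16 GB pinned for device DMA   */

    double mb_per_s = link_mbit_per_s / 8.0;

    /* Estimate if pinned pages are only exposed at stop-and-copy: */
    double predicted_s = cpu_dirty_mb / mb_per_s;

    /* What actually has to be copied once the VM is stopped: */
    double actual_s = (cpu_dirty_mb + pinned_mb) / mb_per_s;

    printf("predicted downtime: %5.2f s\n", predicted_s);  /* ~0.16 s  */
    printf("actual downtime:    %5.2f s\n", actual_s);     /* ~13.27 s */
    return 0;
}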



