Re: [Qemu-devel] [PATCH v7 00/13] Add migration support for VFIO device


From: Kirti Wankhede
Subject: Re: [Qemu-devel] [PATCH v7 00/13] Add migration support for VFIO device
Date: Fri, 19 Jul 2019 00:02:33 +0530

On 7/12/2019 6:02 AM, Yan Zhao wrote:
> On Fri, Jul 12, 2019 at 03:08:31AM +0800, Kirti Wankhede wrote:
>>
>>
>> On 7/11/2019 9:53 PM, Dr. David Alan Gilbert wrote:
>>> * Yan Zhao (address@hidden) wrote:
>>>> On Thu, Jul 11, 2019 at 06:50:12PM +0800, Dr. David Alan Gilbert wrote:
>>>>> * Yan Zhao (address@hidden) wrote:
>>>>>> Hi Kirti,
>>>>>> There are still unaddressed comments on your v4 patches.
>>>>>> Would you mind addressing them?
>>>>>>
>>>>>> 1. should we register two migration interfaces simultaneously
>>>>>> (https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04750.html)
>>>>>
>>>>> Please don't do this.
>>>>> As far as I'm aware we currently only have one device that does that
>>>>> (vmxnet3) and a patch has just been posted that fixes/removes that.
>>>>>
>>>>> Dave
>>>>>
>>>> Hi Dave,
>>>> Thanks for notifying me of this. But if we want to support postcopy in the
>>>> future, after the device stops, what interface could we use to transfer
>>>> data of device state only?
>>>> For postcopy, when the source device stops, we need to transfer only the
>>>> necessary device state to the target VM before the target VM starts, and
>>>> we don't want to transfer device memory as we'll do that after the target
>>>> VM resumes.
>>>
>>> Hmm ok, let's see; that's got to happen in the call to:
>>>     qemu_savevm_state_complete_precopy(fb, false, false);
>>> that's made from postcopy_start.
>>>  (the two 'false' arguments are iterable_only and inactivate_disks)
>>>
>>> and at that time I believe the state is POSTCOPY_ACTIVE, so in_postcopy
>>> is true.
>>>
>>> If you're doing postcopy, then you'll probably define a has_postcopy()
>>> function, so qemu_savevm_state_complete_precopy will skip the
>>> save_live_complete_precopy call from its loop for at least two of the
>>> reasons in its big if.
>>>
>>> So you're right; you need the VMSD for this to happen in the second
>>> loop in qemu_savevm_state_complete_precopy.  Hmm.
>>>
>>> Now, what worries me, and I don't know the answer, is how the section
>>> header for the vmstate and the section header for an iteration look
>>> on the stream; how are they different?
>>>
>>
>> I don't have a way to test postcopy migration - that is one of the major
>> reasons I did not include postcopy support in this patchset, as clearly
>> called out in the cover letter.
>> This patchset is thoroughly tested for precopy migration.
>> If anyone has hardware that supports faulting, then I would prefer to add
>> postcopy support as an incremental change later, which can be tested
>> before submitting.
>>
>> Just a suggestion: instead of using a VMSD, is it possible to have some
>> additional check to call save_live_complete_precopy from
>> qemu_savevm_state_complete_precopy?
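
(For reference, the condition Dave refers to is the big "if" at the top of the
handler loop in qemu_savevm_state_complete_precopy(). The following is
paraphrased roughly from the QEMU tree of that time - treat it as an
illustrative sketch, not the exact code:

    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
        if (!se->ops ||
            (in_postcopy && se->ops->has_postcopy &&
             se->ops->has_postcopy(se->opaque)) ||
            (in_postcopy && !iterable_only) ||
            !se->ops->save_live_complete_precopy) {
            continue;    /* handler is skipped in the postcopy case */
        }

        ret = se->ops->save_live_complete_precopy(f, se->opaque);
        ...
    }

Any "additional check" would effectively mean relaxing this condition for a
device like VFIO so that its save_live_complete_precopy still runs when
in_postcopy is true.)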
>>
>>
>>>>
>>>>>> 2. in each save iteration, how much data is to be saved
>>>>>> (https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04683.html)
>>
>>> How big is the data_size?
>>> If this size is too big, it may take too much time and block others.
>>
>> I did mention this in the comment about the structure in the vfio.h
>> header. data_size will be provided by the vendor driver and obviously will
>> not be greater than the migration region size. The vendor driver is
>> responsible for keeping its solution optimized.
>>
> If the data_size is no bigger than the migration region size, and each
> iteration only saves data_size worth of data, I'm afraid it will cause
> prolonged down time. After all, the migration region size cannot be very
> big.

As I mentioned above, it is the vendor driver's responsibility to keep its
solution optimized.
A well-behaved vendor driver should not cause unnecessarily prolonged
down time.
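
To be concrete about the data_size bound: QEMU re-reads data_size from the
migration region header on every iteration and only consumes that much, so a
vendor driver can never hand over more than the region can hold. A minimal
sketch of the consumer side, with hypothetical helper names (the actual logic
lives in the save state functions of this series):

    /* Illustrative only; read_mig_u64() and copy_region_to_stream() are
     * hypothetical helpers, qemu_put_be64() is the real stream API. */
    uint64_t data_offset = read_mig_u64(mig_region, DATA_OFFSET);
    uint64_t data_size   = read_mig_u64(mig_region, DATA_SIZE);

    /* The vendor driver must never report more than fits in the region. */
    assert(data_size <= mig_region_size - data_offset);

    qemu_put_be64(f, data_size);
    copy_region_to_stream(f, mig_region, data_offset, data_size);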

> Also, if the vendor driver alone determines how much data to save in each
> iteration, with no checks in QEMU, it may cause other devices' migration
> time to be squeezed.
> 
> 

Sorry, how will that squeeze other devices' migration time?

>>
>>>>>> 3. do we need extra interface to get data for device state only
>>>>>> (https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04812.html)
>>
>> I don't think so. Opaque device data from the vendor driver can include
>> device state and device memory. The vendor driver that is managing the
>> device can decide how to place data on the stream.
>>
> I know the current design uses opaque device data. Then, to support
> postcopy, we may have to add an extra device state like in-postcopy. But
> postcopy is more of a QEMU state and is not a device state.

One bit in device_state can be used to inform the vendor driver about
postcopy when postcopy support is added.
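
As a rough illustration - the bit values below are only indicative of the
KABI in patch 1, and the postcopy bit is purely hypothetical at this point:

    /* device_state bits (indicative; see patch 1 for the actual KABI) */
    #define VFIO_DEVICE_STATE_RUNNING   (1 << 0)
    #define VFIO_DEVICE_STATE_SAVING    (1 << 1)
    #define VFIO_DEVICE_STATE_RESUMING  (1 << 2)

    /* Hypothetical future bit, not part of this patchset: set by QEMU when
     * postcopy is active so the vendor driver can prioritize device state. */
    #define VFIO_DEVICE_STATE_POSTCOPY  (1 << 3)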

> To address it elegantly, may we add an extra interface besides
> vfio_save_buffer() to get data for device state only?
> 

When in the postcopy state, based on the device_state flag, the vendor driver
can copy device state first into the migration region. I still think there is
no need to separate device state and device memory.

>>>>>> 4. definition of dirty page copied_pfn
>>>>>> (https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg05592.html)
>>>>>>
>>
>> This was in line with the ongoing discussion with Alex. I addressed the
>> concern there. Please check the current patchset, which addresses the
>> concerns raised.
>>
> OK. I saw you also updated the flow in that part. Please check my comments
> in that patch for details. But as a suggestion, I think processed_pfns is
> a better name than copied_pfns :)
> 

The vendor driver can process total_pfns, but may copy the bitmap for only
some of those pfns to the migration region. One reason could be that the
migration region is not large enough to accommodate a bitmap covering
total_pfns. So it could be that 0 < copied_pfns < total_pfns. That's why the
name 'copied_pfns'. I'll continue with 'copied_pfns'.
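
To make the intent concrete, the user-space side loops until the bitmap for
all requested pages has been retrieved; the sketch below is illustrative
only, with hypothetical helper names (the real flow is in the "Add function
to get dirty page list" patch):

    /* Illustrative only; request_dirty_bitmap() and mark_dirty_pages() are
     * hypothetical helpers standing in for the migration region protocol. */
    uint64_t processed = 0;

    while (processed < total_pfns) {
        /* The vendor driver fills as much of the bitmap as fits in the
         * migration region and reports how many pfns it covered. */
        uint64_t copied_pfns = request_dirty_bitmap(start_pfn + processed,
                                                    total_pfns - processed);
        if (copied_pfns == 0) {
            break;                  /* no (more) dirty pages reported */
        }

        mark_dirty_pages(start_pfn + processed, copied_pfns);
        processed += copied_pfns;
    }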

Thanks,
Kirti

>>>>>> Also, I'm glad to see that you updated the code by following my
>>>>>> comments below, but please don't forget to reply to my comments next
>>>>>> time :)
>>
>> I tried to reply at the top of threads and addressed the common concerns
>> raised there. Sorry if I missed any; I'll make sure to point you to my
>> replies going forward.
>>
> OK, let's cooperate :)
> 
> Thanks
> Yan
> 
>> Thanks,
>> Kirti
>>
>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg05357.html
>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg06454.html
>>>>>>
>>>>>> Thanks
>>>>>> Yan
>>>>>>
>>>>>> On Tue, Jul 09, 2019 at 05:49:07PM +0800, Kirti Wankhede wrote:
>>>>>>> Add migration support for VFIO device
>>>>>>>
>>>>>>> This patch set includes the following patches:
>>>>>>> - Define the KABI for the VFIO device for migration support.
>>>>>>> - Added save and restore functions for PCI configuration space.
>>>>>>> - Generic migration functionality for the VFIO device.
>>>>>>>   * This patch set adds functionality only for PCI devices, but can be
>>>>>>>     extended to other VFIO devices.
>>>>>>>   * Added all the basic functions required for the pre-copy,
>>>>>>>     stop-and-copy and resume phases of migration.
>>>>>>>   * Added a state change notifier; from that notifier function, the
>>>>>>>     VFIO device's state change is conveyed to the VFIO device driver.
>>>>>>>   * During the save setup phase and the resume/load setup phase, the
>>>>>>>     migration region is queried and is used to read/write VFIO device
>>>>>>>     data.
>>>>>>>   * .save_live_pending and .save_live_iterate are implemented to use
>>>>>>>     QEMU's iteration functionality during the pre-copy phase.
>>>>>>>   * In .save_live_complete_precopy, that is, in the stop-and-copy
>>>>>>>     phase, iteration to read data from the VFIO device driver is
>>>>>>>     implemented until the pending bytes returned by the driver reach
>>>>>>>     zero.
>>>>>>>   * Added a function to get the dirty page bitmap for the pages used
>>>>>>>     by the driver.
>>>>>>> - Add vfio_listerner_log_sync to mark dirty pages.
>>>>>>> - Make the VFIO PCI device migration capable. If the migration region
>>>>>>>   is not provided by the driver, migration is blocked.
>>>>>>>
>>>>>>> Below is the flow of state changes for live migration, where the
>>>>>>> states in brackets represent the VM state, migration state and VFIO
>>>>>>> device state as:
>>>>>>>     (VM state, MIGRATION_STATUS, VFIO_DEVICE_STATE)
>>>>>>>
>>>>>>> Live migration save path:
>>>>>>>         QEMU normal running state
>>>>>>>         (RUNNING, _NONE, _RUNNING)
>>>>>>>                         |
>>>>>>>     migrate_init spawns migration_thread.
>>>>>>>     (RUNNING, _SETUP, _RUNNING|_SAVING)
>>>>>>>     Migration thread then calls each device's .save_setup()
>>>>>>>                         |
>>>>>>>     (RUNNING, _ACTIVE, _RUNNING|_SAVING)
>>>>>>>     If the device is active, get pending bytes by .save_live_pending()
>>>>>>>     If pending bytes >= threshold_size, call save_live_iterate()
>>>>>>>     Data of the VFIO device for the pre-copy phase is copied.
>>>>>>>     Iterate until pending bytes converge and are less than the threshold
>>>>>>>                         |
>>>>>>>     On migration completion, vCPUs stop and .save_live_complete_precopy
>>>>>>>     is called for each active device. The VFIO device is then
>>>>>>>     transitioned into the _SAVING state.
>>>>>>>     (FINISH_MIGRATE, _DEVICE, _SAVING)
>>>>>>>     For the VFIO device, iterate in .save_live_complete_precopy until
>>>>>>>     pending data is 0.
>>>>>>>     (FINISH_MIGRATE, _DEVICE, _STOPPED)
>>>>>>>                         |
>>>>>>>     (FINISH_MIGRATE, _COMPLETED, _STOPPED)
>>>>>>>     Migration thread schedules the cleanup bottom half and exits
>>>>>>>
>>>>>>> Live migration resume path:
>>>>>>>     Incoming migration calls .load_setup for each device
>>>>>>>     (RESTORE_VM, _ACTIVE, _STOPPED)
>>>>>>>                         |
>>>>>>>     For each device, .load_state is called for that device's section
>>>>>>>     data
>>>>>>>                         |
>>>>>>>     At the end, .load_cleanup is called for each device and vCPUs are
>>>>>>>     started.
>>>>>>>                         |
>>>>>>>         (RUNNING, _NONE, _RUNNING)
>>>>>>>
>>>>>>> Note that:
>>>>>>> - Postcopy migration is not supported.
>>>>>>>
>>>>>>> v6 -> v7:
>>>>>>> - Fix build failures.
>>>>>>>
>>>>>>> v5 -> v6:
>>>>>>> - Fix build failure.
>>>>>>>
>>>>>>> v4 -> v5:
>>>>>>> - Added a descriptive comment about the sequence of access to members
>>>>>>>   of structure vfio_device_migration_info to be followed, based on
>>>>>>>   Alex's suggestion.
>>>>>>> - Updated the get dirty pages sequence.
>>>>>>> - As per Cornelia Huck's suggestion, added callbacks to VFIODeviceOps
>>>>>>>   for get_object, save_config and load_config.
>>>>>>> - Fixed multiple nitpicks.
>>>>>>> - Tested live migration with multiple VFIO devices assigned to a VM.
>>>>>>>
>>>>>>> v3 -> v4:
>>>>>>> - Added one more bit for the _RESUMING flag to be set explicitly.
>>>>>>> - The data_offset field is read-only for the user space application.
>>>>>>> - data_size is read on every iteration before reading data from the
>>>>>>>   migration region; this removes the assumption that the data extends
>>>>>>>   to the end of the migration region.
>>>>>>> - If the vendor driver supports mappable sparse regions, map those
>>>>>>>   regions during the setup state of save/load, and similarly unmap
>>>>>>>   them in the cleanup routines.
>>>>>>> - Handled a race condition that caused data corruption in the
>>>>>>>   migration region during save device state by adding a mutex and
>>>>>>>   serializing the save_buffer and get_dirty_pages routines.
>>>>>>> - Skip calling the get_dirty_pages routine for mapped MMIO regions of
>>>>>>>   the device.
>>>>>>> - Added trace events.
>>>>>>> - Split into multiple functional patches.
>>>>>>>
>>>>>>> v2 -> v3:
>>>>>>> - Removed the enum of VFIO device states. Defined the VFIO device
>>>>>>>   state with 2 bits.
>>>>>>> - Re-structured vfio_device_migration_info to keep it minimal and
>>>>>>>   defined the action on read and write access to its members.
>>>>>>>
>>>>>>> v1 -> v2:
>>>>>>> - Defined the MIGRATION region type and sub-type, which should be used
>>>>>>>   with the region type capability.
>>>>>>> - Re-structured vfio_device_migration_info. This structure will be
>>>>>>>   placed at offset 0 of the migration region.
>>>>>>> - Replaced ioctl with read/write for the trapped part of the migration
>>>>>>>   region.
>>>>>>> - Added support for both types of access, trapped or mmapped, for the
>>>>>>>   data section of the region.
>>>>>>> - Moved the PCI device functions to the pci file.
>>>>>>> - Added iteration to get the dirty page bitmap until the bitmap for
>>>>>>>   all requested pages is copied.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kirti
>>>>>>>
>>>>>>> Kirti Wankhede (13):
>>>>>>>   vfio: KABI for migration interface
>>>>>>>   vfio: Add function to unmap VFIO region
>>>>>>>   vfio: Add vfio_get_object callback to VFIODeviceOps
>>>>>>>   vfio: Add save and load functions for VFIO PCI devices
>>>>>>>   vfio: Add migration region initialization and finalize function
>>>>>>>   vfio: Add VM state change handler to know state of VM
>>>>>>>   vfio: Add migration state change notifier
>>>>>>>   vfio: Register SaveVMHandlers for VFIO device
>>>>>>>   vfio: Add save state functions to SaveVMHandlers
>>>>>>>   vfio: Add load state functions to SaveVMHandlers
>>>>>>>   vfio: Add function to get dirty page list
>>>>>>>   vfio: Add vfio_listerner_log_sync to mark dirty pages
>>>>>>>   vfio: Make vfio-pci device migration capable.
>>>>>>>
>>>>>>>  hw/vfio/Makefile.objs         |   2 +-
>>>>>>>  hw/vfio/common.c              |  55 +++
>>>>>>>  hw/vfio/migration.c           | 874 ++++++++++++++++++++++++++++++++++++++++++
>>>>>>>  hw/vfio/pci.c                 | 137 ++++++-
>>>>>>>  hw/vfio/trace-events          |  19 +
>>>>>>>  include/hw/vfio/vfio-common.h |  25 ++
>>>>>>>  linux-headers/linux/vfio.h    | 166 ++++++++
>>>>>>>  7 files changed, 1271 insertions(+), 7 deletions(-)
>>>>>>>  create mode 100644 hw/vfio/migration.c
>>>>>>>
>>>>>>> -- 
>>>>>>> 2.7.0
>>>>>>>
>>>>> --
>>>>> Dr. David Alan Gilbert / address@hidden / Manchester, UK
>>> --
>>> Dr. David Alan Gilbert / address@hidden / Manchester, UK
>>>
> 


