Re: [PATCH v5 17/18] vfio-user: register handlers to facilitate migration

From: Jag Raman
Subject: Re: [PATCH v5 17/18] vfio-user: register handlers to facilitate migration
Date: Tue, 1 Feb 2022 03:49:40 +0000
> On Jan 28, 2022, at 3:29 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Thu, Jan 27, 2022 at 05:04:26PM +0000, Jag Raman wrote:
>>
>>
>>> On Jan 25, 2022, at 10:48 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>>
>>> On Wed, Jan 19, 2022 at 04:42:06PM -0500, Jagannathan Raman wrote:
>>>> + * The client subsequetly asks the remote server for any data that
>>>
>>> subsequently
>>>
>>>> +static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
>>>> +{
>>>> + VfuObject *o = vfu_get_private(vfu_ctx);
>>>> + VfuObjectClass *k = VFU_OBJECT_GET_CLASS(OBJECT(o));
>>>> + static int migrated_devs;
>>>> + Error *local_err = NULL;
>>>> + int ret;
>>>> +
>>>> + /**
>>>> + * TODO: move to VFU_MIGR_STATE_RESUME handler. Presently, the
>>>> + * VMSD data from source is not available at RESUME state.
>>>> + * Working on a fix for this.
>>>> + */
>>>> + if (!o->vfu_mig_file) {
>>>> + o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_load, false);
>>>> + }
>>>> +
>>>> + ret = qemu_remote_loadvm(o->vfu_mig_file);
>>>> + if (ret) {
>>>> + VFU_OBJECT_ERROR(o, "vfu: failed to restore device state");
>>>> + return;
>>>> + }
>>>> +
>>>> + qemu_file_shutdown(o->vfu_mig_file);
>>>> + o->vfu_mig_file = NULL;
>>>> +
>>>> + /* VFU_MIGR_STATE_RUNNING begins here */
>>>> + if (++migrated_devs == k->nr_devs) {
>>>
>>> When is this counter reset so migration can be tried again if it
>>> fails/cancels?
>>
>> Detecting cancellation is a pending item. We will address it in the
>> next rev. Will check with you if we get stuck during the process
>> of implementing it.
>>
>>>
>>>> +static ssize_t vfu_mig_read_data(vfu_ctx_t *vfu_ctx, void *buf,
>>>> + uint64_t size, uint64_t offset)
>>>> +{
>>>> + VfuObject *o = vfu_get_private(vfu_ctx);
>>>> +
>>>> + if (offset > o->vfu_mig_buf_size) {
>>>> + return -1;
>>>> + }
>>>> +
>>>> + if ((offset + size) > o->vfu_mig_buf_size) {
>>>> + warn_report("vfu: buffer overflow - check pending_bytes");
>>>> + size = o->vfu_mig_buf_size - offset;
>>>> + }
>>>> +
>>>> + memcpy(buf, (o->vfu_mig_buf + offset), size);
>>>> +
>>>> + o->vfu_mig_buf_pending -= size;
>>>
>>> This assumes that the caller increments offset by size each time. If
>>> that assumption is okay, then we can just trust offset and don't need to
>>> do arithmetic on vfu_mig_buf_pending. If that assumption is not correct,
>>> then the code needs to be extended to safely update vfu_mig_buf_pending
>>> when offset jumps around arbitrarily between calls.
>>
>> Going by the definition of vfu_migration_callbacks_t in the library, I
>> assumed that read_data advances the offset by size bytes.
>>
>> Will add a comment to explain that.
>>
>>>
>>>> +uint64_t vmstate_vmsd_size(PCIDevice *pci_dev)
>>>> +{
>>>> + DeviceClass *dc = DEVICE_GET_CLASS(DEVICE(pci_dev));
>>>> + const VMStateField *field = NULL;
>>>> + uint64_t size = 0;
>>>> +
>>>> + if (!dc->vmsd) {
>>>> + return 0;
>>>> + }
>>>> +
>>>> + field = dc->vmsd->fields;
>>>> + while (field && field->name) {
>>>> + size += vmstate_size(pci_dev, field);
>>>> + field++;
>>>> + }
>>>> +
>>>> + return size;
>>>> +}
>>>
>>> This function looks incorrect because it ignores subsections as well as
>>> runtime behavior during save(). Although VMStateDescription is partially
>>> declarative, there is still a bunch of imperative code that can write to
>>> the QEMUFile at save() time so there's no way of knowing the size ahead
>>> of time.
>>
>> I see your point; it would be a problem for any field which has the
>> (VMS_BUFFER | VMS_ALLOC) flags set.
>>
>>>
>>> I asked this in a previous revision of this series but I'm not sure if
>>> it was answered: is it really necessary to know the size of the vmstate?
>>> I thought the VFIO migration interface is designed to support
>>> streaming reads/writes. We could choose a fixed size like 64KB and
>>> stream the vmstate in 64KB chunks.
>>
>> The library exposes the migration data to the client as a device BAR whose
>> size is fixed at boot time, even when using vfu_migration_callbacks_t
>> callbacks.
>>
>> I don’t believe the library supports streaming vmstate/migration-data - see
>> the following comment in migration_region_access() defined in the library:
>>
>> * Does this mean that partial reads are not allowed?
>>
>> Thanos or John,
>>
>> Could you please clarify this?
>>
>> Stefan,
>> We attempted to answer the migration cancellation and vmstate size
>> questions previously also, in the following email:
>>
>> https://lore.kernel.org/all/F48606B1-15A4-4DD2-9D71-2FCAFC0E671F@oracle.com/
>
>> libvfio-user has the vfu_migration_callbacks_t interface that allows the
>> device to save/load more data regardless of the size of the migration
>> region. I don't see the issue here since the region doesn't need to be
>> sized to fit the savevm data?
>
> The answer didn't make sense to me:
>
> "In both scenarios at the server end - whether using the migration BAR or
> using callbacks, the migration data is transported to the other end using
> the BAR. As such we need to specify the BAR’s size during initialization.
>
> In the case of the callbacks, the library translates the BAR access to
> callbacks."
>
> The BAR and the migration region within it need a size but my
> understanding is that VFIO migration is designed to stream the device
> state, allowing it to be broken up into multiple reads/writes without
> knowing the device state's size upfront. Here is the description from
> <linux/vfio.h>:
>
> * The sequence to be followed while in pre-copy state and stop-and-copy state
> * is as follows:
> * a. Read pending_bytes, indicating the start of a new iteration to get device
> * data. Repeated read on pending_bytes at this stage should have no side
> * effects.
> * If pending_bytes == 0, the user application should not iterate to get data
> * for that device.
> * If pending_bytes > 0, perform the following steps.
> * b. Read data_offset, indicating that the vendor driver should make data
> * available through the data section. The vendor driver should return this
> * read operation only after data is available from (region + data_offset)
> * to (region + data_offset + data_size).
> * c. Read data_size, which is the amount of data in bytes available through
> * the migration region.
> * Read on data_offset and data_size should return the offset and size of
> * the current buffer if the user application reads data_offset and
> * data_size more than once here.
> * d. Read data_size bytes of data from (region + data_offset) from the
> * migration region.
> * e. Process the data.
> * f. Read pending_bytes, which indicates that the data from the previous
> * iteration has been read. If pending_bytes > 0, go to step b.
> *
> * The user application can transition from the _SAVING|_RUNNING
> * (pre-copy state) to the _SAVING (stop-and-copy) state regardless of the
> * number of pending bytes. The user application should iterate in _SAVING
> * (stop-and-copy) until pending_bytes is 0.
>
> This means you can report pending_bytes > 0 until the entire vmstate has
> been read and can pick a fixed chunk size like 64KB for the migration
> region. There's no need to size the migration region to fit the entire
> vmstate.
Thank you for the pointer to generic VFIO migration, Stefan! Makes sense.

So I understand that the VFIO migration region carves out a section to
stream/shuttle device data between the app (QEMU client in this case) and the
driver (QEMU server). This section starts at data_offset within the region and
spans data_size bytes.
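
For my own reference, here is a rough sketch of that client-side loop as I read
the <linux/vfio.h> sequence quoted above. mig_region_read() and
mig_region_copy_to_stream() are hypothetical helpers standing in for accesses
to the device's migration region, not existing QEMU functions:

    #include <stddef.h>
    #include <stdint.h>
    #include <linux/vfio.h>

    /* Hypothetical helpers wrapping reads of the device's migration region. */
    void mig_region_read(void *buf, size_t off, size_t len);
    void mig_region_copy_to_stream(uint64_t data_offset, uint64_t data_size);

    void save_device_state_iteration(void)
    {
        uint64_t pending, data_offset, data_size;

        /* (a) read pending_bytes to start a new iteration */
        mig_region_read(&pending,
                        offsetof(struct vfio_device_migration_info, pending_bytes),
                        sizeof(pending));

        while (pending > 0) {
            /* (b) driver makes data available at region + data_offset */
            mig_region_read(&data_offset,
                            offsetof(struct vfio_device_migration_info, data_offset),
                            sizeof(data_offset));

            /* (c) data_size is the amount available in this iteration */
            mig_region_read(&data_size,
                            offsetof(struct vfio_device_migration_info, data_size),
                            sizeof(data_size));

            /* (d, e) read data_size bytes at data_offset and process them */
            mig_region_copy_to_stream(data_offset, data_size);

            /* (f) re-read pending_bytes; keep iterating while it is > 0 */
            mig_region_read(&pending,
                            offsetof(struct vfio_device_migration_info, pending_bytes),
                            sizeof(pending));
        }
    }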
We could change the server to stream the data as outlined above. Do you have a
preference for the section size? Does qemu_target_page_size() work? I just
tested and am able to stream with a fixed BAR size such as
qemu_target_page_size().
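
To make the discussion concrete, below is a rough sketch (not the final
implementation) of how the server-side callbacks could stream the vmstate
buffer in chunks while trusting the offset passed by the library, per your
earlier comment. vfu_mig_buf_read is a hypothetical field I would add to
VfuObject to track how far the client has read:

    /*
     * Sketch only -- would live alongside the callbacks in the patch above.
     * 'vfu_mig_buf_read' is a hypothetical VfuObject field tracking how far
     * the client has read into the buffer.
     */
    static uint64_t vfu_mig_get_pending_bytes(vfu_ctx_t *vfu_ctx)
    {
        VfuObject *o = vfu_get_private(vfu_ctx);

        /* Report the unread remainder; the client iterates until this is 0. */
        return o->vfu_mig_buf_size - o->vfu_mig_buf_read;
    }

    static ssize_t vfu_mig_read_data(vfu_ctx_t *vfu_ctx, void *buf,
                                     uint64_t size, uint64_t offset)
    {
        VfuObject *o = vfu_get_private(vfu_ctx);

        if (offset > o->vfu_mig_buf_size) {
            return -1;
        }

        /* Clamp to the data that is actually available. */
        if (offset + size > o->vfu_mig_buf_size) {
            size = o->vfu_mig_buf_size - offset;
        }

        memcpy(buf, o->vfu_mig_buf + offset, size);

        /* Trust the caller's offset instead of separate pending arithmetic. */
        o->vfu_mig_buf_read = offset + size;

        return size;
    }

With get_pending_bytes() derived from the last offset read, there would be no
separate vfu_mig_buf_pending bookkeeping to get out of sync.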
Thank you!
--
Jag
>
> Stefan