qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC v2 2/2] hw/pvrdma: add live migration support


From: Yuval Shaia
Subject: Re: [Qemu-devel] [RFC v2 2/2] hw/pvrdma: add live migration support
Date: Mon, 8 Jul 2019 08:13:39 +0300
User-agent: Mutt/1.10.1 (2018-07-13)

On Sat, Jul 06, 2019 at 09:39:40AM +0530, Sukrit Bhatnagar wrote:
> Use VMStateDescription for migrating device state. Currently,

What do you mean by 'Currently'?

> 'vmstate_pvrdma' describes the PCI and MSIX state for pvrdma and
> 'vmstate_pvrdma_dsr_dma' describes a temporary state containing
> some values obtained only after mapping to dsr in the source.
> Since the dsr will not be available on dest until we map to the
> dma address we had on source, these values cannot be migrated
> directly.
> 
> Add PVRDMAMigTmp to store this temporary state which consists of
> dma addresses and ring page information. The 'parent' member is
> used to refer to the device state (PVRDMADev) so that parent PCI
> device object is accessible, which is needed to remap to DSR.
> 
> pvrdma_dsr_dma_pre_save() saves the dsr state into this temporary
> representation and pvrdma_dsr_dma_post_load() loads it back.
> This load function also remaps to the dsr and and calls
> load_dsr() for further map and ring init operations.
> 
> Please note that this call to load_dsr() can be removed from the
> migration flow and included in pvrdma_regs_write() to perform a
> lazy load.

The lazy load was suggested to overcome a potential problem with mapping to
addresses while still in migration process. With that been solved i would
suggest to drop the idea of lazy load.

> As of now, migration will fail if there in an error in load_dsr().
> Also, there might be a considerable amount of pages in the rings,
> which will have dma map operations when the init functions are
> called.
> If this takes noticeable time, it might be better to have lazy
> load instead.

Yeah, make sense but i hope we will not get to this.

> 
> Cc: Marcel Apfelbaum <address@hidden>
> Cc: Yuval Shaia <address@hidden>
> Signed-off-by: Sukrit Bhatnagar <address@hidden>
> ---
>  hw/rdma/vmw/pvrdma_main.c | 87 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 87 insertions(+)
> 
> diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> index 6c90db96f9..4a10bd2fc7 100644
> --- a/hw/rdma/vmw/pvrdma_main.c
> +++ b/hw/rdma/vmw/pvrdma_main.c
> @@ -28,6 +28,7 @@
>  #include "sysemu/sysemu.h"
>  #include "monitor/monitor.h"
>  #include "hw/rdma/rdma.h"
> +#include "migration/register.h"
>  
>  #include "../rdma_rm.h"
>  #include "../rdma_backend.h"
> @@ -593,6 +594,91 @@ static void pvrdma_shutdown_notifier(Notifier *n, void 
> *opaque)
>      pvrdma_fini(pci_dev);
>  }
>  
> +struct PVRDMAMigTmp {
> +    PVRDMADev *parent;
> +    uint64_t dma;
> +    uint64_t cmd_slot_dma;
> +    uint64_t resp_slot_dma;
> +    uint32_t cq_ring_pages_num_pages;
> +    uint64_t cq_ring_pages_pdir_dma;
> +    uint32_t async_ring_pages_num_pages;
> +    uint64_t async_ring_pages_pdir_dma;
> +};
> +
> +static int pvrdma_dsr_dma_pre_save(void *opaque)
> +{
> +    struct PVRDMAMigTmp *tmp = opaque;

For me tmp sounds like a very bad name, if it is the convention then i can
live with that, anyways suggesting something like mig_tmp_info or something
like that.

> +    DSRInfo *dsr_info = &tmp->parent->dsr_info;

Can you shad some light on how the parent field is initialized with the
pointer to the device object?

> +    struct pvrdma_device_shared_region *dsr = dsr_info->dsr;
> +
> +    tmp->dma = dsr_info->dma;
> +    tmp->cmd_slot_dma = dsr->cmd_slot_dma;
> +    tmp->resp_slot_dma = dsr->resp_slot_dma;
> +    tmp->cq_ring_pages_num_pages = dsr->cq_ring_pages.num_pages;
> +    tmp->cq_ring_pages_pdir_dma = dsr->cq_ring_pages.pdir_dma;
> +    tmp->async_ring_pages_num_pages = dsr->async_ring_pages.num_pages;
> +    tmp->async_ring_pages_pdir_dma = dsr->async_ring_pages.pdir_dma;
> +
> +    return 0;
> +}
> +
> +static int pvrdma_dsr_dma_post_load(void *opaque, int version_id)
> +{
> +    struct PVRDMAMigTmp *tmp = opaque;
> +    PVRDMADev *dev = tmp->parent;
> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> +    DSRInfo *dsr_info = &dev->dsr_info;
> +    struct pvrdma_device_shared_region *dsr;
> +
> +    dsr_info->dma = tmp->dma;
> +    dsr_info->dsr = rdma_pci_dma_map(pci_dev, dsr_info->dma,
> +                                sizeof(struct pvrdma_device_shared_region));
> +    if (!dsr_info->dsr) {
> +        rdma_error_report("Failed to map to DSR");
> +        return -ENOMEM;

Will this cause the VM on source host to continue functioning besides
failing the migration?

> +    }
> +
> +    dsr = dsr_info->dsr;
> +    dsr->cmd_slot_dma = tmp->cmd_slot_dma;
> +    dsr->resp_slot_dma = tmp->resp_slot_dma;
> +    dsr->cq_ring_pages.num_pages = tmp->cq_ring_pages_num_pages;
> +    dsr->cq_ring_pages.pdir_dma = tmp->cq_ring_pages_pdir_dma;
> +    dsr->async_ring_pages.num_pages = tmp->async_ring_pages_num_pages;
> +    dsr->async_ring_pages.pdir_dma = tmp->async_ring_pages_pdir_dma;

I expect the above 6 fields to be already populated with the correct values
as we just map to driver's DSR that should be migrated as part of memory
copy of source to dest host.
Can you verify it and if my assumptions are correct to remove these
assignments (and the corresponding from pre_save)?

> +
> +    return load_dsr(dev);
> +}
> +
> +static const VMStateDescription vmstate_pvrdma_dsr_dma = {
> +    .name = "pvrdma-dsr-dma",
> +    .pre_save = pvrdma_dsr_dma_pre_save,
> +    .post_load = pvrdma_dsr_dma_post_load,

I'm looking for a hook that is triggered just before leaving the source
host so we can do some needed cleanups such as unmapping the DSR, removing
IP addresses from the host etc.

> +    .fields = (VMStateField[]) {
> +            VMSTATE_UINT64(dma, struct PVRDMAMigTmp),
> +            VMSTATE_UINT64(cmd_slot_dma, struct PVRDMAMigTmp),
> +            VMSTATE_UINT64(resp_slot_dma, struct PVRDMAMigTmp),
> +            VMSTATE_UINT32(async_ring_pages_num_pages, struct PVRDMAMigTmp),
> +            VMSTATE_UINT64(async_ring_pages_pdir_dma, struct PVRDMAMigTmp),
> +            VMSTATE_UINT32(cq_ring_pages_num_pages, struct PVRDMAMigTmp),
> +            VMSTATE_UINT64(cq_ring_pages_pdir_dma, struct PVRDMAMigTmp),
> +            VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static const VMStateDescription vmstate_pvrdma = {
> +    .name = "pvrdma",

Suggesting to use the already defined constant PVRDMA_HW_NAME.

> +    .version_id = 1,
> +    .minimum_version_id = 1,

Hmmm...interesting, what's the use of these fields?

> +    .fields = (VMStateField[]) {
> +            VMSTATE_PCI_DEVICE(parent_obj, PVRDMADev),
> +            VMSTATE_MSIX(parent_obj, PVRDMADev),
> +            VMSTATE_WITH_TMP(PVRDMADev,
> +                             struct PVRDMAMigTmp,
> +                             vmstate_pvrdma_dsr_dma),
> +            VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static void pvrdma_realize(PCIDevice *pdev, Error **errp)
>  {
>      int rc = 0;
> @@ -688,6 +774,7 @@ static void pvrdma_class_init(ObjectClass *klass, void 
> *data)
>  
>      dc->desc = "RDMA Device";
>      dc->props = pvrdma_dev_properties;
> +    dc->vmsd = &vmstate_pvrdma;
>      set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);

Besides the above comments looks like a great job, thanks!

>  
>      ir->print_statistics = pvrdma_print_statistics;
> -- 
> 2.21.0
> 
> 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]