From: Gavin Shan
Subject: Re: [PATCH v2 0/4] hw/arm/virt: Improve address assignment for high memory regions
Date: Tue, 6 Sep 2022 12:39:58 +1000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.0

Hi Eric,

On 8/24/22 6:06 PM, Eric Auger wrote:
On 8/24/22 05:29, Gavin Shan wrote:
On 8/15/22 4:29 PM, Gavin Shan wrote:
There are three high memory regions: VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
float above the highest RAM address. However, they can be disabled
in several cases:

  (1) A specific high memory region is disabled by the developer by
      toggling vms->highmem_{redists, ecam, mmio}.
  (2) The VIRT_HIGH_PCIE_ECAM region is disabled on machine types
      'virt-2.12' or earlier.
  (3) The VIRT_HIGH_PCIE_ECAM region is disabled when firmware is
      loaded on a 32-bit system.
  (4) A specific high memory region is disabled when it exceeds the
      PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for a specific high memory region is always
reserved from the PA space in cases (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. This is unnecessary since the assigned space of a disabled
high memory region won't be used afterwards.

The series intends to improve the address assignment for these
high memory regions:

PATCH[1] and PATCH[2] are cleanup and preparatory work.
PATCH[3] improves the address assignment for these high memory regions.
PATCH[4] moves the address assignment logic into a standalone helper.
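A rough sketch of the direction PATCH[3] and PATCH[4] take is below. This is
illustrative only, not the actual virt.c code: virt_high_region_enabled() is
a made-up helper standing in for the vms->highmem_{redists, ecam, mmio}
checks, and the bookkeeping is simplified.

  /*
   * Illustrative sketch only -- not the actual virt.c code. The helper
   * virt_high_region_enabled() is invented for this example. The point
   * is that 'base' and 'vms->highest_gpa' only advance when a region is
   * both enabled and fits within the PA space.
   */
  static void virt_set_high_memmap(VirtMachineState *vms,
                                   hwaddr base, int pa_bits)
  {
      int i;

      for (i = VIRT_LOW_MEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
          hwaddr size = extended_memmap[i].size;
          bool fits;

          base = ROUND_UP(base, size);
          fits = (base + size <= BIT_ULL(pa_bits));

          /* Disabled or oversized regions get no space reserved at all. */
          if (!virt_high_region_enabled(vms, i) || !fits) {
              continue;
          }

          vms->memmap[i].base = base;
          vms->memmap[i].size = size;

          base += size;
          vms->highest_gpa = base - 1;
      }
  }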

History
=======
v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html

Changelog
=========
v2:
    * Split the patches for easier review                        (Gavin)
    * Improved changelog                                         (Marc)
    * Use 'bool fits' in virt_set_high_memmap()                  (Eric)

You did not really convince me about migration compat wrt the high MMIO
region. Aren't the PCI BARs saved/restored, meaning the device driver
expects to find data at the same GPA? But what if your high MMIO
region was relocated in the dest QEMU with a possibly smaller VM IPA?
Don't you have MMIO regions now allocated outside of the dest MMIO
region? How does the PCI host bridge route accesses to those regions?
What am I missing?


[...]

Sorry for the delay; I was offline last week. I was intending to explain
migration of the virtio-net device and spent some time going through the
code. I found it too complicated for an example because both the PCI and
virtio device models are involved. So let's use e1000e.c as an example here.

There are lots of registers residing in the MMIO regions, including the MSIx
table. The MSIx table is backed by PCIDevice::msix_table, which is a plain
buffer. Accesses to the MSIx table read from or write to that buffer. The
corresponding handler is hw/pci/msix.c::msix_table_mmio_ops. msix_init() is
called by e1000e.c to set up the MSIx table, which is associated with memory
BAR#3. As the registers in the MSIx table are fully emulated by QEMU, the
BAR's base address isn't a concern.

  struct PCIDevice {
     :
     uint8_t *msix_table;
     :
     MemoryRegion msix_table_mmio;
     :
  };

  /* @table_bar is registered as memory BAR#3 in e1000e_pci_realize() */
  int msix_init(struct PCIDevice *dev, unsigned short nentries,
                MemoryRegion *table_bar, uint8_t table_bar_nr,
                unsigned table_offset, MemoryRegion *pba_bar,
                uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos,
                Error **errp)
  {
       :
    memory_region_init_io(&dev->msix_table_mmio, OBJECT(dev),
                          &msix_table_mmio_ops, dev,
                          "msix-table", table_size);
    memory_region_add_subregion(table_bar, table_offset, &dev->msix_table_mmio);
       :
  }
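For completeness, the container region passed as @table_bar is exposed to the
guest as BAR#3 in e1000e_pci_realize(), roughly along these lines (simplified
and from memory, so the exact flags and macro values may differ from the
current hw/net/e1000e.c):

  /* Simplified: expose the MSIx container region as memory BAR#3. */
  memory_region_init(&s->msix, OBJECT(s), "e1000e-msix", E1000E_MSIX_SIZE);
  pci_register_bar(pci_dev, E1000E_MSIX_IDX,
                   PCI_BASE_ADDRESS_SPACE_MEMORY, &s->msix);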

As you were concerned, the BAR's base addresses for the MSIx table can be
different on the source and destination VMs. It's still not a problem because
the registers in the MSIx table are migrated, saved on the source VM and
restored on the destination VM, one by one. That is to say, the whole buffer
(PCIDevice::msix_table) is not saved and restored at once. Furthermore, a
unique ID string, instead of the corresponding BAR's base address, is used to
identify the MSIx table. For this particular case, the ID string is something
like "e1000e_dev_id/pci-0000:05:00.0/msix state". When this ID string is
received on the destination VM, the object and PCI device are located and the
forthcoming data is restored into PCIDevice::msix_table.
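To make the point explicit, here is a tiny self-contained illustration (a toy
sketch, not QEMU code): the destination matches the incoming section by its ID
string and copies the entries into its own local buffer, so the table contents
end up wherever the destination's BAR happens to be mapped.

  /* Toy sketch (not QEMU code): match by ID string, then fill the buffer. */
  #include <stdint.h>
  #include <string.h>

  #define TOY_MSIX_ENTRY_SIZE 16

  struct toy_msix_dev {
      const char *idstr;     /* e.g. ".../pci-0000:05:00.0/msix state" */
      uint8_t *msix_table;   /* local buffer backing the table BAR */
      unsigned nentries;
  };

  /* Locate the device whose ID string matches the incoming section. */
  static struct toy_msix_dev *toy_find_by_idstr(struct toy_msix_dev *devs,
                                                int ndevs, const char *idstr)
  {
      for (int i = 0; i < ndevs; i++) {
          if (!strcmp(devs[i].idstr, idstr)) {
              return &devs[i];
          }
      }
      return NULL;
  }

  /* Restore the received table entries into the matched device's buffer. */
  static void toy_msix_load(struct toy_msix_dev *dev, const uint8_t *data)
  {
      memcpy(dev->msix_table, data, dev->nentries * TOY_MSIX_ENTRY_SIZE);
  }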

  static const VMStateDescription e1000e_vmstate = {
    .name = "e1000e",
    .version_id = 1,
    .minimum_version_id = 1,
    .pre_save = e1000e_pre_save,
    .post_load = e1000e_post_load,
    .fields = (VMStateField[]) {
        VMSTATE_PCI_DEVICE(parent_obj, E1000EState),
        VMSTATE_MSIX(parent_obj, E1000EState),
        :
    }
  };

  #define VMSTATE_MSIX_TEST(_field, _state, _test) {                 \
    .name         = (stringify(_field)),                             \
    .size         = sizeof(PCIDevice),                               \
    .vmsd         = &vmstate_msix,                                   \
    .flags        = VMS_STRUCT,                                      \
    .offset       = vmstate_offset_value(_state, _field, PCIDevice), \
    .field_exists = (_test)                                          \
  }

  #define VMSTATE_MSIX(_f, _s)                                       \
      VMSTATE_MSIX_TEST(_f, _s, NULL)


  /* On source VM, PCIDevice::msix_table is transferred to destination VM */
  void msix_save(PCIDevice *dev, QEMUFile *f)
  {
    :
    qemu_put_buffer(f, dev->msix_table, n * PCI_MSIX_ENTRY_SIZE);
    qemu_put_buffer(f, dev->msix_pba, DIV_ROUND_UP(n, 8));
  }

  /* On destination VM, the received data is written to PCIDevice::msix_table */
  void msix_load(PCIDevice *dev, QEMUFile *f)
  {
    :
    qemu_get_buffer(f, dev->msix_table, n * PCI_MSIX_ENTRY_SIZE);
    qemu_get_buffer(f, dev->msix_pba, DIV_ROUND_UP(n, 8));
    :
  }

  static int put_msix_state(QEMUFile *f, void *pv, size_t size,
                            const VMStateField *field, JSONWriter *vmdesc)
  {
    msix_save(pv, f);

    return 0;
  }

  static int get_msix_state(QEMUFile *f, void *pv, size_t size,
                            const VMStateField *field)
  {
    msix_load(pv, f);
    return 0;
  }

  static VMStateInfo vmstate_info_msix = {
    .name = "msix state",
    .get  = get_msix_state,
    .put  = put_msix_state,
  };

  const VMStateDescription vmstate_msix = {
    .name = "msix",
    .fields = (VMStateField[]) {
        {
            .name         = "msix",
            .version_id   = 0,
            .field_exists = NULL,
            .size         = 0,   /* ouch */
            .info         = &vmstate_info_msix,
            .flags        = VMS_SINGLE,
            .offset       = 0,
        },
    }
 };

Thanks,
Gavin



