From: Auger Eric
Subject: Re: [Qemu-arm] [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
Date: Tue, 26 Feb 2019 14:11:58 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0

Hi Igor,

On 2/26/19 9:40 AM, Auger Eric wrote:
> Hi Igor,
> 
> On 2/25/19 10:42 AM, Igor Mammedov wrote:
>> On Fri, 22 Feb 2019 18:35:26 +0100
>> Auger Eric <address@hidden> wrote:
>>
>>> Hi Igor,
>>>
>>> On 2/22/19 5:27 PM, Igor Mammedov wrote:
>>>> On Wed, 20 Feb 2019 23:39:46 +0100
>>>> Eric Auger <address@hidden> wrote:
>>>>
>>>>> This series aims to bump the 255GB RAM limit in machvirt and to
>>>>> support device memory in general, and especially PCDIMM/NVDIMM.
>>>>>
>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the
>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe
>>>>> MMIO region. The address map was 1TB large. This corresponded to
>>>>> the max IPA capacity KVM was able to manage.
>>>>>
>>>>> Since Linux 4.20, the host kernel is able to support a larger and dynamic
>>>>> IPA range. So the guest physical address can go beyond 1TB. The
>>>>> max GPA size depends on the host kernel configuration and physical CPUs.
>>>>>
>>>>> In this series we use this feature and allow the RAM to grow without
>>>>> any other limit than the one put by the host kernel.
>>>>>
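A minimal sketch of the check implied by the paragraph above: the guest address map only works if its highest address fits in the IPA width the host offers. The function names and the example host widths are assumptions made for illustration, not code from the series.

```python
TiB = 1 << 40

def required_ipa_bits(top_addr):
    """Bits needed to address every byte below top_addr (exclusive end)."""
    return (top_addr - 1).bit_length()

def fits_host(top_addr, host_ipa_bits):
    """True if a map ending at top_addr fits in host_ipa_bits of IPA space.

    host_ipa_bits is a hypothetical input here; in practice it would be
    whatever limit the host kernel reports.
    """
    return required_ipa_bits(top_addr) <= host_ipa_bits

# The legacy 1TB map needs 40 bits; anything above it needs more.
print(required_ipa_bits(1 * TiB))
print(fits_host(2 * TiB, 40))
```

With a map ending exactly at 1TB, 40 bits are enough; a map ending at 2TB is rejected on a host limited to 40 bits.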
>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size
>>>>> ram_size and then comes the device memory (,maxmem) of size
>>>>> maxram_size - ram_size. The device memory is potentially hotpluggable
>>>>> depending on the instantiated memory objects.
>>>>>
>>>>> IO regions previously located between 256GB and 1TB are moved after
>>>>> the RAM. Their offset is dynamically computed, depends on ram_size
>>>>> and maxram_size. Size alignment is enforced.
>>>>>
>>>>> If the maxmem value is below 255GB, the legacy memory map
>>>>> is still used. The change of memory map becomes effective from 4.0
>>>>> onwards.
>>>>>
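The layout described above (initial RAM at 1GB, device memory right after it, then IO regions aligned on their own size) can be sketched as follows. This is an illustration under stated assumptions, not the code in hw/arm/virt.c: the helper names, the region names and the region sizes in the usage example are hypothetical.

```python
MiB = 1 << 20
GiB = 1 << 30

RAM_BASE = 1 * GiB  # initial RAM base, as stated in the cover letter

def align_up(addr, size):
    """Round addr up to the next multiple of size (size is a power of two)."""
    return (addr + size - 1) & ~(size - 1)

def layout(ram_size, maxram_size, io_regions):
    """Return a list of (name, base, size) tuples for the dynamic map.

    io_regions is a list of (name, size) pairs with power-of-two sizes.
    Each IO region is placed after the RAM and aligned on its own size.
    """
    regions = [("ram", RAM_BASE, ram_size)]
    base = RAM_BASE + ram_size
    device_size = maxram_size - ram_size
    if device_size:
        regions.append(("device_memory", base, device_size))
        base += device_size
    for name, size in io_regions:
        base = align_up(base, size)
        regions.append((name, base, size))
        base += size
    return regions

# Hypothetical example: -m 4G,maxmem=6G with three illustrative IO regions.
example = layout(4 * GiB, 6 * GiB,
                 [("redist2", 64 * MiB), ("ecam", 256 * MiB), ("mmio", 512 * GiB)])
for name, base, size in example:
    print(f"{name:14s} base={base:#x} size={size:#x}")
```

Aligning each IO region on its size can leave holes in the map: in this example the large region is pushed up to the 512GB boundary.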
>>>>> As we keep the initial RAM at 1GB base address, we do not need to do
>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do
>>>>> that job at the moment.
>>>>>
>>>>> Since device memory is placed just after the initial RAM, it is possible
>>>>> to get access to this feature while keeping a 1TB address map.
>>>>>
>>>>> This series reuses/rebases patches initially submitted by Shameer
>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.
>>>>>
>>>>> Functionally, the series is split into 3 parts:
>>>>> 1) bump of the initial RAM limit [1 - 9] and change in
>>>>>    the memory map
>>>>
>>>>> 2) Support of PC-DIMM [10 - 13]
>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed
>>>> DSDT AML here nor E820 changes, so ACPI wise pc-dimm shouldn't be
>>>> visible to the guest. It might be that DT is masking the problem
>>>> but well, that won't work on ACPI only guests.
>>>
>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem
>>> added with the DIMM slots.
>> Question is how does it get there? Does it come from DT or from firmware
>> via UEFI interfaces?
>>
>>> So it looks fine to me. Isn't E820 a pure x86 matter?
>> Sorry for the confusion, what I meant is UEFI GetMemoryMap().
>> On x86, I'm wary of adding PC-DIMMs to E820 which then gets exposed
>> via UEFI GetMemoryMap() as guest kernel might start using it as normal
>> memory early at boot and later put that memory into zone normal and hence
>> make it non-hot-un-pluggable. The same concerns apply to DT based means
>> of discovery.
>> (That's a guest issue but it's easy to work around it by not putting hotpluggable
>> memory into UEFI GetMemoryMap() or DT and letting DSDT describe it properly.)
>> That way memory doesn't get (ab)used by firmware or early boot kernel stages
>> and doesn't get locked up.
>>
>>> What else would you expect in the dsdt?
>> Memory device descriptions, look for code that adds PNP0C80 with _CRS
>> describing memory ranges
> 
> OK thank you for the explanations. I will work on PNP0C80 addition then.
> Does it mean that in ACPI mode we must not output DT hotplug memory
> nodes or assuming that PNP0C80 is properly described, it will "override"
> DT description?

After further investigation, I think the pieces you pointed out are
added by Shameer's series, i.e. through the build_memory_hotplug_aml()
call. So I suggest we separate the concerns: this series brings support
for DIMM coldplug. Hotplug, including all the relevant ACPI structures,
will be added later on by Shameer.

Thanks

Eric
> 
>>
>>> I understand hotplug
>>> would require extra modifications but I don't see anything else missing
>>> for coldplug.
>>>> Even though I've tried to make the mem hotplug ACPI parts not x86 specific,
>>>> I'm afraid it might be tightly coupled with hotplug support.
>>>> So there are 2 options: make the DSDT part work without hotplug or
>>>> implement hotplug here. I think the former is just a waste of time
>>>> and we should just add hotplug. It should take relatively minor effort
>>>> since you already implemented most of the boilerplate here.
>>>
>>> Shameer sent an RFC series for supporting hotplug.
>>>
>>> [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support
>>> https://patchwork.kernel.org/cover/10783589/
>>>
>>> I tested PCDIMM hotplug (with ACPI) this afternoon and it seemed to be
>>> OK, even after system_reset.
>>>
>>> Note the hotplug kernel support on ARM is very recent. I would prefer to
>>> dissociate both efforts if we want to get a chance making coldplug for
>>> 4.0. Also we have an issue for NVDIMM since on reboot the guest does not
>>> boot properly.
>> I guess we can merge an implementation that works on some kernel configs
>> [DT based I'd guess], and add ACPI part later. Though that will be
>> a bit of a mess as we do not version firmware parts (ACPI tables).
>>
>>>> As for how to implement the ACPI HW part, I suggest borrowing the GED
>>>> device that the NEMU guys are trying to use instead of the GPIO route,
>>>> like we do now for ACPI_POWER_BUTTON_DEVICE, to deliver the event.
>>>> That way it would be easier to share this with their virt-x86
>>>> machine eventually.
>>> Sounds like a different approach than the one initiated by Shameer?
>> ARM boards were the first to use the ACPI hw-reduced profile so they picked up
>> the GPIO based way, available back then, to deliver hotplug events. Later the spec
>> introduced the Generic Event Device for use with the hw-reduced
>> profile, which NEMU implemented[1], so I'd use that rather than an ad-hoc
>> GPIO mapping. I'd guess it will be more compatible with various contemporary
>> guests and we could reuse the same code for both x86/arm virt boards.
>>
>> 1) https://github.com/intel/nemu/blob/topic/virt-x86/hw/acpi/ged.c
> 
> That's really helpful for the ARM hotplug works. Thanks!
> 
> Eric
>>
>>>
>>> Thanks
>>>
>>> Eric
>>>>
>>>>
>>>>> 3) Support of NV-DIMM [14 - 17]
>>>> The same might be true for NUMA but I haven't dug that deep into
>>>> that part.
>>>>
>>>>>
>>>>> 1) can be upstreamed before 2 and 2 can be upstreamed before 3.
>>>>>
>>>>> Work is ongoing to transform the whole memory as device memory.
>>>>> However this move is not trivial and, to me, is independent of
>>>>> the improvements brought by this series:
>>>>> - if we were to use DIMM for initial RAM, those DIMMs would
>>>>>   use slots. Although they would not be part of the ones provided
>>>>>   using the ",slots" options, they are ACPI limited resources.
>>>>> - DT and ACPI description needs to be reworked
>>>>> - NUMA integration needs special care
>>>>> - a special device memory object may be required to avoid consuming
>>>>>   slots and easing the FW description.
>>>>>
>>>>> So I preferred to separate the concerns. This new implementation
>>>>> based on device memory could be a candidate for another virt
>>>>> version.
>>>>>
>>>>> Best Regards
>>>>>
>>>>> Eric
>>>>>
>>>>> References:
>>>>>
>>>>> [1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
>>>>> http://patchwork.ozlabs.org/cover/914694/
>>>>>
>>>>> [2] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform
>>>>> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html
>>>>>
>>>>> This series can be found at:
>>>>> https://github.com/eauger/qemu/tree/v3.1.0-dimm-v7
>>>>>
>>>>> History:
>>>>>
>>>>> v6 -> v7:
>>>>> - Addressed Peter and Igor comments (exceptions sent by email)
>>>>> - Fixed TCG case. Now device memory works also for TCG and vcpu
>>>>>   pamax is checked
>>>>> - See individual logs for more details
>>>>>
>>>>> v5 -> v6:
>>>>> - mingw compilation issue fix
>>>>> - kvm_arm_get_max_vm_phys_shift always returns the number of supported
>>>>>   IPA bits
>>>>> - new patch "hw/arm/virt: Rename highmem IO regions" that eases the review
>>>>>   of "hw/arm/virt: Split the memory map description"
>>>>> - "hw/arm/virt: Move memory map initialization into machvirt_init"
>>>>>   squashed into the previous patch
>>>>> - change alignment of IO regions beyond the RAM so that it matches their
>>>>>   size
>>>>>
>>>>> v4 -> v5:
>>>>> - change in the memory map
>>>>> - see individual logs
>>>>>
>>>>> v3 -> v4:
>>>>> - rebase on David's "pc-dimm: next bunch of cleanups" and
>>>>>   "pc-dimm: pre_plug "slot" and "addr" assignment"
>>>>> - kvm-type option not used anymore. We directly use
>>>>>   maxram_size and ram_size machine fields to compute the
>>>>>   MAX IPA range. Migration is naturally handled as CLI
>>>>>   options are kept between source and destination. This was
>>>>>   suggested by David.
>>>>> - device_memory_start and device_memory_size not stored
>>>>>   anymore in vms->bootinfo
>>>>> - I did not take into account 2 of Igor's comments: the one
>>>>>   related to the refactoring of arm_load_dtb and the one
>>>>>   related to the generation of the dtb after system_reset
>>>>>   which would contain nodes of hotplugged devices (we do
>>>>>   not support hotplug at this stage)
>>>>> - check the end-user does not attempt to hotplug a device
>>>>> - addition of "vl: Set machine ram_size, maxram_size and
>>>>>   ram_slots earlier"
>>>>>
>>>>> v2 -> v3:
>>>>> - fix pc_q35 and pc_piix compilation error
>>>>> - Kwangwoo's email no longer being valid, remove his address
>>>>>
>>>>> v1 -> v2:
>>>>> - kvm_get_max_vm_phys_shift moved in arch specific file
>>>>> - addition of NVDIMM part
>>>>> - single series
>>>>> - rebase on David's refactoring
>>>>>
>>>>> v1:
>>>>> - was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
>>>>> - was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"
>>>>>
>>>>> Best Regards
>>>>>
>>>>> Eric
>>>>>
>>>>>
>>>>> Eric Auger (12):
>>>>>   hw/arm/virt: Rename highmem IO regions
>>>>>   hw/arm/virt: Split the memory map description
>>>>>   hw/boards: Add a MachineState parameter to kvm_type callback
>>>>>   kvm: add kvm_arm_get_max_vm_ipa_size
>>>>>   vl: Set machine ram_size, maxram_size and ram_slots earlier
>>>>>   hw/arm/virt: Dynamic memory map depending on RAM requirements
>>>>>   hw/arm/virt: Implement kvm_type function for 4.0 machine
>>>>>   hw/arm/virt: Bump the 255GB initial RAM limit
>>>>>   hw/arm/virt: Add memory hotplug framework
>>>>>   hw/arm/virt: Allocate device_memory
>>>>>   hw/arm/boot: Expose the pmem nodes in the DT
>>>>>   hw/arm/virt: Add nvdimm and nvdimm-persistence options
>>>>>
>>>>> Kwangwoo Lee (2):
>>>>>   nvdimm: use configurable ACPI IO base and size
>>>>>   hw/arm/virt: Add nvdimm hot-plug infrastructure
>>>>>
>>>>> Shameer Kolothum (3):
>>>>>   hw/arm/boot: introduce fdt_add_memory_node helper
>>>>>   hw/arm/boot: Expose the PC-DIMM nodes in the DT
>>>>>   hw/arm/virt-acpi-build: Add PC-DIMM in SRAT
>>>>>
>>>>>  accel/kvm/kvm-all.c             |   2 +-
>>>>>  default-configs/arm-softmmu.mak |   4 +
>>>>>  hw/acpi/nvdimm.c                |  31 ++-
>>>>>  hw/arm/boot.c                   | 136 ++++++++++--
>>>>>  hw/arm/virt-acpi-build.c        |  23 +-
>>>>>  hw/arm/virt.c                   | 364 ++++++++++++++++++++++++++++----
>>>>>  hw/i386/pc_piix.c               |   6 +-
>>>>>  hw/i386/pc_q35.c                |   6 +-
>>>>>  hw/ppc/mac_newworld.c           |   3 +-
>>>>>  hw/ppc/mac_oldworld.c           |   2 +-
>>>>>  hw/ppc/spapr.c                  |   2 +-
>>>>>  include/hw/arm/virt.h           |  24 ++-
>>>>>  include/hw/boards.h             |   5 +-
>>>>>  include/hw/mem/nvdimm.h         |   4 +
>>>>>  target/arm/kvm.c                |  10 +
>>>>>  target/arm/kvm_arm.h            |  13 ++
>>>>>  vl.c                            |   6 +-
>>>>>  17 files changed, 556 insertions(+), 85 deletions(-)
>>>>>
>>>>
>>>>
>>


