qemu-devel


Re: [PATCH RFC 00/22] Support of Virtual CPU Hotplug for ARMv8 Arch


From: Andrew Jones
Subject: Re: [PATCH RFC 00/22] Support of Virtual CPU Hotplug for ARMv8 Arch
Date: Tue, 23 Jun 2020 11:12:02 +0200

On Sat, Jun 13, 2020 at 10:36:07PM +0100, Salil Mehta wrote:
> This patch-set introduces virtual cpu hotplug support for the ARMv8
> architecture in QEMU. The idea is to be able to hotplug and hot-unplug vcpus
> while the guest VM is running, with no reboot required. This does *not* make
> any assumption about physical cpu hotplug availability within the host system
> but rather tries to solve the problem at the virtualizer/QEMU layer by
> introducing cpu hotplug hooks and event handling within the guest kernel. No
> changes are required within the host kernel/KVM.
> 
> Motivation:
> This allows scaling the guest VM compute capacity on-demand which would be
> useful for the following example scenarios,
> 1. Vertical Pod Autoscaling[3][4] in the cloud: Part of the orchestration
>    framework which could adjust resource requests (CPU and Mem requests) for
>    the containers in a pod, based on usage.
> 2. Pay-as-you-grow Business Model: Infrastructure provider could allocate and
>    restrict the total number of compute resources available to the guest VM
>    according to the SLA(Service Level Agreement). VM owner could request for
>    more compute to be hot-plugged for some cost.
> 
> Terminology:
> 
> (*) Present cpus: Total cpus with which the guest has booted/will boot and
>                   which are available to the guest for use and can be onlined.
>                   - Qemu parameter(-smp)
> (*) Disabled cpus: Possible cpus which are not yet available for the guest to
>                    use. These can be hotplugged and made present. These can be
>                    thought of as un-plugged vcpus. These are included as
>                    part of sizing.
> (*) Possible cpus: Total vcpus which could ever exist in the VM. This includes
>                    booted cpus plus any cpus which could be plugged later.
>                    - Qemu parameter(-maxcpus)
>                    - Possible vcpus = Present vcpus (+) Disabled vcpus
> 
> 
> Limitations of ARMv8 Architecture:
> 
> A. Physical Limitation to CPU Hotplug:
> 1. ARMv8 architecture does not support the concept of physical cpu hotplug.
>    The closest thing which is recommended to achieve cpu hotplug on ARM is
>    to bring down the power state of the cpu using PSCI.
> 2. Other ARM components like GIC etc. have not been designed to realize
>    physical cpu hotplug capability as of now.
> 
> B. Limitations of GIC to Support Virtual CPU Hotplug:
> 1. GIC requires various resources(related to GICR/redistributor, GICC/cpu
>    interface etc) like memory regions to be fixed at VM init time, and these
>    cannot be changed after the VM has been initialized.
> 2. Associations between GICC(GIC cpu interface) and vcpu get fixed at VM
>    init time and GIC does not allow this association to change once the GIC
>    has initialized.
> 
> C. Known Limitation of the KVM:
> 1. As of now KVM allows creating vcpus but does not allow deleting
>    already created vcpus. QEMU already provides an interface to manage created
>    vcpus at KVM level and then to re-use them.
> 2. Inconsistency in interpretation of the MPIDR generated by KVM for vcpus
>    vis-a-vis SMT/threads. This does not look to be compliant with the MPIDR
>    format(SMT is present) as mentioned in the ARMv8 spec. (Please correct my
>    understanding if I am wrong here?)
>    
> 
> Workaround to the problems mentioned in Section B & C1:
> 1. We pre-size the GIC with possible vcpus at VM init time
> 2. Pre-create all possible vcpus at KVM and associate them with GICC 
> 3. Park the unplugged vcpus (similar to x86)
> 
> 
> (*) For all of above please refer to Marc's suggestion here[1]
> 
> 
> Overview of the Approach:
> At the time of machvirt_init() we pre-create all of the possible ARMCPU
> objects along with the corresponding KVM vcpus at the host. Disabled KVM vcpu
> (which are *not* "present" vcpus but are part of "possible" vcpu list) are
> parked at per VM list "kvm_parked_vcpus" after their initialization.
> 
> We create the ARMCPU objects(but these are not *realized* in the QOM sense)
> even for the disabled vcpus to facilitate the GIC initialization (pre-sized
> with possible vcpus). After initialization of the machine is complete we
> release the ARMCPU objects for the disabled vcpus. These ARMCPU objects shall
> be re-created at the time the vcpu is hot plugged. The new object is then
> re-attached to the earlier parked KVM vcpu, which also gets unparked. The
> ARMCPU object now gets "realized" in QEMU, which means creation of the
> corresponding threads, pre_plug/plug phases, and event notification to the
> guest using ACPI GED etc. Similarly, the hot-unplug leg will lead to the
> "unrealization" of the vcpus and to similar ACPI GED events to the guest for
> unplug and cleanup; eventually the ARMCPU object shall be released and the
> KVM vcpus shall be parked again.
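
The park/unpark lifecycle described in the overview can be sketched as a toy
model. This is not QEMU code: the class and method names below (ToyVM,
hotplug, hot_unplug) are hypothetical, and the sketch assumes the present
vcpus occupy the lowest indexes. It only shows that KVM vcpus are created
once and then move between a "realized" and a "parked" set.

```python
class ToyVM:
    """Toy model of the vcpu lifecycle in the overview; not QEMU code."""

    def __init__(self, present, possible):
        if not 0 < present <= possible:
            raise ValueError("need 0 < present <= possible")
        # All possible KVM vcpus are created up front, at machine init.
        self.kvm_vcpus = set(range(possible))
        # Assumption: vcpus beyond the boot-time count start out parked.
        self.parked = set(range(present, possible))
        # Realized vcpus are the ones with a live CPU object and thread.
        self.realized = set(range(present))

    def hotplug(self, idx):
        # Unpark the pre-created KVM vcpu and mark it realized.
        if idx not in self.parked:
            raise ValueError(f"vcpu {idx} is not parked")
        self.parked.remove(idx)
        self.realized.add(idx)

    def hot_unplug(self, idx):
        # Unrealize the vcpu and park it again; it is never destroyed.
        if idx not in self.realized:
            raise ValueError(f"vcpu {idx} is not realized")
        self.realized.remove(idx)
        self.parked.add(idx)


vm = ToyVM(present=4, possible=6)
vm.hotplug(4)
print(sorted(vm.realized), sorted(vm.parked))
```

The set of KVM vcpus never changes size in this model, which mirrors the
constraint that KVM cannot delete vcpus once they are created.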
> 
> During machine init, ACPI MADT Table is sized with *possible* vcpus GICC
> entries. The unplugged/disabled vcpus are presented as MADT GICC DISABLED
> entries to the guest. This means the guest will have its resources pre-sized
> with possible vcpus(=present+disabled)
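
As a rough illustration of the MADT sizing above, the sketch below lists one
GICC flags value per possible vcpu, with the Enabled bit (bit 0 of the GICC
flags field in the ACPI specification) set only for present vcpus. The
function name gicc_flags is hypothetical, and the sketch assumes present
vcpus take the lowest indexes; it does not reflect how QEMU builds the table.

```python
GICC_ENABLED = 1 << 0  # ACPI MADT GICC flags, bit 0: Enabled


def gicc_flags(present, possible):
    """Return one flags value per possible vcpu (hypothetical helper)."""
    # Assumption: vcpus 0..present-1 are present, the rest are disabled.
    return [GICC_ENABLED if i < present else 0 for i in range(possible)]


# Values taken from the launch command in this mail: -smp cpus=4,maxcpus=6
flags = gicc_flags(present=4, possible=6)
print(flags)
```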
> 
> Other approaches to deal with ARMCPU object release(after machine init):
> 1. The ARMCPU objects for the disabled vcpus are released in context to the
>    virt_machine_done() notifier(approach adopted in this patch-set). 
> 2. Defer the release of current ARMCPU object till the new vcpu object is
>    hot plugged.
> 3. Never release and keep on reusing them and release once at VM exit. This
>    solves many problems with above 2 approaches but requires change in the way
>    qdev_device_add() fetches/creates the ARMCPU object for the new vcpus being
>    hotplugged. For the arm cpu hotplug case we need to figure out how to
>    get access to the old object and use it to "re-realize" instead of the new
>    ARMCPU object.
> 
> Concerns/Questions:
> 1. In the ARM arch a cpu is uniquely represented in the hierarchy using
>    various affinity levels which could represent thread, core, cluster,
>    package. This is generally represented by a value in the MPIDR register as
>    per the format mentioned in the specification. Now, the MPIDR value for
>    vcpus is derived using the vcpu-index. The concept of thread is not quite
>    the same and rather gets lost in the derivation of MPIDR for vcpus.
> 2. The topology info used to specify the vcpu while hot-plugging might not
>    match the MPIDR value given back by KVM for the vcpu at the time of
>    init. The concept of the SMT bit in MPIDR gets lost as per the derivation
>    being done in KVM. Hence, thread-id, core-id, socket-id, if used as
>    topology info to derive the MPIDR value as per the ARM specification, will
>    not match the MPIDR actually assigned by KVM?
>    Perhaps need to carry forward work of Andrew? please check here[2]
> 3. Further, if this info is supplied to the guest using PPTT(once introduced
>    in QEMU) or even derived using MPIDR, it shall be inconsistent with the
>    host vcpu.
> 4. Any possibilities of interrupts(SGI/PPI/LPI/SPI) always remaining in
>    *pending* state for the cpus which have been hot-unplugged? IMHO it looks
>    okay but will need Marc's confirmation on this. 
> 5. If the ARMCPU object is released after the machine init, UEFI could call
>    back virt_update_table() to re-build the ACPI tables which might need an
>    ARMCPU object. Please check the discussion here[5]
> 
> 
> Commands Used:
> 
> A. Qemu launch commands to init the machine
> 
> $ qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 \
> -cpu host -smp cpus=4,maxcpus=6 \
> -m 300M \
> -kernel Image \
> -initrd rootfs.cpio.gz \
> -append "console=ttyAMA0 root=/dev/ram rdinit=/init maxcpus=2 acpi=force" \
> -nographic \
> -bios QEMU_EFI.fd
> 
> B. Hot-(un)plug related commands
> 
> # Hotplug a host vcpu(accel=kvm)
> $ device_add host-arm-cpu,id=core4,core-id=4
> 
> # Hotplug a vcpu(accel=tcg)
> $ device_add cortex-a57-arm-cpu,id=core4,core-id=4
> 
> # Delete the vcpu
> $ device_del core4
> 
> NOTE: I have not tested the current solution with the '-device' interface.
>       Its use is suggested by Igor here[6]. I will test this soon, but it
>       looks like it should work with the existing changes.
> 
> 
> Sample output on guest after boot:
> 
> $ cat /sys/devices/system/cpu/possible
> 0-5
> $ cat /sys/devices/system/cpu/present
> 0-3
> $ cat /sys/devices/system/cpu/online
> 0-1
> $ cat /sys/devices/system/cpu/offline
> 2-5
> 
> 
> Sample output on guest after hotplug of vcpu=4:
> 
> $ cat /sys/devices/system/cpu/possible
> 0-5
> $ cat /sys/devices/system/cpu/present
> 0-4
> $ cat /sys/devices/system/cpu/online
> 0-1,4
> $ cat /sys/devices/system/cpu/offline
> 2-3,5
> 
> Note: vcpu=4 was explicitly 'onlined' after hot-plug
> $ echo 1 > /sys/devices/system/cpu/cpu4/online
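
The sysfs files above use the kernel's cpu list format ("0-1,4"). A minimal
sketch for checking that the sample outputs are mutually consistent follows;
parse_cpulist is a hypothetical helper, and the input strings are copied
from the post-hotplug sample rather than read from a live guest.

```python
def parse_cpulist(text):
    """Expand a cpu list string such as '0-1,4' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no cpus
        lo, _, hi = part.partition("-")
        # A single number like '4' has no '-', so hi is empty.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


# Strings copied from the sample output after hotplug of vcpu=4.
possible = parse_cpulist("0-5")
present = parse_cpulist("0-4")
online = parse_cpulist("0-1,4")
offline = parse_cpulist("2-3,5")

# Online cpus are a subset of present, which is a subset of possible.
assert online <= present <= possible
# Every possible cpu that is not online is reported as offline.
assert offline == possible - online
```

On a real guest the same helper could be fed the contents of the files under
/sys/devices/system/cpu/ instead of literal strings.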
> 
> 
> Repository:
>  (*) QEMU changes for vcpu hotplug could be cloned from below site,
>      https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v1
> 
>  (*) Guest Kernel changes required to co-work with the QEMU shall be posted
>      soon and repo made available at above site.
> 
> 
> THINGS TO DO:
>  (*) Migration support 
>  (*) TCG/Emulation support is not proper right now. Works to a certain extent
>      but is not complete, especially the unrealize part in which there is an
>      overflow of tcg contexts. This is due to the fact that tcg maintains a
>      count of the number of contexts(per thread instance), so as we hotplug
>      vcpus this counter keeps on incrementing. But during hot-unplug the
>      counter is not decremented.
>  (*) Support of hotplug with NUMA is not proper
>  (*) CPU Topology right now is not specified using thread/core/socket but 
>      rather flatly indexed using core-id. This needs consideration[2].
>  (*) Do we need PPTT support to specify the right topology info to the guest
>      about hot-plugged or unplugged vcpus?
>  (*) Test cases
>  (*) Docs need to be updated.
> 
>

Hi Salil,

I realize this is just a preliminary posting and the approach hasn't been
finalized, but maybe in a future posting we can put a lot of this
information into a doc patch. I think we'll need good documentation for
this feature to ensure we get it right and keep it maintained correctly.

Thanks,
drew 



