[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH RFC 00/22] Support of Virtual CPU Hotplug for ARMv8 Arch
From: |
Marc Zyngier |
Subject: |
Re: [PATCH RFC 00/22] Support of Virtual CPU Hotplug for ARMv8 Arch |
Date: |
Sun, 14 Jun 2020 12:54:54 +0100 |
User-agent: |
Roundcube Webmail/1.4.4 |
Hi Salil,
On 2020-06-13 22:36, Salil Mehta wrote:
This patch-set introduces the virtual cpu hotplug support for ARMv8
architecture in QEMU. Idea is to be able to hotplug and hot-unplug the
vcpus
while guest VM is running and no reboot is required. This does *not*
makes any
assumption of the physical cpu hotplug availability within the host
system but
rather tries to solve the problem at virtualizer/QEMU layer and by
introducing
cpu hotplug hooks and event handling within the guest kernel. No
changes are
required within the host kernel/KVM.
Motivation:
This allows scaling the guest VM compute capacity on-demand which would
be
useful for the following example scenarios,
1. Vertical Pod Autoscaling[3][4] in the cloud: Part of the
orchestration
framework which could adjust resource requests (CPU and Mem
requests) for
the containers in a pod, based on usage.
2. Pay-as-you-grow Business Model: Infrastructure provider could
allocate and
restrict the total number of compute resources available to the
guest VM
according to the SLA(Service Level Agreement). VM owner could
request for
more compute to be hot-plugged for some cost.
Terminology:
(*) Present cpus: Total cpus with which guest has/will boot and are
available
to guest for use and can be onlined. Qemu
parameter(-smp)
(*) Disabled cpus: Possible cpus which will not be available for the
guest to
use. These can be hotplugged and made present. These
can be
thought of as un-plugged vcpus. These will be included as
part of sizing.
(*) Posssible cpus: Total vcpus which could ever exist in VM. This
includes
booted cpus plus any cpus which could be later
plugged.
- Qemu parameter(-maxcpus)
- Possible vcpus = Present vcpus (+) Disabled vcpus
Limitations of ARMv8 Architecture:
A. Physical Limitation to CPU Hotplug:
1. ARMv8 architecture does not support the concept of the physical cpu
hotplug.
The closest thing which is recomended to achieve the cpu hotplug on
ARM is
to bring down power state of the cpu using PSCI.
It isn't so much that the ARMv8 architecture doesn't support CPU
hotplug. It is that CPU hotplug is largely out of the scope of the ARMv8
architecture, which is a CPU architecture and not a system architecture.
Yes, the lack of a comprehensive system architecture is *very* annoying,
but let's put the blame where it belongs... ;-)
2. Other ARM components like GIC etc. have not been designed to realize
physical cpu hotplug capability as of now.
B. Limitations of GIC to Support Virtual CPU Hotplug:
1. GIC requires various resources(related to GICR/redistributor,
GICC/cpu
interface etc) like memory regions to be fixed at the VM init time
and these
could not be changed later on after VM has inited.
2. Associations between GICC(GIC cpu interface) and vcpu get fixed at
the VM
init time and GIC does not allows to change this association once
GIC has
initialized.
There isn't an association, really. the GIC CPU interface is part of the
CPU itself, and not an external entity. KVM doesn't split the two
either. It is the association between the CPU and its redistributor that
is being done. There is no architectural way to set this up this, so KVM
just statically configures these based on the number of vcpus and the
number/size of redistributor ranges.
C. Known Limitation of the KVM:
1. As of now KVM allows to create VCPUs but does not allows to delete
the
already created vcpus. QEMU already provides an interface to manage
created
vcpus at KVM level and then to re-use them.
2. Inconsistency in interpretation of the MPIDR generated by KVM for
vcpus
vis-a-vis SMT/threads. This does not looks to be compliant to the
MPIDR
format(SMT is present) as mentioned in the ARMv8 spec. (Please
correct my
understanding if I am wrong here?)
I'm unsure of which part of the architecture KVM doesn't follow when
assigning a MPIDR to a vcpu. By having MPIDR_EL1.MT to 0, KVM tells the
guest OS that vcpus having the same Aff3-1 fields are mostly
independent.
This is because:
- we make no placement guarantee whatsoever (this is userspace's
business)
- ARMv8.2 CPUs designed by ARM (as opposed to architecture licensees)
always set this bit to 1 as they can hypothetically be coupled to SMT
CPUs in a big-little system. This makes MT totally meaningless.
- the one SMT implementation available in the wild (ThunderX-2) is
broken enough that you really should consider turning SMT off (see
CAVIUM_TX2_ERRATUM_219)
- KVM actively prevents the sharing of resources such as TLBs across
vcpus
Given the above, I fail to see the point in setting the SMT bit to
anything but 0.
Workaround to the problems mentioned in Section B & C1:
1. We pre-size the GIC with possible vcpus at VM init time
2. Pre-create all possible vcpus at KVM and associate them with GICC
Again, I'm not sure what you mean here. The CPU interface is by
construction associated with the CPU. Do you mean the redistributor?
3. Park the unplugged vcpus (similar to x86)
(*) For all of above please refer to Marc's suggestion here[1]
Overview of the Approach:
At the time of machvirt_init() we pre-create all of the possible ARMCPU
objects along with the corresponding KVM vcpus at the host. Disabled
KVM vcpu
(which are *not* "present" vcpus but are part of "possible" vcpu list)
are
parked at per VM list "kvm_parked_vcpus" after their initialization.
We create the ARMCPU objects(but these are not *realized* in QOM sense)
even
for the disabled vcpus to facilitate the GIC initialization (pre-sized
with
possible vcpus). After Initialization of the machine is complete we
release
the ARMCPU Objects for the disabled vcpus. These ARMCPU object shall be
re-created at the time when vcpu is hot plugged. This new object is
then
re-attached with the earlier parked KVM vcpu which also gets unparked.
The
ARMCPU object gets now "realized" in QEMU, which means creation of the
corresponding threads, pre_plug/plug phases, and event notification to
the
guest using ACPI GED etc. Similarly, hot-unplug leg will lead to the
"unrealization" of the vcpus and will lead to similar ACPI GED events
to the
guest for unplug and cleanup and eventually ARMCPU object shall be
released and
KVM vcpus shall be parked again.
During machine init, ACPI MADT Table is sized with *possible* vcpus
GICC
entries. The unplugged/disabled vcpus are presented as MADT GICC
DISABLED
entries to the guest. This means the guest will have its resources
pre-sized
with possible vcpus(=present+disabled)
Other approaches to deal with ARMCPU object release(after machine
init):
1. The ARMCPU objects for the disabled vcpus are released in context to
the
virt_machine_done() notifier(approach adopted in this patch-set).
2. Defer the release of current ARMCPU object till the new vcpu object
is
hot plugged.
3. Never release and keep on reusing them and release once at VM exit.
This
solves many problems with above 2 approaches but requires change in
the way
qdev_device_add() fetches/creates the ARMCPU object for the new
vcpus being
hotplugged. For the arm cpu hotplug case we need to figure out way
how to
get access to old object and use it to "re-realize" instead of the
new
ARMCPU object.
Concerns/Questions:
1. In ARM arch a cpu is uniquely represented in hierarchy using various
affinity levels which could represent thread, core, cluster,
package. This
is generally represented by a value in MPIDR register as per the
format
mentioned in specification. Now, the way MPIDR value is derived for
vcpus is
done using vcpu-index. The concept of thread is not quite as same
and rather
gets lost in the derivation of MPIDR for vcpus.
I think we disagree on what a thread is when MPIDR_EL1.MT==0.
2. The topology info used to specify the vcpu while hot-plugging might
not
match with the MPIDR value given back by the KVM for the vcpu at the
time of
init. Concept of SMT bit in MPIDR gets lost as per the derivation
being done
in the KVM. Hence, concept of thread-id, core-id, socket-id if used
as a
topology info to derive MPIDR value as per ARM specification will
not match
with MPIDR actually assigned by the KVM?
The architecture doesn't map the affinity levels on any of these
concepts. They merely imply increased independence as you look at higher
affinity levels. I don't see any value in exposing any meaning to them
unless you start moving some notion of placement into the kernel, or
allow userspace to setup MPIDR_EL1 by itself.
Perhaps need to carry forward work of Andrew? please check here[2]
3. Further if this info is supplied to the guest using PPTT(once
introduced in
QEMU) or even derived using MPIDR shall be inconsistent with the
host vcpu.
4. Any possibilities of interrupts(SGI/PPI/LPI/SPI) always remaining in
*pending* state for the cpus which have been hot-unplugged? IMHO it
looks
okay but will need Marc's confirmation on this.
I don't see why interrupts wouldn't be pending if the storage resource
still exists. Given that distributor and redistributors are persistent,
it seems logical that pending bits stick around.
Slightly more selfish question: is there any kernel change associated
with this series? I'd be perfectly happy to hear a "no!"... ;-)
Thanks,
M.
--
Jazz is not dead. It just smells funny...
- [PATCH RFC 15/22] arm/cpuhp: Update ACPI GED framework to support vcpu hotplug, (continued)
- [PATCH RFC 15/22] arm/cpuhp: Update ACPI GED framework to support vcpu hotplug, Salil Mehta, 2020/06/13
- [PATCH RFC 16/22] arm/cpuhp: Add/update basic hot-(un)plug framework, Salil Mehta, 2020/06/13
- [PATCH RFC 17/22] arm/cpuhp: Changes to (un)wire GICC<->VCPU IRQs during hot-(un)plug, Salil Mehta, 2020/06/13
- [PATCH RFC 18/22] arm/cpuhp: Changes to update GIC with vcpu hot-plug notification, Salil Mehta, 2020/06/13
- [PATCH RFC 19/22] arm/cpuhp: Changes required to (re)init the vcpu register info, Salil Mehta, 2020/06/13
- [PATCH RFC 20/22] arm/cpuhp: Update the guest(via GED) about cpu hot-(un)plug events, Salil Mehta, 2020/06/13
- [PATCH RFC 21/22] arm/cpuhp: Changes required for reset and to support next boot, Salil Mehta, 2020/06/13
- [PATCH RFC 22/22] arm/cpuhp: Add support of *unrealize* ARMCPU during vcpu hot-unplug, Salil Mehta, 2020/06/13
- Re: [PATCH RFC 00/22] Support of Virtual CPU Hotplug for ARMv8 Arch, no-reply, 2020/06/13
- Re: [PATCH RFC 00/22] Support of Virtual CPU Hotplug for ARMv8 Arch, no-reply, 2020/06/13
- Re: [PATCH RFC 00/22] Support of Virtual CPU Hotplug for ARMv8 Arch,
Marc Zyngier <=
- Re: [PATCH RFC 00/22] Support of Virtual CPU Hotplug for ARMv8 Arch, Andrew Jones, 2020/06/23