[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" an
From: |
Greg Kurz |
Subject: |
Re: [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls |
Date: |
Fri, 5 Jul 2019 14:35:57 +0200 |
On Fri, 5 Jul 2019 16:41:25 +0530
Aravinda Prasad <address@hidden> wrote:
>
>
> On Friday 05 July 2019 12:07 AM, Greg Kurz wrote:
> > On Thu, 4 Jul 2019 10:49:05 +0530
> > Aravinda Prasad <address@hidden> wrote:
> >
> >>
> >>
> >> On Thursday 04 July 2019 06:42 AM, David Gibson wrote:
> >>> On Wed, Jul 03, 2019 at 02:30:31PM +0530, Aravinda Prasad wrote:
> >>>>
> >>>>
> >>>> On Wednesday 03 July 2019 08:50 AM, David Gibson wrote:
> >>>>> On Tue, Jul 02, 2019 at 04:10:08PM +0530, Aravinda Prasad wrote:
> >>>>>>
> >>>>>>
> >>>>>> On Tuesday 02 July 2019 09:41 AM, David Gibson wrote:
> >>>>>>> On Wed, Jun 12, 2019 at 02:51:38PM +0530, Aravinda Prasad wrote:
> >>>>>>>> This patch adds support in QEMU to handle "ibm,nmi-register"
> >>>>>>>> and "ibm,nmi-interlock" RTAS calls and sets the default
> >>>>>>>> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
> >>>>>>>> type 4.0.
> >>>>>>>>
> >>>>>>>> The machine check notification address is saved when the
> >>>>>>>> OS issues "ibm,nmi-register" RTAS call.
> >>>>>>>>
> >>>>>>>> This patch also handles the case when multiple processors
> >>>>>>>> experience machine check at or about the same time by
> >>>>>>>> handling "ibm,nmi-interlock" call. In such cases, as per
> >>>>>>>> PAPR, subsequent processors serialize waiting for the first
> >>>>>>>> processor to issue the "ibm,nmi-interlock" call. The second
> >>>>>>>> processor that also received a machine check error waits
> >>>>>>>> till the first processor is done reading the error log.
> >>>>>>>> The first processor issues "ibm,nmi-interlock" call
> >>>>>>>> when the error log is consumed.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Aravinda Prasad <address@hidden>
> >>>>>>>> ---
> >>>>>>>> hw/ppc/spapr.c | 6 ++++-
> >>>>>>>> hw/ppc/spapr_rtas.c | 63
> >>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>> include/hw/ppc/spapr.h | 5 +++-
> >>>>>>>> 3 files changed, 72 insertions(+), 2 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>>>>>>> index 3d6d139..213d493 100644
> >>>>>>>> --- a/hw/ppc/spapr.c
> >>>>>>>> +++ b/hw/ppc/spapr.c
> >>>>>>>> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState
> >>>>>>>> *machine)
> >>>>>>>> /* Create the error string for live migration blocker */
> >>>>>>>> error_setg(&spapr->fwnmi_migration_blocker,
> >>>>>>>> "Live migration not supported during machine check
> >>>>>>>> handling");
> >>>>>>>> +
> >>>>>>>> + /* Register ibm,nmi-register and ibm,nmi-interlock RTAS
> >>>>>>>> calls */
> >>>>>>>> + spapr_fwnmi_register();
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> spapr->rtas_blob = g_malloc(spapr->rtas_size);
> >>>>>>>> @@ -4408,7 +4411,7 @@ static void
> >>>>>>>> spapr_machine_class_init(ObjectClass *oc, void *data)
> >>>>>>>> smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>>>>>>> smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] =
> >>>>>>>> SPAPR_CAP_ON;
> >>>>>>>> smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >>>>>>>> - smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>>>>>> + smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
> >>>>>>>
> >>>>>>> Turning this on by default really isn't ok if it stops you running TCG
> >>>>>>> guests at all.
> >>>>>>
> >>>>>> If so this can be "off" by default until TCG is supported.
> >>>>>>
> >>>>>>>
> >>>>>>>> spapr_caps_add_properties(smc, &error_abort);
> >>>>>>>> smc->irq = &spapr_irq_dual;
> >>>>>>>> smc->dr_phb_enabled = true;
> >>>>>>>> @@ -4512,6 +4515,7 @@ static void
> >>>>>>>> spapr_machine_3_1_class_options(MachineClass *mc)
> >>>>>>>> smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
> >>>>>>>> smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
> >>>>>>>> smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] =
> >>>>>>>> SPAPR_CAP_OFF;
> >>>>>>>> + smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>>>>>
> >>>>>>> We're now well past 4.0, and in fact we're about to go into soft
> >>>>>>> freeze for 4.1, so we're going to miss that too. So 4.1 and earlier
> >>>>>>> will need to retain the old default.
> >>>>>>
> >>>>>> ok.
> >>>>>>
> >>>>>>>
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
> >>>>>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >>>>>>>> index a015a80..e010cb2 100644
> >>>>>>>> --- a/hw/ppc/spapr_rtas.c
> >>>>>>>> +++ b/hw/ppc/spapr_rtas.c
> >>>>>>>> @@ -49,6 +49,7 @@
> >>>>>>>> #include "hw/ppc/fdt.h"
> >>>>>>>> #include "target/ppc/mmu-hash64.h"
> >>>>>>>> #include "target/ppc/mmu-book3s-v3.h"
> >>>>>>>> +#include "migration/blocker.h"
> >>>>>>>>
> >>>>>>>> static void rtas_display_character(PowerPCCPU *cpu,
> >>>>>>>> SpaprMachineState *spapr,
> >>>>>>>> uint32_t token, uint32_t nargs,
> >>>>>>>> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU
> >>>>>>>> *cpu, SpaprMachineState *spapr,
> >>>>>>>> rtas_st(rets, 1, 100);
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >>>>>>>> + SpaprMachineState *spapr,
> >>>>>>>> + uint32_t token, uint32_t nargs,
> >>>>>>>> + target_ulong args,
> >>>>>>>> + uint32_t nret, target_ulong rets)
> >>>>>>>> +{
> >>>>>>>> + int ret;
> >>>>>>>> + hwaddr rtas_addr = spapr_get_rtas_addr();
> >>>>>>>> +
> >>>>>>>> + if (!rtas_addr) {
> >>>>>>>> + rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>>>> + return;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF)
> >>>>>>>> {
> >>>>>>>> + rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>>>> + return;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + ret = kvmppc_fwnmi_enable(cpu);
> >>>>>>>> + if (ret == 1) {
> >>>>>>>> + rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>>>
> >>>>>>> I don't understand this case separate from the others. We've already
> >>>>>>> set the cap, so fwnmi support should be checked and available.
> >>>>>>
> >>>>>> But we have not enabled fwnmi in KVM. kvmppc_fwnmi_enable() returns 1
> >>>>>> if
> >>>>>> cap_ppc_fwnmi is not available in KVM.
> >>>>>
> >>>>> But you've checked for the presence of the extension, yes? So a
> >>>>> failure to enable the cap would be unexpected. In which case how does
> >>>>> this case differ from..
> >>>>
> >>>> No, this is the function where I check for the presence of the
> >>>> extension. In kvm_arch_init() we just set cap_ppc_fwnmi to 1 if KVM
> >>>> support is available, but don't take any action if unavailable.
> >>>
> >>> Yeah, that's not ok. You should be checking for the presence of the
> >>> extension in the .apply() function. If you start up with the spapr
> >>> cap selected then failing at nmi-register time means something has
> >>> gone badly wrong.
> >>
> >> So, I should check for two things in the .apply() function: first if
> >> cap_ppc_fwnmi is supported and second if cap_ppc_fwnmi is enabled in KVM.
> >>
> >> In that case kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0) should be
> >> called during spapr_machine_init().
> >>
> >> So, we will fail to boot (when SPAPR_CAP_FWNMI_MCE=ON) if cap_ppc_fwnmi
> >> can't be enabled irrespective of whether a guest issues nmi,register or
> >> not.
> >>
> >
> > Yes. The idea is that we don't want to expose some feature to the guest
> > if we already know it cannot work. The same stands for migration, if
> > we know the destination host doesn't support the feature, we don't want
> > to confuse the guest because the feature disappeared unexpectedly. The
> > correct way to address that is to check the feature is supported in QEMU
> > and/or KVM before we start the guest or before we accept a migration
> > stream and fail early if we can't provide the feature.
>
> ok.
>
> >
> > Speaking of migration, kvmppc_fwnmi_enable() should be called in a
> > post load callback... and the problem I see is that this is a vCPU
> > ioctl. So either we add a state "I should enable FWNMI on post load"
> > to the vCPU that called nmi,register, or maybe first_cpu can do the
> > trick in a post load callback of vmstate_spapr_machine_check.
>
> To summarize the changes I am planning:
>
> If SPAPR_CAP_FWNMI_MCE=ON, then enable cap_ppc_fwnmi in KVM. If
> cap_ppc_fwnmi can't be enabled then fail early. This can be included in
> spapr_machine_init() function.
>
> During migration, if SPAPR_CAP_FWNMI_MCE=ON then enable cap_ppc_fwnmi by
> having a .post_load callback which calls kvmppc_fwnmi_enable(). Fail the
> migration if we can't enable cap_ppc_fwnmi on the destination host.
>
LGTM
> >
> > BTW, since FWNMI is a platform wide feature, ie. you don't need to
> > enable it on a per-CPU basis), why is KVM_CAP_PPC_FWNMI a vCPU ioctl
> > and not a VM ioctl ?
>
> I think it should be VM ioctl, let me check.
>
It is indeed described as a VM ioctl in Documentation/virtual/kvm/api.txt
but...
static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
struct kvm_enable_cap *cap)
{
[...]
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
case KVM_CAP_PPC_FWNMI:
r = -EINVAL;
if (!is_kvmppc_hv_enabled(vcpu->kvm))
break;
r = 0;
vcpu->kvm->arch.fwnmi_enabled = true;
break;
#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
[...]
}
I guess it should have been rather added to kvm_vm_ioctl_enable_cap(),
but since this was released in 4.13, I'm afraid we'll need to support
this variant in QEMU anyway.
> Regards,
> Aravinda
>
> >
> >>>
> >>> This is necessary for migration: if you start on a system with nmi
> >>> support and the guest registers for it, you can't then migrate safely
> >>> to a system that doesn't have nmi support. The way to handle that
> >>> case is to have qemu fail to even start up on a destination without
> >>> the support.
> >>>
> >>>> So this case is when we are running an old version of KVM with no
> >>>> cap_ppc_fwnmi support.
> >>>>
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>>> + return;
> >>>>>>>> + } else if (ret < 0) {
> >>>>>>>> + error_report("Couldn't enable KVM FWNMI capability");
> >>>>>>>> + rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> >>>>>>>> + return;
> >>>>>
> >>>>> ..this case.
> >>>>
> >>>> And this is when we have the KVM support but due to some problem with
> >>>> either KVM or QEMU we are unable to enable cap_ppc_fwnmi.
> >>>>
> >>>>>
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + spapr->guest_machine_check_addr = rtas_ld(args, 1);
> >>>>>>>> + rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> >>>>>>>> + SpaprMachineState *spapr,
> >>>>>>>> + uint32_t token, uint32_t nargs,
> >>>>>>>> + target_ulong args,
> >>>>>>>> + uint32_t nret, target_ulong rets)
> >>>>>>>> +{
> >>>>>>>> + if (spapr->guest_machine_check_addr == -1) {
> >>>>>>>> + /* NMI register not called */
> >>>>>>>> + rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> >>>>>>>> + } else {
> >>>>>>>> + /*
> >>>>>>>> + * vCPU issuing "ibm,nmi-interlock" is done with NMI
> >>>>>>>> handling,
> >>>>>>>> + * hence unset mc_status.
> >>>>>>>> + */
> >>>>>>>> + spapr->mc_status = -1;
> >>>>>>>> + qemu_cond_signal(&spapr->mc_delivery_cond);
> >>>>>>>> + migrate_del_blocker(spapr->fwnmi_migration_blocker);
> >>>>>>>
> >>>>>>> Hrm. We add the blocker at the mce request point. First, that's in
> >>>>>>> another patch, which isn't great. Second, does that mean we could add
> >>>>>>> multiple times if we get an MCE on multiple CPUs? Will that work and
> >>>>>>> correctly match adds and removes properly?
> >>>>>>
> >>>>>> If it is fine to move the migration patch as the last patch in the
> >>>>>> sequence, then we will have add and del blocker in the same patch.
> >>>>>>
> >>>>>> And yes we could add multiple times if we get MCE on multiple CPUs and
> >>>>>> as all those cpus call interlock there should be matching number of
> >>>>>> delete blockers.
> >>>>>
> >>>>> Ok, and I think adding the same pointer to the list multiple times
> >>>>> will work ok.
> >>>>
> >>>> I think so
> >>>>
> >>>>>
> >>>>> Btw, add_blocker() can fail - have you handled failure conditions?
> >>>>
> >>>> yes, I am handling it.
> >>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
- Re: [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls, David Gibson, 2019/07/02
- Re: [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls, Aravinda Prasad, 2019/07/02
- Re: [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls, David Gibson, 2019/07/03
- Re: [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls, Aravinda Prasad, 2019/07/03
- Re: [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls, David Gibson, 2019/07/03
- Re: [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls, Aravinda Prasad, 2019/07/04
- Re: [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls, Greg Kurz, 2019/07/04
- Re: [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls, Aravinda Prasad, 2019/07/05
- Re: [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls,
Greg Kurz <=
- Re: [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls, David Gibson, 2019/07/09