qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi


From: Greg Kurz
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
Date: Fri, 5 Jul 2019 14:35:57 +0200

On Fri, 5 Jul 2019 16:41:25 +0530
Aravinda Prasad <address@hidden> wrote:

> 
> 
> On Friday 05 July 2019 12:07 AM, Greg Kurz wrote:
> > On Thu, 4 Jul 2019 10:49:05 +0530
> > Aravinda Prasad <address@hidden> wrote:
> > 
> >>
> >>
> >> On Thursday 04 July 2019 06:42 AM, David Gibson wrote:
> >>> On Wed, Jul 03, 2019 at 02:30:31PM +0530, Aravinda Prasad wrote:
> >>>>
> >>>>
> >>>> On Wednesday 03 July 2019 08:50 AM, David Gibson wrote:
> >>>>> On Tue, Jul 02, 2019 at 04:10:08PM +0530, Aravinda Prasad wrote:
> >>>>>>
> >>>>>>
> >>>>>> On Tuesday 02 July 2019 09:41 AM, David Gibson wrote:
> >>>>>>> On Wed, Jun 12, 2019 at 02:51:38PM +0530, Aravinda Prasad wrote:
> >>>>>>>> This patch adds support in QEMU to handle "ibm,nmi-register"
> >>>>>>>> and "ibm,nmi-interlock" RTAS calls and sets the default
> >>>>>>>> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
> >>>>>>>> type 4.0.
> >>>>>>>>
> >>>>>>>> The machine check notification address is saved when the
> >>>>>>>> OS issues "ibm,nmi-register" RTAS call.
> >>>>>>>>
> >>>>>>>> This patch also handles the case when multiple processors
> >>>>>>>> experience machine check at or about the same time by
> >>>>>>>> handling "ibm,nmi-interlock" call. In such cases, as per
> >>>>>>>> PAPR, subsequent processors serialize waiting for the first
> >>>>>>>> processor to issue the "ibm,nmi-interlock" call. The second
> >>>>>>>> processor that also received a machine check error waits
> >>>>>>>> till the first processor is done reading the error log.
> >>>>>>>> The first processor issues "ibm,nmi-interlock" call
> >>>>>>>> when the error log is consumed.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Aravinda Prasad <address@hidden>
> >>>>>>>> ---
> >>>>>>>>  hw/ppc/spapr.c         |    6 ++++-
> >>>>>>>>  hw/ppc/spapr_rtas.c    |   63 
> >>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>  include/hw/ppc/spapr.h |    5 +++-
> >>>>>>>>  3 files changed, 72 insertions(+), 2 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>>>>>>> index 3d6d139..213d493 100644
> >>>>>>>> --- a/hw/ppc/spapr.c
> >>>>>>>> +++ b/hw/ppc/spapr.c
> >>>>>>>> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState 
> >>>>>>>> *machine)
> >>>>>>>>          /* Create the error string for live migration blocker */
> >>>>>>>>          error_setg(&spapr->fwnmi_migration_blocker,
> >>>>>>>>                  "Live migration not supported during machine check 
> >>>>>>>> handling");
> >>>>>>>> +
> >>>>>>>> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS 
> >>>>>>>> calls */
> >>>>>>>> +        spapr_fwnmi_register();
> >>>>>>>>      }
> >>>>>>>>  
> >>>>>>>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> >>>>>>>> @@ -4408,7 +4411,7 @@ static void 
> >>>>>>>> spapr_machine_class_init(ObjectClass *oc, void *data)
> >>>>>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>>>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = 
> >>>>>>>> SPAPR_CAP_ON;
> >>>>>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >>>>>>>> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
> >>>>>>>
> >>>>>>> Turning this on by default really isn't ok if it stops you running TCG
> >>>>>>> guests at all.
> >>>>>>
> >>>>>> If so this can be "off" by default until TCG is supported.
> >>>>>>
> >>>>>>>
> >>>>>>>>      spapr_caps_add_properties(smc, &error_abort);
> >>>>>>>>      smc->irq = &spapr_irq_dual;
> >>>>>>>>      smc->dr_phb_enabled = true;
> >>>>>>>> @@ -4512,6 +4515,7 @@ static void 
> >>>>>>>> spapr_machine_3_1_class_options(MachineClass *mc)
> >>>>>>>>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
> >>>>>>>>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
> >>>>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = 
> >>>>>>>> SPAPR_CAP_OFF;
> >>>>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>>>>>
> >>>>>>> We're now well past 4.0, and in fact we're about to go into soft
> >>>>>>> freeze for 4.1, so we're going to miss that too.  So 4.1 and earlier
> >>>>>>> will need to retain the old default.
> >>>>>>
> >>>>>> ok.
> >>>>>>
> >>>>>>>
> >>>>>>>>  }
> >>>>>>>>  
> >>>>>>>>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
> >>>>>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >>>>>>>> index a015a80..e010cb2 100644
> >>>>>>>> --- a/hw/ppc/spapr_rtas.c
> >>>>>>>> +++ b/hw/ppc/spapr_rtas.c
> >>>>>>>> @@ -49,6 +49,7 @@
> >>>>>>>>  #include "hw/ppc/fdt.h"
> >>>>>>>>  #include "target/ppc/mmu-hash64.h"
> >>>>>>>>  #include "target/ppc/mmu-book3s-v3.h"
> >>>>>>>> +#include "migration/blocker.h"
> >>>>>>>>  
> >>>>>>>>  static void rtas_display_character(PowerPCCPU *cpu, 
> >>>>>>>> SpaprMachineState *spapr,
> >>>>>>>>                                     uint32_t token, uint32_t nargs,
> >>>>>>>> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU 
> >>>>>>>> *cpu, SpaprMachineState *spapr,
> >>>>>>>>      rtas_st(rets, 1, 100);
> >>>>>>>>  }
> >>>>>>>>  
> >>>>>>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >>>>>>>> +                                  SpaprMachineState *spapr,
> >>>>>>>> +                                  uint32_t token, uint32_t nargs,
> >>>>>>>> +                                  target_ulong args,
> >>>>>>>> +                                  uint32_t nret, target_ulong rets)
> >>>>>>>> +{
> >>>>>>>> +    int ret;
> >>>>>>>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
> >>>>>>>> +
> >>>>>>>> +    if (!rtas_addr) {
> >>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>>>> +        return;
> >>>>>>>> +    }
> >>>>>>>> +
> >>>>>>>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) 
> >>>>>>>> {
> >>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>>>> +        return;
> >>>>>>>> +    }
> >>>>>>>> +
> >>>>>>>> +    ret = kvmppc_fwnmi_enable(cpu);
> >>>>>>>> +    if (ret == 1) {
> >>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>>>
> >>>>>>> I don't understand this case separate from the others.  We've already
> >>>>>>> set the cap, so fwnmi support should be checked and available.
> >>>>>>
> >>>>>> But we have not enabled fwnmi in KVM. kvmppc_fwnmi_enable() returns 1 
> >>>>>> if
> >>>>>> cap_ppc_fwnmi is not available in KVM.
> >>>>>
> >>>>> But you've checked for the presence of the extension, yes?  So a
> >>>>> failure to enable the cap would be unexpected.  In which case how does
> >>>>> this case differ from.. 
> >>>>
> >>>> No, this is the function where I check for the presence of the
> >>>> extension. In kvm_arch_init() we just set cap_ppc_fwnmi to 1 if KVM
> >>>> support is available, but don't take any action if unavailable.
> >>>
> >>> Yeah, that's not ok.  You should be checking for the presence of the
> >>> extension in the .apply() function.  If you start up with the spapr
> >>> cap selected then failing at nmi-register time means something has
> >>> gone badly wrong.
> >>
> >> So, I should check for two things in the .apply() function: first if
> >> cap_ppc_fwnmi is supported and second if cap_ppc_fwnmi is enabled in KVM.
> >>
> >> In that case kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0) should be
> >> called during spapr_machine_init().
> >>
> >> So, we will fail to boot (when SPAPR_CAP_FWNMI_MCE=ON) if cap_ppc_fwnmi
> >> can't be enabled irrespective of whether a guest issues nmi,register or 
> >> not.
> >>
> > 
> > Yes. The idea is that we don't want to expose some feature to the guest
> > if we already know it cannot work. The same stands for migration, if
> > we know the destination host doesn't support the feature, we don't want
> > to confuse the guest because the feature disappeared unexpectedly. The
> > correct way to address that is to check the feature is supported in QEMU
> > and/or KVM before we start the guest or before we accept a migration
> > stream and fail early if we can't provide the feature.
> 
> ok.
> 
> > 
> > Speaking of migration, kvmppc_fwnmi_enable() should be called in a
> > post load callback... and the problem I see is that this is a vCPU
> > ioctl. So either we add a state "I should enable FWNMI on post load"
> > to the vCPU that called nmi,register, or maybe first_cpu can do the
> > trick in a post load callback of vmstate_spapr_machine_check.
> 
> To summarize the changes I am planning:
> 
> If SPAPR_CAP_FWNMI_MCE=ON, then enable cap_ppc_fwnmi in KVM. If
> cap_ppc_fwnmi can't be enabled then fail early. This can be included in
> spapr_machine_init() function.
> 
> During migration, if SPAPR_CAP_FWNMI_MCE=ON then enable cap_ppc_fwnmi by
> having a .post_load callback which calls kvmppc_fwnmi_enable(). Fail the
> migration if we can't enable cap_ppc_fwnmi on the destination host.
> 

LGTM

> > 
> > BTW, since FWNMI is a platform wide feature, ie. you don't need to
> > enable it on a per-CPU basis), why is KVM_CAP_PPC_FWNMI a vCPU ioctl
> > and not a VM ioctl ?
> 
> I think it should be VM ioctl, let me check.
> 

It is indeed described as a VM ioctl in Documentation/virtual/kvm/api.txt
but...

static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
                                     struct kvm_enable_cap *cap)
{
[...]
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
        case KVM_CAP_PPC_FWNMI:
                r = -EINVAL;
                if (!is_kvmppc_hv_enabled(vcpu->kvm))
                        break;
                r = 0;
                vcpu->kvm->arch.fwnmi_enabled = true;
                break;
#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
[...]
}

I guess it should have been rather added to kvm_vm_ioctl_enable_cap(),
but since this was released in 4.13, I'm afraid we'll need to support
this variant in QEMU anyway.

> Regards,
> Aravinda
> 
> > 
> >>>
> >>> This is necessary for migration: if you start on a system with nmi
> >>> support and the guest registers for it, you can't then migrate safely
> >>> to a system that doesn't have nmi support.  The way to handle that
> >>> case is to have qemu fail to even start up on a destination without
> >>> the support.
> >>>
> >>>> So this case is when we are running an old version of KVM with no
> >>>> cap_ppc_fwnmi support.
> >>>>
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>>> +        return;
> >>>>>>>> +    } else if (ret < 0) {
> >>>>>>>> +        error_report("Couldn't enable KVM FWNMI capability");
> >>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> >>>>>>>> +        return;
> >>>>>
> >>>>> ..this case.
> >>>>
> >>>> And this is when we have the KVM support but due to some problem with
> >>>> either KVM or QEMU we are unable to enable cap_ppc_fwnmi.
> >>>>
> >>>>>
> >>>>>>>> +    }
> >>>>>>>> +
> >>>>>>>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> >>>>>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> >>>>>>>> +                                   SpaprMachineState *spapr,
> >>>>>>>> +                                   uint32_t token, uint32_t nargs,
> >>>>>>>> +                                   target_ulong args,
> >>>>>>>> +                                   uint32_t nret, target_ulong rets)
> >>>>>>>> +{
> >>>>>>>> +    if (spapr->guest_machine_check_addr == -1) {
> >>>>>>>> +        /* NMI register not called */
> >>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> >>>>>>>> +    } else {
> >>>>>>>> +        /*
> >>>>>>>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI 
> >>>>>>>> handling,
> >>>>>>>> +         * hence unset mc_status.
> >>>>>>>> +         */
> >>>>>>>> +        spapr->mc_status = -1;
> >>>>>>>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
> >>>>>>>> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
> >>>>>>>
> >>>>>>> Hrm.  We add the blocker at the mce request point.  First, that's in
> >>>>>>> another patch, which isn't great.  Second, does that mean we could add
> >>>>>>> multiple times if we get an MCE on multiple CPUs?  Will that work and
> >>>>>>> correctly match adds and removes properly?
> >>>>>>
> >>>>>> If it is fine to move the migration patch as the last patch in the
> >>>>>> sequence, then we will have add and del blocker in the same patch.
> >>>>>>
> >>>>>> And yes we could add multiple times if we get MCE on multiple CPUs and
> >>>>>> as all those cpus call interlock there should be matching number of
> >>>>>> delete blockers.
> >>>>>
> >>>>> Ok, and I think adding the same pointer to the list multiple times
> >>>>> will work ok.
> >>>>
> >>>> I think so
> >>>>
> >>>>>
> >>>>> Btw, add_blocker() can fail - have you handled failure conditions?
> >>>>
> >>>> yes, I am handling it.
> >>>>
> >>>>>
> >>>>
> >>>
> >>
> > 
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]