qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi


From: Greg Kurz
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm, nmi-register" and "ibm, nmi-interlock" RTAS calls
Date: Thu, 4 Jul 2019 20:37:35 +0200

On Thu, 4 Jul 2019 10:49:05 +0530
Aravinda Prasad <address@hidden> wrote:

> 
> 
> On Thursday 04 July 2019 06:42 AM, David Gibson wrote:
> > On Wed, Jul 03, 2019 at 02:30:31PM +0530, Aravinda Prasad wrote:
> >>
> >>
> >> On Wednesday 03 July 2019 08:50 AM, David Gibson wrote:
> >>> On Tue, Jul 02, 2019 at 04:10:08PM +0530, Aravinda Prasad wrote:
> >>>>
> >>>>
> >>>> On Tuesday 02 July 2019 09:41 AM, David Gibson wrote:
> >>>>> On Wed, Jun 12, 2019 at 02:51:38PM +0530, Aravinda Prasad wrote:
> >>>>>> This patch adds support in QEMU to handle "ibm,nmi-register"
> >>>>>> and "ibm,nmi-interlock" RTAS calls and sets the default
> >>>>>> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
> >>>>>> type 4.0.
> >>>>>>
> >>>>>> The machine check notification address is saved when the
> >>>>>> OS issues "ibm,nmi-register" RTAS call.
> >>>>>>
> >>>>>> This patch also handles the case when multiple processors
> >>>>>> experience machine check at or about the same time by
> >>>>>> handling "ibm,nmi-interlock" call. In such cases, as per
> >>>>>> PAPR, subsequent processors serialize waiting for the first
> >>>>>> processor to issue the "ibm,nmi-interlock" call. The second
> >>>>>> processor that also received a machine check error waits
> >>>>>> till the first processor is done reading the error log.
> >>>>>> The first processor issues "ibm,nmi-interlock" call
> >>>>>> when the error log is consumed.
> >>>>>>
> >>>>>> Signed-off-by: Aravinda Prasad <address@hidden>
> >>>>>> ---
> >>>>>>  hw/ppc/spapr.c         |    6 ++++-
> >>>>>>  hw/ppc/spapr_rtas.c    |   63 
> >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>  include/hw/ppc/spapr.h |    5 +++-
> >>>>>>  3 files changed, 72 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >>>>>> index 3d6d139..213d493 100644
> >>>>>> --- a/hw/ppc/spapr.c
> >>>>>> +++ b/hw/ppc/spapr.c
> >>>>>> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState 
> >>>>>> *machine)
> >>>>>>          /* Create the error string for live migration blocker */
> >>>>>>          error_setg(&spapr->fwnmi_migration_blocker,
> >>>>>>                  "Live migration not supported during machine check 
> >>>>>> handling");
> >>>>>> +
> >>>>>> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls 
> >>>>>> */
> >>>>>> +        spapr_fwnmi_register();
> >>>>>>      }
> >>>>>>  
> >>>>>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> >>>>>> @@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass 
> >>>>>> *oc, void *data)
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = 
> >>>>>> SPAPR_CAP_ON;
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
> >>>>>> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
> >>>>>
> >>>>> Turning this on by default really isn't ok if it stops you running TCG
> >>>>> guests at all.
> >>>>
> >>>> If so this can be "off" by default until TCG is supported.
> >>>>
> >>>>>
> >>>>>>      spapr_caps_add_properties(smc, &error_abort);
> >>>>>>      smc->irq = &spapr_irq_dual;
> >>>>>>      smc->dr_phb_enabled = true;
> >>>>>> @@ -4512,6 +4515,7 @@ static void 
> >>>>>> spapr_machine_3_1_class_options(MachineClass *mc)
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
> >>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = 
> >>>>>> SPAPR_CAP_OFF;
> >>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
> >>>>>
> >>>>> We're now well past 4.0, and in fact we're about to go into soft
> >>>>> freeze for 4.1, so we're going to miss that too.  So 4.1 and earlier
> >>>>> will need to retain the old default.
> >>>>
> >>>> ok.
> >>>>
> >>>>>
> >>>>>>  }
> >>>>>>  
> >>>>>>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
> >>>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >>>>>> index a015a80..e010cb2 100644
> >>>>>> --- a/hw/ppc/spapr_rtas.c
> >>>>>> +++ b/hw/ppc/spapr_rtas.c
> >>>>>> @@ -49,6 +49,7 @@
> >>>>>>  #include "hw/ppc/fdt.h"
> >>>>>>  #include "target/ppc/mmu-hash64.h"
> >>>>>>  #include "target/ppc/mmu-book3s-v3.h"
> >>>>>> +#include "migration/blocker.h"
> >>>>>>  
> >>>>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState 
> >>>>>> *spapr,
> >>>>>>                                     uint32_t token, uint32_t nargs,
> >>>>>> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU *cpu, 
> >>>>>> SpaprMachineState *spapr,
> >>>>>>      rtas_st(rets, 1, 100);
> >>>>>>  }
> >>>>>>  
> >>>>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
> >>>>>> +                                  SpaprMachineState *spapr,
> >>>>>> +                                  uint32_t token, uint32_t nargs,
> >>>>>> +                                  target_ulong args,
> >>>>>> +                                  uint32_t nret, target_ulong rets)
> >>>>>> +{
> >>>>>> +    int ret;
> >>>>>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
> >>>>>> +
> >>>>>> +    if (!rtas_addr) {
> >>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>> +        return;
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
> >>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>> +        return;
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    ret = kvmppc_fwnmi_enable(cpu);
> >>>>>> +    if (ret == 1) {
> >>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> >>>>>
> >>>>> I don't understand this case separate from the others.  We've already
> >>>>> set the cap, so fwnmi support should be checked and available.
> >>>>
> >>>> But we have not enabled fwnmi in KVM. kvmppc_fwnmi_enable() returns 1 if
> >>>> cap_ppc_fwnmi is not available in KVM.
> >>>
> >>> But you've checked for the presence of the extension, yes?  So a
> >>> failure to enable the cap would be unexpected.  In which case how does
> >>> this case differ from.. 
> >>
> >> No, this is the function where I check for the presence of the
> >> extension. In kvm_arch_init() we just set cap_ppc_fwnmi to 1 if KVM
> >> support is available, but don't take any action if unavailable.
> > 
> > Yeah, that's not ok.  You should be checking for the presence of the
> > extension in the .apply() function.  If you start up with the spapr
> > cap selected then failing at nmi-register time means something has
> > gone badly wrong.
> 
> So, I should check for two things in the .apply() function: first if
> cap_ppc_fwnmi is supported and second if cap_ppc_fwnmi is enabled in KVM.
> 
> In that case kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0) should be
> called during spapr_machine_init().
> 
> So, we will fail to boot (when SPAPR_CAP_FWNMI_MCE=ON) if cap_ppc_fwnmi
> can't be enabled irrespective of whether a guest issues nmi,register or not.
> 

Yes. The idea is that we don't want to expose some feature to the guest
if we already know it cannot work. The same stands for migration, if
we know the destination host doesn't support the feature, we don't want
to confuse the guest because the feature disappeared unexpectedly. The
correct way to address that is to check the feature is supported in QEMU
and/or KVM before we start the guest or before we accept a migration
stream and fail early if we can't provide the feature.

Speaking of migration, kvmppc_fwnmi_enable() should be called in a
post load callback... and the problem I see is that this is a vCPU
ioctl. So either we add a state "I should enable FWNMI on post load"
to the vCPU that called nmi,register, or maybe first_cpu can do the
trick in a post load callback of vmstate_spapr_machine_check.

BTW, since FWNMI is a platform wide feature, ie. you don't need to
enable it on a per-CPU basis), why is KVM_CAP_PPC_FWNMI a vCPU ioctl
and not a VM ioctl ?

> > 
> > This is necessary for migration: if you start on a system with nmi
> > support and the guest registers for it, you can't then migrate safely
> > to a system that doesn't have nmi support.  The way to handle that
> > case is to have qemu fail to even start up on a destination without
> > the support.
> > 
> >> So this case is when we are running an old version of KVM with no
> >> cap_ppc_fwnmi support.
> >>
> >>>
> >>>>
> >>>>>
> >>>>>> +        return;
> >>>>>> +    } else if (ret < 0) {
> >>>>>> +        error_report("Couldn't enable KVM FWNMI capability");
> >>>>>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> >>>>>> +        return;
> >>>
> >>> ..this case.
> >>
> >> And this is when we have the KVM support but due to some problem with
> >> either KVM or QEMU we are unable to enable cap_ppc_fwnmi.
> >>
> >>>
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
> >>>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
> >>>>>> +}
> >>>>>> +
> >>>>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
> >>>>>> +                                   SpaprMachineState *spapr,
> >>>>>> +                                   uint32_t token, uint32_t nargs,
> >>>>>> +                                   target_ulong args,
> >>>>>> +                                   uint32_t nret, target_ulong rets)
> >>>>>> +{
> >>>>>> +    if (spapr->guest_machine_check_addr == -1) {
> >>>>>> +        /* NMI register not called */
> >>>>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> >>>>>> +    } else {
> >>>>>> +        /*
> >>>>>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
> >>>>>> +         * hence unset mc_status.
> >>>>>> +         */
> >>>>>> +        spapr->mc_status = -1;
> >>>>>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
> >>>>>> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
> >>>>>
> >>>>> Hrm.  We add the blocker at the mce request point.  First, that's in
> >>>>> another patch, which isn't great.  Second, does that mean we could add
> >>>>> multiple times if we get an MCE on multiple CPUs?  Will that work and
> >>>>> correctly match adds and removes properly?
> >>>>
> >>>> If it is fine to move the migration patch as the last patch in the
> >>>> sequence, then we will have add and del blocker in the same patch.
> >>>>
> >>>> And yes we could add multiple times if we get MCE on multiple CPUs and
> >>>> as all those cpus call interlock there should be matching number of
> >>>> delete blockers.
> >>>
> >>> Ok, and I think adding the same pointer to the list multiple times
> >>> will work ok.
> >>
> >> I think so
> >>
> >>>
> >>> Btw, add_blocker() can fail - have you handled failure conditions?
> >>
> >> yes, I am handling it.
> >>
> >>>
> >>
> > 
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]