[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [PATCH qemu] spapr_pci: Add numa node id
From: |
Michael Roth |
Subject: |
Re: [Qemu-ppc] [PATCH qemu] spapr_pci: Add numa node id |
Date: |
Wed, 14 Sep 2016 07:03:50 -0500 |
User-agent: |
alot/0.3.6 |
Quoting Alexey Kardashevskiy (2016-09-14 04:39:10)
> On 14/09/16 09:29, Michael Roth wrote:
> > Quoting Alexey Kardashevskiy (2016-07-27 03:03:38)
> >> This adds a numa id property to a PHB to allow linking passed PCI device
> >> to CPU/memory. It is up to the management stack to do CPU/memory pinning
> >> to the node with the actual PCI device.
> >
> > It looks like x86 relies on PCIBus->numa_node() method (via
> > pci_bus_numa_node()) to encode similar PCIBus affinities
> > into ACPI tables, and currently exposes it via
> > -device pci-[-express]-expander-bus,numa_node=X.
>
>
>
> Well, until we allow DMA windows per PCI bus (not per PHB as it is now),
> this won't make much sense for us (unless I am missing something here).
I didn't consider that it's not a bus-level setting; I think
you're right that re-using the interface to both store/retrieve doesn't
make much sense in that case.
My thought that was that since pci_bus_numa_node() could in theory come
to be relied upon by common PCI code, that we should use it as well. But
even if it doesn't make sense for us to use it, wouldn't it make sense to
still set PCIBus->numa_node (in addition to the PHB-wide value) in the
off-chance that common code does come to rely on pci_bus_numa_node()?
>
>
> > Maybe we should implement it using this existing
> > PCIBus->numa_node() interface?
> >
> > We'd still have to expose numa_node as a spapr-pci-host-bridge
> > device option though. Not sure if there's a more common way
> > to expose it that might be easier for libvirt to discover. As it
> > stands we'd need to add spapr-pci-host-bridge to a libvirt
> > whitelist that currently only covers pci-expander-bus.
> >
> > Cc'ing Shiva who was looking into the libvirt side.
> >
> > One comment below:
> >
> >>
> >> Signed-off-by: Alexey Kardashevskiy <address@hidden>
> >> ---
> >> hw/ppc/spapr_pci.c | 13 +++++++++++++
> >> include/hw/pci-host/spapr.h | 2 ++
> >> 2 files changed, 15 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> >> index 949c44f..af5394a 100644
> >> --- a/hw/ppc/spapr_pci.c
> >> +++ b/hw/ppc/spapr_pci.c
> >> @@ -47,6 +47,7 @@
> >> #include "sysemu/device_tree.h"
> >> #include "sysemu/kvm.h"
> >> #include "sysemu/hostmem.h"
> >> +#include "sysemu/numa.h"
> >>
> >> #include "hw/vfio/vfio.h"
> >>
> >> @@ -1544,6 +1545,7 @@ static Property spapr_phb_properties[] = {
> >> DEFINE_PROP_BOOL("ddw", sPAPRPHBState, ddw_enabled, true),
> >> DEFINE_PROP_UINT64("pgsz", sPAPRPHBState, page_size_mask,
> >> (1ULL << 12) | (1ULL << 16)),
> >> + DEFINE_PROP_UINT32("node", sPAPRPHBState, numa_node, -1),
> >> DEFINE_PROP_END_OF_LIST(),
> >> };
> >>
> >> @@ -1805,6 +1807,11 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
> >> cpu_to_be32(1),
> >> cpu_to_be32(RTAS_IBM_RESET_PE_DMA_WINDOW)
> >> };
> >> + uint32_t associativity[] = {cpu_to_be32(0x4),
> >> + cpu_to_be32(0x0),
> >> + cpu_to_be32(0x0),
> >> + cpu_to_be32(0x0),
> >> + cpu_to_be32(phb->numa_node)};
> >> sPAPRTCETable *tcet;
> >> PCIBus *bus = PCI_HOST_BRIDGE(phb)->bus;
> >> sPAPRFDT s_fdt;
> >> @@ -1837,6 +1844,12 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
> >> &ddw_extensions, sizeof(ddw_extensions)));
> >> }
> >>
> >> + /* Advertise NUMA via ibm,associativity */
> >> + if (nb_numa_nodes > 1) {
> >> + _FDT(fdt_setprop(fdt, bus_off, "ibm,associativity", associativity,
> >> + sizeof(associativity)));
> >> + }
> >
> > LoPAPR 15.3 seems to suggest that ibm,associativity-reference-points is
> > required alongside ibm,associativity for each DT node it appears in,
> > and since we hardcode "Form 1" affinity it should be done similarly as
> > the entry we place in the top-level DT node.
>
>
> Hm, okay, I'll add it. There is a question to Bharata - why do we have 4s
> in spapr_create_fdt_skel()'s refpoints? Just a random pick?
>
> vec5[5]==0x80 means we are doing "Form1" and these 4s are far/near distances?
>
>
> >> +
> >> /* Build the interrupt-map, this must matches what is done
> >> * in pci_spapr_map_irq
> >> */
> >> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> >> index 5adc603..53c4b2d 100644
> >> --- a/include/hw/pci-host/spapr.h
> >> +++ b/include/hw/pci-host/spapr.h
> >> @@ -75,6 +75,8 @@ struct sPAPRPHBState {
> >> bool ddw_enabled;
> >> uint64_t page_size_mask;
> >> uint64_t dma64_win_addr;
> >> +
> >> + uint32_t numa_node;
> >> };
> >>
> >> #define SPAPR_PCI_MAX_INDEX 255
> >> --
> >> 2.5.0.rc3
> >>
> >
>
>
> --
> Alexey
>