[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [PATCH 05/19] ppc/pnv: add XIVE support
From: |
Cédric Le Goater |
Subject: |
Re: [Qemu-ppc] [PATCH 05/19] ppc/pnv: add XIVE support |
Date: |
Thu, 21 Feb 2019 09:32:44 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 |
On 2/21/19 4:13 AM, David Gibson wrote:
> On Tue, Feb 19, 2019 at 08:31:25AM +0100, Cédric Le Goater wrote:
>> On 2/12/19 6:40 AM, David Gibson wrote:
>>> On Mon, Jan 28, 2019 at 10:46:11AM +0100, Cédric Le Goater wrote:
> [snip]
>>>> #endif /* _PPC_PNV_H */
>>>> diff --git a/include/hw/ppc/pnv_core.h b/include/hw/ppc/pnv_core.h
>>>> index 9961ea3a92cd..8e57c064e661 100644
>>>> --- a/include/hw/ppc/pnv_core.h
>>>> +++ b/include/hw/ppc/pnv_core.h
>>>> @@ -49,6 +49,7 @@ typedef struct PnvCoreClass {
>>>>
>>>> typedef struct PnvCPUState {
>>>> struct ICPState *icp;
>>>> + struct XiveTCTX *tctx;
>>>
>>> Unlike sPAPR, we really do always know in advance the interrupt
>>> controller for a particular machine. I think it makes sense to
>>> further split the POWER8 and POWER9 cases here, so we only track one
>>> for any given setup.
>>
>> So, you would define a :
>>
>> typedef struct Pnv9CPUState {
>> struct XiveTCTX *tctx;
>> } Pnv9CPUState;
>>
>> to be allocated when the core is realized ? and later the routine
>> pnv_chip_power9_intc_create() would assign the ->tctx pointer.
>
> Sounds about right.
>
> [snip]
>>>> + uint32_t nr_ends;
>>>> + XiveENDSource end_source;
>>>> +
>>>> + /* Interrupt controller registers */
>>>> + uint64_t regs[0x300];
>>>> +
>>>> + /* Can be configured by FW */
>>>> + uint32_t tctx_chipid;
>>>> + uint32_t chip_id;
>>>
>>> Can't you derive that since you have a pointer to the owning chip?
>>
>> Not always, there are register fields to purposely override this value.
>> I can improve the current code a little I think.
>
> Ok.
>
> [snip]
>>>> +/*
>>>> + * Virtual structures table (VST)
>>>> + */
>>>> +typedef struct XiveVstInfo {
>>>> + uint32_t type;
>>>> + const char *name;
>>>> + uint32_t size;
>>>> + uint32_t max_blocks;
>>>> +} XiveVstInfo;
>>>> +
>>>> +static const XiveVstInfo vst_infos[] = {
>>>> + [VST_TSEL_IVT] = { VST_TSEL_IVT, "EAT", sizeof(XiveEAS), 16 },
>>>
>>> I don't love explicitly storing the type/index in each record, as well
>>> as it being implicit in the table slot.
>>
>> The 'vst_infos' table decribes the different table types and the 'type'
>> field is used to index the runtime table of VSDs. See
>> pnv_xive_vst_addr()
>
> Yes, I know what it's for but it's still redundant information. You
> could avoid it, for example, by passing around an index instead of a
> pointer to a vst_infos[] slot - then you can look up both vst_infos
> and the other table using that index.
:) This is exactly what the code was doing before and I thought passing
the pointer was cleaner ! No problem. This is minor. I will revert.
> [snip]
>>>> + case CQ_TM1_BAR: /* TM BAR. 4 pages. Map only once */
>>>> + case CQ_TM2_BAR: /* second TM BAR. for hotplug. Not modeled */
>>>> + xive->tm_shift = val & CQ_TM_BAR_64K ? 16 : 12;
>>>> + if (!(val & CQ_TM_BAR_VALID)) {
>>>> + xive->tm_base = 0;
>>>> + if (xive->regs[reg] & CQ_TM_BAR_VALID && xive->chip_id == 0) {
>>>> + memory_region_del_subregion(sysmem, &xive->tm_mmio);
>>>> + }
>>>> + } else {
>>>> + xive->tm_base = val & ~(CQ_TM_BAR_VALID | CQ_TM_BAR_64K);
>>>> + if (!(xive->regs[reg] & CQ_TM_BAR_VALID) && xive->chip_id ==
>>>> 0) {
>>>> + memory_region_add_subregion(sysmem, xive->tm_base,
>>>> + &xive->tm_mmio);
>>>> + }
>>>> + }
>>>> + break;
>>>> +
>>>> + case CQ_PC_BARM:
>>>> + xive->regs[reg] = val;
>>>
>>> As discussed elsewhere, this seems to be a big mix of writing things
>>> directly into regs[reg] and doing other things instead, and you really
>>> want to go one way or the other. I'd suggest dropping xive->regs[]
>>> and instead putting the state you need persistent into its own
>>> variables.
>>
>> I made a big effort to introduce helper routines to avoid storing values
>> that can be calculated under the PnvXive model, as you asked for it.
>> The assignment above is only necessary for the pnv_xive_pc_size() below
>> and I don't know how handle this current case without duplicating the
>> switch statement, which I think is ugly.
>
> I'm not sure quite what you mean about duplicating the case.
>> The point here is that since you're only storing in a couple of the
> switch cases, you can just have explicit data backing just those
> values and write to those in the switch cases instead of having a
> great big regs[] array of which only a few bits are used.
The model uses these registers :
xive->regs[PC_GLOBAL_CONFIG >> 3]
xive->regs[CQ_VC_BARM >> 3]
xive->regs[CQ_PC_BARM >> 3]
xive->regs[CQ_TAR >> 3]
xive->regs[VC_GLOBAL_CONFIG >> 3]
xive->regs[VC_VSD_TABLE_ADDR >> 3]);
xive->regs[CQ_IC_BAR]
xive->regs[CQ_TM1_BAR]
xive->regs[CQ_VC_BAR]
xive->regs[PC_THREAD_EN_REG0 >> 3]
xive->regs[PC_THREAD_EN_REG1 >> 3]
xive->regs[(VC_EQC_CWATCH_DAT0 >> 3) + i];
xive->regs[(PC_VPC_CWATCH_DAT0 >> 3) + i]
xive->regs[VC_EQC_CONFIG];
xive->regs[VC_EQC_SCRUB_TRIG]
xive->regs[PC_AT_KILL];
xive->regs[VC_AT_MACRO_KILL]
xive->regs[(PC_TCTXT_INDIR0 >> 3) + i]
The regs array is useful for the different cache watch but apart
from that, yes, we could use independent fields. I will see how much
energy I have to put into this change.
All the registers change in P10, may be this will be a better time
to share a common PnvXive model and isolate the HW interface of each
processor.
>> So, I will keep the xive->regs[] and make the couple of fixes still needed.
>
> [snip]
>>>> +/*
>>>> + * Virtualization Controller MMIO region containing the IPI and END ESB
>>>> pages
>>>> + */
>>>> +static uint64_t pnv_xive_vc_read(void *opaque, hwaddr offset,
>>>> + unsigned size)
>>>> +{
>>>> + PnvXive *xive = PNV_XIVE(opaque);
>>>> + uint64_t edt_index = offset >> pnv_xive_edt_shift(xive);
>>>> + uint64_t edt_type = 0;
>>>> + uint64_t edt_offset;
>>>> + MemTxResult result;
>>>> + AddressSpace *edt_as = NULL;
>>>> + uint64_t ret = -1;
>>>> +
>>>> + if (edt_index < XIVE_TABLE_EDT_MAX) {
>>>> + edt_type = GETFIELD(CQ_TDR_EDT_TYPE, xive->edt[edt_index]);
>>>> + }
>>>> +
>>>> + switch (edt_type) {
>>>> + case CQ_TDR_EDT_IPI:
>>>> + edt_as = &xive->ipi_as;
>>>> + break;
>>>> + case CQ_TDR_EDT_EQ:
>>>> + edt_as = &xive->end_as;
>>>
>>> I'm not entirely understanding the function of these AddressSpace objectz.
>>
>> The IPI and END ESB pages are exposed in the same VC region. But this VC
>> region is not splitted between the two types with a single offset. It
>> is segmented with the EDT tables which defines the type of each segment.
>>
>> The purpose of these address spaces is to translate the load/stores in
>> the VC region in the equivalent IPI or END region exposing contigously
>> ESB pages of the same type: 'end_edt_mmio' and 'ipi_edt_mmio'.
>>
>> The memory regions of the XiveENDSource and the XiveSource for the IPIs
>> are mapped in 'end_edt_mmio' and 'ipi_edt_mmio'.
>
> Hmm. Ok, I'm not immediately seeing why you can't map the various IPI
> or END blocks directly into address_space memory rather than having
> this extra internal layer of indirection.
I think I see what you mean.
You would get rid of the intermediate MR &xive->ipi_edt_mmio and map
directly &xsrc->esb_mmio into &xive->ipi_mmio
same for the ENDs, map directly &end_xsrc->esb_mmio in &xive->end_mmio
That might be overkill indeed. I will check.
>> (This is changing for P10, should be simpler)
>
> [snip]
>>>> +/*
>>>> + * Maximum number of IRQs and ENDs supported by HW
>>>> + */
>>>> +#define PNV_XIVE_NR_IRQS (PNV9_XIVE_VC_SIZE / (1ull <<
>>>> XIVE_ESB_64K_2PAGE))
>>>> +#define PNV_XIVE_NR_ENDS (PNV9_XIVE_VC_SIZE / (1ull <<
>>>> XIVE_ESB_64K_2PAGE))
>>>> +
>>>> +static void pnv_xive_realize(DeviceState *dev, Error **errp)
>>>> +{
>>>> + PnvXive *xive = PNV_XIVE(dev);
>>>> + XiveSource *xsrc = &xive->source;
>>>> + XiveENDSource *end_xsrc = &xive->end_source;
>>>> + Error *local_err = NULL;
>>>> + Object *obj;
>>>> +
>>>> + obj = object_property_get_link(OBJECT(dev), "chip", &local_err);
>>>> + if (!obj) {
>>>> + error_propagate(errp, local_err);
>>>> + error_prepend(errp, "required link 'chip' not found: ");
>>>> + return;
>>>> + }
>>>> +
>>>> + /* The PnvChip id identifies the XIVE interrupt controller. */
>>>> + xive->chip = PNV_CHIP(obj);
>>>> +
>>>> + /*
>>>> + * Xive Interrupt source and END source object with the maximum
>>>> + * allowed HW configuration. The ESB MMIO regions will be resized
>>>> + * dynamically when the controller is configured by the FW to
>>>> + * limit accesses to resources not provisioned.
>>>> + */
>>>> + object_property_set_int(OBJECT(xsrc), PNV_XIVE_NR_IRQS, "nr-irqs",
>>>> + &error_fatal);
>>>
>>> You have a constant here, but your router object also includes a
>>> nr_irqs field. What's up with that?
>>
>> The XiveSource object of PnvXive is realized for the maximum size allowed
>> by HW because we don't know how much IRQs the FW will provision for.
>>
>> The 'nr_irqs' is the number of IRQs configured by the FW (We changed the
>> name to 'nr_ipis') See routine pnv_xive_vst_set_exclusive()
>
> Ah, ok.
>
I will try to get that done for 4.0.
PSI and LPC P9 models should be less complex to review.
Thanks,
C.