[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v3 9/9] spapr_pci: Use XICS interrupt allocator
From: |
Alexey Kardashevskiy |
Subject: |
Re: [Qemu-devel] [PATCH v3 9/9] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB |
Date: |
Wed, 25 Jun 2014 10:20:05 +1000 |
User-agent: |
Mozilla/5.0 (X11; Linux i686 on x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 |
On 05/30/2014 11:45 PM, Alexander Graf wrote:
>
> On 30.05.14 15:36, Alexey Kardashevskiy wrote:
>> On 05/30/2014 08:08 PM, Alexander Graf wrote:
>>> On 30.05.14 11:34, Alexey Kardashevskiy wrote:
>>>> Currently SPAPR PHB keeps track of all allocated MSI (here and below
>>>> MSI stands for both MSI and MSIX) interrupt because
>>>> XICS used to be unable to reuse interrupts. This is a problem for
>>>> dynamic MSI reconfiguration which happens when guest reloads a driver
>>>> or performs PCI hotplug. Another problem is that the existing
>>>> implementation can enable MSI on 32 devices maximum
>>>> (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason for that.
>>>>
>>>> This makes use of new XICS ability to reuse interrupts.
>>>>
>>>> This reorganizes MSI information storage in sPAPRPHBState. Instead of
>>>> static array of 32 descriptors (one per a PCI function), this patch adds
>>>> a GHashTable when @config_addr is a key and (first_irq, num) pair is
>>>> a value. GHashTable can dynamically grow and shrink so the initial limit
>>>> of 32 devices is gone.
>>>>
>>>> This changes migration stream as @msi_table was a static array while new
>>>> @msi_devs is a dynamic hash table. This adds temporary array which is
>>>> used for migration, it is populated in "spapr_pci"::pre_save() callback
>>>> and expanded into the hash table in post_load() callback. Since
>>>> the destination side does not know the number of MSI-enabled devices
>>>> in advance and cannot pre-allocate the temporary array to receive
>>>> migration state, this makes use of new VMSTATE_STRUCT_VARRAY_ALLOC macro
>>>> which allocates the array automatically.
>>>>
>>>> This resets the MSI configuration space when interrupts are released by
>>>> the ibm,change-msi RTAS call.
>>>>
>>>> This fixed traces to be more informative.
>>>>
>>>> This changes vmstate_spapr_pci_msi name from "...lsi" to "...msi" which
>>>> was incorrect by accident. As the internal representation changed,
>>>> thus bumps migration version number.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <address@hidden>
>>>> ---
>>>> Changes:
>>>> v3:
>>>> * used GHashTable
>>>> * implemented temporary MSI state array for migration
>>>> ---
>>>> hw/ppc/spapr_pci.c | 195
>>>> +++++++++++++++++++++++++-------------------
>>>> include/hw/pci-host/spapr.h | 21 +++--
>>>> trace-events | 5 +-
>>>> 3 files changed, 129 insertions(+), 92 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>>> index a2f9677..48c9aad 100644
>>>> --- a/hw/ppc/spapr_pci.c
>>>> +++ b/hw/ppc/spapr_pci.c
>>>> @@ -220,36 +220,12 @@ static void rtas_write_pci_config(PowerPCCPU *cpu,
>>>> sPAPREnvironment *spapr,
>>>> }
>>>> /*
>>>> - * Find an entry with config_addr or returns the empty one if not
>>>> found AND
>>>> - * alloc_new is set.
>>>> - * At the moment the msi_table entries are never released so there is
>>>> - * no point to look till the end of the list if we need to find the free
>>>> entry.
>>>> - */
>>>> -static int spapr_msicfg_find(sPAPRPHBState *phb, uint32_t config_addr,
>>>> - bool alloc_new)
>>>> -{
>>>> - int i;
>>>> -
>>>> - for (i = 0; i < SPAPR_MSIX_MAX_DEVS; ++i) {
>>>> - if (!phb->msi_table[i].nvec) {
>>>> - break;
>>>> - }
>>>> - if (phb->msi_table[i].config_addr == config_addr) {
>>>> - return i;
>>>> - }
>>>> - }
>>>> - if ((i < SPAPR_MSIX_MAX_DEVS) && alloc_new) {
>>>> - trace_spapr_pci_msi("Allocating new MSI config", i, config_addr);
>>>> - return i;
>>>> - }
>>>> -
>>>> - return -1;
>>>> -}
>>>> -
>>>> -/*
>>>> * Set MSI/MSIX message data.
>>>> * This is required for msi_notify()/msix_notify() which
>>>> * will write at the addresses via spapr_msi_write().
>>>> + *
>>>> + * If hwaddr == 0, all entries will have .data == first_irq i.e.
>>>> + * table will be reset.
>>>> */
>>>> static void spapr_msi_setmsg(PCIDevice *pdev, hwaddr addr, bool msix,
>>>> unsigned first_irq, unsigned req_num)
>>>> @@ -263,9 +239,12 @@ static void spapr_msi_setmsg(PCIDevice *pdev, hwaddr
>>>> addr, bool msix,
>>>> return;
>>>> }
>>>> - for (i = 0; i < req_num; ++i, ++msg.data) {
>>>> + for (i = 0; i < req_num; ++i) {
>>>> msix_set_message(pdev, i, msg);
>>>> trace_spapr_pci_msi_setup(pdev->name, i, msg.address);
>>>> + if (addr) {
>>>> + ++msg.data;
>>>> + }
>>>> }
>>>> }
>>>> @@ -280,9 +259,12 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu,
>>>> sPAPREnvironment *spapr,
>>>> unsigned int req_num = rtas_ld(args, 4); /* 0 == remove all */
>>>> unsigned int seq_num = rtas_ld(args, 5);
>>>> unsigned int ret_intr_type;
>>>> - int ndev, irq, max_irqs = 0;
>>>> + unsigned int irq, max_irqs = 0, num = 0;
>>>> sPAPRPHBState *phb = NULL;
>>>> PCIDevice *pdev = NULL;
>>>> + bool msix = false;
>>>> + spapr_pci_msi *msi;
>>>> + int *config_addr_key;
>>>> switch (func) {
>>>> case RTAS_CHANGE_MSI_FN:
>>>> @@ -310,13 +292,18 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu,
>>>> sPAPREnvironment *spapr,
>>>> /* Releasing MSIs */
>>>> if (!req_num) {
>>>> - ndev = spapr_msicfg_find(phb, config_addr, false);
>>>> - if (ndev < 0) {
>>>> - trace_spapr_pci_msi("MSI has not been enabled", -1,
>>>> config_addr);
>>>> + msi = (spapr_pci_msi *) g_hash_table_lookup(phb->msi,
>>>> &config_addr);
>>>> + if (!msi) {
>>>> + trace_spapr_pci_msi("Releasing wrong config", config_addr);
>>>> rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>>>> return;
>>>> }
>>>> - trace_spapr_pci_msi("Released MSIs", ndev, config_addr);
>>>> +
>>>> + xics_free(spapr->icp, msi->first_irq, msi->num);
>>>> + spapr_msi_setmsg(pdev, 0, msix, 0, num);
>>>> + g_hash_table_remove(phb->msi, &config_addr);
>>> Are you sure this doesn't have to be the exact same element? That pointer
>>> is to the stack, not to the element.
>>
>> I was not sure so I tested and it deletes element even if the key for
>> g_hash_table_remove() is on stack. I looked at glibc and it should just
>> work as it is now while I am providing a key_equal_func() callback to the
>> map:
>> http://sourcecodebrowser.com/glib2.0/2.12.4/ghash_8c.html#acff7e5fffc2572456a379d3d62d3e6ed
>>
>
> True, works for me.
>
>>
>>
>>
>>>> +
>>>> + trace_spapr_pci_msi("Released MSIs", config_addr);
>>>> rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>> rtas_st(rets, 1, 0);
>>>> return;
>>>> @@ -324,15 +311,6 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu,
>>>> sPAPREnvironment *spapr,
>>>> /* Enabling MSI */
>>>> - /* Find a device number in the map to add or reuse the existing
>>>> one */
>>>> - ndev = spapr_msicfg_find(phb, config_addr, true);
>>>> - if (ndev >= SPAPR_MSIX_MAX_DEVS || ndev < 0) {
>>>> - error_report("No free entry for a new MSI device");
>>>> - rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>>>> - return;
>>>> - }
>>>> - trace_spapr_pci_msi("Configuring MSI", ndev, config_addr);
>>>> -
>>>> /* Check if the device supports as many IRQs as requested */
>>>> if (ret_intr_type == RTAS_TYPE_MSI) {
>>>> max_irqs = msi_nr_vectors_allocated(pdev);
>>>> @@ -340,48 +318,47 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu,
>>>> sPAPREnvironment *spapr,
>>>> max_irqs = pdev->msix_entries_nr;
>>>> }
>>>> if (!max_irqs) {
>>>> - error_report("Requested interrupt type %d is not enabled for
>>>> device#%d",
>>>> - ret_intr_type, ndev);
>>>> + error_report("Requested interrupt type %d is not enabled for
>>>> device %x",
>>>> + ret_intr_type, config_addr);
>>>> rtas_st(rets, 0, -1); /* Hardware error */
>>>> return;
>>>> }
>>>> /* Correct the number if the guest asked for too many */
>>>> if (req_num > max_irqs) {
>>>> + trace_spapr_pci_msi_retry(config_addr, req_num, max_irqs);
>>>> req_num = max_irqs;
>>>> + irq = 0; /* to avoid misleading trace */
>>>> + goto out;
>>>> }
>>>> - /* Check if there is an old config and MSI number has not
>>>> changed */
>>>> - if (phb->msi_table[ndev].nvec && (req_num !=
>>>> phb->msi_table[ndev].nvec)) {
>>>> - /* Unexpected behaviour */
>>>> - error_report("Cannot reuse MSI config for device#%d", ndev);
>>>> + /* Allocate MSIs */
>>>> + irq = xics_alloc_block(spapr->icp, 0, req_num, false,
>>>> + ret_intr_type == RTAS_TYPE_MSI);
>>>> + if (!irq) {
>>>> + error_report("Cannot allocate MSIs for device %x", config_addr);
>>>> rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>>>> return;
>>>> }
>>>> - /* There is no cached config, allocate MSIs */
>>>> - if (!phb->msi_table[ndev].nvec) {
>>>> - irq = xics_alloc_block(spapr->icp, 0, req_num, false,
>>>> - ret_intr_type == RTAS_TYPE_MSI);
>>>> - if (irq < 0) {
>>>> - error_report("Cannot allocate MSIs for device#%d", ndev);
>>>> - rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>>>> - return;
>>>> - }
>>>> - phb->msi_table[ndev].irq = irq;
>>>> - phb->msi_table[ndev].nvec = req_num;
>>>> - phb->msi_table[ndev].config_addr = config_addr;
>>>> - }
>>>> -
>>>> /* Setup MSI/MSIX vectors in the device (via cfgspace or MSIX
>>>> BAR) */
>>>> spapr_msi_setmsg(pdev, spapr->msi_win_addr, ret_intr_type ==
>>>> RTAS_TYPE_MSIX,
>>>> - phb->msi_table[ndev].irq, req_num);
>>>> + irq, req_num);
>>>> + /* Add MSI device to cache */
>>>> + msi = g_new(spapr_pci_msi, 1);
>>>> + msi->first_irq = irq;
>>>> + msi->num = req_num;
>>>> + config_addr_key = g_new(int, 1);
>>> gint?
>> Same thing but yeah, better to stay solid.
>>
>> Is that it or you will have more comments after more careful review? :)
>> Thanks!
>
> I think the patch set is ok as is, but I want to have Juan's / Peter's ack
> on the vmstate patch.
Since Juan is ignoring this, can you please pull 1..7 of this and leave 8-9
for later as they not exactly about XICS anyway? Thanks!
>
>
> Alex
>
--
Alexey
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Re: [Qemu-devel] [PATCH v3 9/9] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB,
Alexey Kardashevskiy <=