[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [for-4.2 PATCH 2/2] hw/i386: AMD-Vi IVRS DMA alias supp
From: |
Peter Xu |
Subject: |
Re: [Qemu-devel] [for-4.2 PATCH 2/2] hw/i386: AMD-Vi IVRS DMA alias support |
Date: |
Tue, 30 Jul 2019 09:18:47 +0800 |
User-agent: |
Mutt/1.11.4 (2019-03-13) |
On Mon, Jul 29, 2019 at 01:04:41PM -0600, Alex Williamson wrote:
> On Mon, 29 Jul 2019 16:26:46 +0800
> Peter Xu <address@hidden> wrote:
>
> > On Fri, Jul 26, 2019 at 06:55:53PM -0600, Alex Williamson wrote:
> > > When we account for DMA aliases in the PCI address space, we can no
> > > longer use a single IVHD entry in the IVRS covering all devices. We
> > > instead need to walk the PCI bus and create alias ranges when we find
> > > a conventional bus. These alias ranges cannot overlap with a "Select
> > > All" range (as currently implemented), so we also need to enumerate
> > > each device with IVHD entries.
> > >
> > > Importantly, the IVHD entries used here include a Device ID, which is
> > > simply the PCI BDF (Bus/Device/Function). The guest firmware is
> > > responsible for programming bus numbers, so the final revision of this
> > > table depends on the update mechanism (acpi_build_update) to be called
> > > after guest PCI enumeration.
> >
> > Ouch... so the ACPI build procedure is after those guest PCI code!
> > Could I ask how do you find this? :) It seems much easier for sure
> > this way...
>
> I believe this is what MST was referring to with the MCFG update,
> acpi_build() is called from both acpi_setup() and acpi_build_update(),
> the latter is setup in numerous places to be called via a mechanism
> that re-writes the table on certain guest actions. For instance
> acpi_add_rom_blob() passes this function as a callback which turns into
> a select_cb in fw_cfg, such that (aiui) the tables are updated before
> the guest reads them. I added some fprintfs in the PCI config write
> path to confirm that the update callback occurs after PCI enumeration
> for both SeaBIOS and OVMF. Since we seem to have other dependencies on
> this ordering elsewhere, I don't think that the IVRS update is unique
> in this regard.
Agreed.
[...]
> > We've implmented the similar logic for multiple times:
> >
> > - When we want to do DMA (pci_requester_id)
> > - When we want to fetch the DMA address space (the previous patch)
> > - When we fill in the AMD ACPI table (this patch)
> >
> > Do you think we can generalize them somehow? I'm thinking how about
> > we directly fetch RID in the 2nd/3rd use case using pci_requester_id()
> > (which existed already) and simply use it?
>
> For this patch, I think we could use pci_requester_id() for dev_id_b
> above, but we still need to walk the buses and devices, so it really
> only simplifies the 'if (pci_is_express...' code block above, and
> barely at that since we're at the point in the topology where such a
> decision is made. For the previous patch, pci_requester_id() returns a
> BDF and that code is executed before bus numbers are programmed. We
> might still make use of requester_id_cache, but a different interface
> would be necessary. Also note how pci_req_id_cache_get() assumes we're
> looking for the requester ID as seen from the root bus while
> pci_device_iommu_address_space() is from the bus hosting iommu_fn.
> While these are generally the same in practice, it's not required. I'd
> therefore tend to leave that to some future consolidation. I can
> update the comment to include that justification in the previous patch.
Yes, we can work on top in the future if needed. I see that Michael
already plan to merge this version, then it may not worth a repost for
the comment (unless there will be a repost, we could mark a TODO).
>
> > > + /*
> > > + * A PCI bus walk, for each PCI host bridge, is necessary to create a
> > > + * complete set of IVHD entries. Do this into a separate blob so
> > > that we
> > > + * can calculate the total IVRS table length here and then append
> > > the new
> > > + * blob further below. Fall back to an entry covering all devices,
> > > which
> > > + * is sufficient when no aliases are present.
> > > + */
> > > + object_child_foreach_recursive(object_get_root(),
> > > + ivrs_host_bridges, ivhd_blob);
> > > +
> > > + if (!ivhd_blob->len) {
> > > + /*
> > > + * Type 1 device entry reporting all devices
> > > + * These are 4-byte device entries currently reporting the
> > > range of
> > > + * Refer to Spec - Table 95:IVHD Device Entry Type
> > > Codes(4-byte)
> > > + */
> > > + build_append_int_noprefix(ivhd_blob, 0x0000001, 4);
> > > + }
> >
> > Is there a real use case for ivhd_blob->len==0?
>
> It was mostly paranoia, but I believe it's really only an Intel
> convention that the PCI host bridge appears as a device on the bus. It
> seems possible that we could have a host bridge with no devices, in
> which case we'd insert this select-all entry to make the IVRS valid.
> Perhaps in combination with AMD exposing their IOMMU as a device on the
> PCI bus this is not really an issue, but it's a trivial safety net.
> Thanks,
That question was only for curiousity. This code path will only
trigger when AMD vIOMMU is detected so I assume the IOMMU device
should always be there, but of course it won't hurt as a safety net.
Thanks for doing this!
--
Peter Xu