From: Peter Xu
Subject: Re: [Qemu-devel] [for-4.2 PATCH 2/2] hw/i386: AMD-Vi IVRS DMA alias support
Date: Tue, 30 Jul 2019 09:18:47 +0800
User-agent: Mutt/1.11.4 (2019-03-13)

On Mon, Jul 29, 2019 at 01:04:41PM -0600, Alex Williamson wrote:
> On Mon, 29 Jul 2019 16:26:46 +0800
> Peter Xu <address@hidden> wrote:
> 
> > On Fri, Jul 26, 2019 at 06:55:53PM -0600, Alex Williamson wrote:
> > > When we account for DMA aliases in the PCI address space, we can no
> > > longer use a single IVHD entry in the IVRS covering all devices.  We
> > > instead need to walk the PCI bus and create alias ranges when we find
> > > a conventional bus.  These alias ranges cannot overlap with a "Select
> > > All" range (as currently implemented), so we also need to enumerate
> > > each device with IVHD entries.
> > > 
> > > Importantly, the IVHD entries used here include a Device ID, which is
> > > simply the PCI BDF (Bus/Device/Function).  The guest firmware is
> > > responsible for programming bus numbers, so the final revision of this
> > > table depends on the update mechanism (acpi_build_update) to be called
> > > after guest PCI enumeration.  
> > 
> > Ouch... so the ACPI build procedure runs after the guest PCI code!
> > Could I ask how you found this? :)  It certainly seems much easier
> > this way...
> 
> I believe this is what MST was referring to with the MCFG update,
> acpi_build() is called from both acpi_setup() and acpi_build_update(),
> the latter is set up in numerous places to be called via a mechanism
> that re-writes the table on certain guest actions.  For instance
> acpi_add_rom_blob() passes this function as a callback which turns into
> a select_cb in fw_cfg, such that (aiui) the tables are updated before
> the guest reads them.  I added some fprintfs in the PCI config write
> path to confirm that the update callback occurs after PCI enumeration
> for both SeaBIOS and OVMF.  Since we seem to have other dependencies on
> this ordering elsewhere, I don't think that the IVRS update is unique
> in this regard.

Agreed.
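
(A minimal model of that lazy-rebuild pattern, with illustrative
stand-in types rather than QEMU's actual fw_cfg API: the select
callback fires before the guest reads the blob, so the tables are
regenerated only after PCI enumeration has run.)

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef void (*select_cb)(void *opaque);

struct fw_cfg_entry {
    uint8_t  *data;
    size_t    len;
    select_cb on_select;   /* fires each time the guest selects the entry */
    void     *opaque;
};

static uint8_t tables[64];

/* Stand-in for acpi_build_update(): re-walks PCI now that the guest
 * firmware has programmed bus numbers, then regenerates the tables. */
static void rebuild_acpi_tables(void *opaque)
{
    (void)opaque;
    memset(tables, 0xab, sizeof(tables));   /* pretend to rebuild IVRS etc. */
}

/* Guest read path: the select callback runs before any bytes are
 * returned, so the guest always sees freshly built tables. */
static size_t fw_cfg_read(struct fw_cfg_entry *e, uint8_t *buf, size_t n)
{
    if (e->on_select) {
        e->on_select(e->opaque);
    }
    n = n < e->len ? n : e->len;
    memcpy(buf, e->data, n);
    return n;
}

int main(void)
{
    struct fw_cfg_entry e = { tables, sizeof(tables),
                              rebuild_acpi_tables, NULL };
    uint8_t buf[64];
    size_t got = fw_cfg_read(&e, buf, sizeof(buf));  /* rebuild, then copy */
    printf("read %zu bytes, first byte 0x%02x\n", got, buf[0]);
    return 0;
}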

[...]

> > We've implemented similar logic multiple times:
> > 
> >   - When we want to do DMA (pci_requester_id)
> >   - When we want to fetch the DMA address space (the previous patch)
> >   - When we fill in the AMD ACPI table (this patch)
> > 
> > Do you think we can generalize them somehow?  I'm thinking how about
> > we directly fetch RID in the 2nd/3rd use case using pci_requester_id()
> > (which existed already) and simply use it?
> 
> For this patch, I think we could use pci_requester_id() for dev_id_b
> above, but we still need to walk the buses and devices, so it really
> only simplifies the 'if (pci_is_express...' code block above, and
> barely at that since we're at the point in the topology where such a
> decision is made.  For the previous patch, pci_requester_id() returns a
> BDF and that code is executed before bus numbers are programmed.  We
> might still make use of requester_id_cache, but a different interface
> would be necessary.  Also note how pci_req_id_cache_get() assumes we're
> looking for the requester ID as seen from the root bus while
> pci_device_iommu_address_space() is from the bus hosting iommu_fn.
> While these are generally the same in practice, it's not required.  I'd
> therefore tend to leave that to some future consolidation.  I can
> update the comment to include that justification in the previous patch.

Yes, we can build on top of this in the future if needed.  I see that
Michael already plans to merge this version, so it may not be worth a
repost just for the comment (unless there is a repost anyway, in which
case we could add a TODO).
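
(A rough sketch of the aliasing walk under discussion, with invented
topology types and field names; it is deliberately simpler than QEMU's
pci_requester_id(), e.g. the PCIe-to-PCI bridge special case is only
noted in a comment.)

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct pci_bus;

/* Toy topology; these types are illustrative, not QEMU's */
struct pci_dev {
    uint8_t         bus_num;
    uint8_t         devfn;
    struct pci_bus *parent_bus;
};

struct pci_bus {
    struct pci_dev *bridge;      /* NULL on the root bus */
    bool            is_express;
};

/*
 * Walk toward the root: once a transaction crosses a conventional
 * (non-express) bus, everything below it is seen upstream with the
 * forwarding bridge's own BDF, so the effective requester ID collapses
 * to that bridge.  The real rules are subtler (a PCIe-to-PCI bridge
 * tags with its secondary bus number and devfn 0); this is only the
 * simplified shape of the walk.
 */
static uint16_t requester_id(const struct pci_dev *dev)
{
    uint16_t rid = (uint16_t)((dev->bus_num << 8) | dev->devfn);
    const struct pci_bus *bus;

    for (bus = dev->parent_bus; bus && bus->bridge;
         bus = bus->bridge->parent_bus) {
        if (!bus->is_express) {
            rid = (uint16_t)((bus->bridge->bus_num << 8) |
                             bus->bridge->devfn);
        }
    }
    return rid;
}

int main(void)
{
    struct pci_bus root = { NULL, true };
    struct pci_dev bridge = { 0, 1 << 3, &root };   /* 00:01.0 */
    struct pci_bus conv = { &bridge, false };       /* conventional bus 1 */
    struct pci_dev dev = { 1, 2 << 3, &conv };      /* 01:02.0 */

    printf("requester id: %04x\n", requester_id(&dev));  /* 0008 -> 00:01.0 */
    return 0;
}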

> 
> > > +    /*
> > > +     * A PCI bus walk, for each PCI host bridge, is necessary to create a
> > > +     * complete set of IVHD entries.  Do this into a separate blob so that we
> > > +     * can calculate the total IVRS table length here and then append the new
> > > +     * blob further below.  Fall back to an entry covering all devices, which
> > > +     * is sufficient when no aliases are present.
> > > +     */
> > > +    object_child_foreach_recursive(object_get_root(),
> > > +                                   ivrs_host_bridges, ivhd_blob);
> > > +
> > > +    if (!ivhd_blob->len) {
> > > +        /*
> > > +         *   Type 1 device entry reporting all devices
> > > +         *   These are 4-byte device entries currently reporting the range of
> > > +         *   Refer to Spec - Table 95:IVHD Device Entry Type Codes(4-byte)
> > > +         */
> > > +        build_append_int_noprefix(ivhd_blob, 0x0000001, 4);
> > > +    }  
> > 
> > Is there a real use case for ivhd_blob->len==0?
> 
> It was mostly paranoia, but I believe it's really only an Intel
> convention that the PCI host bridge appears as a device on the bus.  It
> seems possible that we could have a host bridge with no devices, in
> which case we'd insert this select-all entry to make the IVRS valid.
> Perhaps in combination with AMD exposing their IOMMU as a device on the
> PCI bus this is not really an issue, but it's a trivial safety net.
> Thanks,

That question was only out of curiosity.  This code path will only
trigger when the AMD vIOMMU is present, so I assume the IOMMU device
should always be there, but of course it won't hurt as a safety net.
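
(For reference, a small sketch of how such a 4-byte IVHD device entry
packs, per a reading of the AMD IOMMU spec's Table 95; the field
layout and the ivhd_entry() helper are illustrative assumptions, not
QEMU code.)

#include <stdint.h>
#include <stdio.h>

/*
 * Assumed 4-byte IVHD device entry layout (verify against Table 95):
 * byte 0 = entry type, bytes 1-2 = DeviceID (BDF), byte 3 = DTE
 * setting.  Stored little-endian, which is why the select-all
 * fallback in the hunk quoted above is the 32-bit value 0x0000001.
 */
enum {
    IVHD_ALL    = 1,   /* applies to all devices; DeviceID ignored */
    IVHD_SELECT = 2,   /* applies only to the DeviceID given */
};

static uint32_t ivhd_entry(uint8_t type, uint16_t bdf, uint8_t dte)
{
    return (uint32_t)type | ((uint32_t)bdf << 8) | ((uint32_t)dte << 24);
}

int main(void)
{
    /* The fallback from the patch: type 1, covering all devices */
    printf("select-all: 0x%08x\n", ivhd_entry(IVHD_ALL, 0x0000, 0));

    /* A per-device entry for BDF 00:01.0 (0x0008), as the bus-walk
     * path would emit once aliases force device-level entries */
    printf("select:     0x%08x\n", ivhd_entry(IVHD_SELECT, 0x0008, 0));
    return 0;
}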

Thanks for doing this!

-- 
Peter Xu


