RE: [PATCH v2 2/4] intel_iommu: Fix a potential issue in VFIO dirty page sync
From: Duan, Zhenzhong
Subject: RE: [PATCH v2 2/4] intel_iommu: Fix a potential issue in VFIO dirty page sync
Date: Wed, 7 Jun 2023 03:14:07 +0000

>-----Original Message-----
>From: Peter Xu <peterx@redhat.com>
>Sent: Tuesday, June 6, 2023 11:42 PM
>Subject: Re: [PATCH v2 2/4] intel_iommu: Fix a potential issue in VFIO dirty
>page sync
>
...
>> >> a/include/exec/memory.h b/include/exec/memory.h index
>> >> c3661b2276c7..eecc3eec6702 100644
>> >> --- a/include/exec/memory.h
>> >> +++ b/include/exec/memory.h
>> >> @@ -142,6 +142,10 @@ struct IOMMUTLBEntry {
>> >> * events (e.g. VFIO). Both notifications must be accurate so that
>> >> * the shadow page table is fully in sync with the guest view.
>> >> *
>> >> + * Besides MAP, there is a special use case called FULL_MAP which
>> >> + * requests notification for all the existent mappings (e.g. VFIO
>> >> + * dirty page sync).
>> >
>> >Why do we need FULL_MAP? Can we simply reimpl MAP?
>>
>> Sorry, I just realized IOMMU_NOTIFIER_FULL_MAP is confusing.
>> Maybe IOMMU_NOTIFIER_MAP_FAST_PATH could be a bit more accurate.
>>
>> IIUC, currently replay() is called from two paths, one is VFIO device
>> address space switch which walks over the IOMMU page table to setup
>> initial mapping and cache it in IOVA tree. The other is VFIO dirty
>> sync which walks over the IOMMU page table to notify the mapping,
>> because we already cache the mapping in IOVA tree and VFIO dirty sync
>> is protected by BQL, so I think it's fine to pick mapping from IOVA
>> tree directly instead of walking over IOMMU page table. That's the
>> reason of FULL_MAP (IOMMU_NOTIFIER_MAP_FAST_PATH better).
>>
>> About "reimpl MAP", do you mean to walk over IOMMU page table to
>> notify all existing MAP events without checking with the IOVA tree for
>> difference? If you prefer, I'll rewrite an implementation this way.
>
>We still need to maintain iova tree. IIUC that's the major complexity of vt-d
>emulation, because we have that extra cache layer to sync with the real guest
>iommu pgtables.
Can't agree more. It looks like only intel-iommu and virtio-iommu implement
such an optimization for now.
>
>But I think we were just wrong to also notify in the unmap_all() procedure.
>
>IIUC the right thing to do (keeping replay() the interface as-is, per it used
>to be defined) is we should replace the unmap_all() to only evacuate the iova tree
>(keeping all host mappings untouched, IOW, don't notify UNMAP), and do a
>full resync there, which will notify all existing mappings as MAP. Then we
>don't interrupt with any existing mapping if there is (e.g. for the dirty sync
>case), meanwhile we keep sync too to latest (for moving a vfio device into an
>existing iommu group).
>
>Do you think that'll work for us?
Yes, I think I get your point.
The simple change below should work the way you suggest. Do you agree?

@@ -3825,13 +3833,10 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
     IntelIOMMUState *s = vtd_as->iommu_state;
     uint8_t bus_n = pci_bus_num(vtd_as->bus);
     VTDContextEntry ce;
+    DMAMap map = { .iova = 0, .size = HWADDR_MAX };
 
-    /*
-     * The replay can be triggered by either a invalidation or a newly
-     * created entry. No matter what, we release existing mappings
-     * (it means flushing caches for UNMAP-only registers).
-     */
-    vtd_address_space_unmap(vtd_as, n);
+    /* replay is protected by BQL, page walk will re-setup IOVA tree safely */
+    iova_tree_remove(vtd_as->iova_tree, map);
 
     if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce) == 0) {
         trace_vtd_replay_ce_valid(s->root_scalable ? "scalable mode" :
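
For illustration only, here is a minimal standalone sketch of the
evacuate-then-resync flow discussed above. It is not QEMU code: every Toy*
name is a hypothetical stand-in (ToyCache for the IOVA tree, the guest[] array
for the guest IOMMU page table, ToyNotifier for the registered notifier), and
it assumes the caller is already serialized, as the BQL does for replay(). The
point it tries to show is that replay drops the cache without emitting UNMAP
and then reports every existing mapping as MAP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_MAPPINGS 16

/* Hypothetical stand-in for one IOVA range; not QEMU's DMAMap. */
typedef struct {
    uint64_t iova;
    uint64_t size;
} ToyMap;

/* Hypothetical stand-in for the IOVA tree that caches notified mappings. */
typedef struct {
    ToyMap entries[TOY_MAX_MAPPINGS];
    size_t count;
} ToyCache;

/* Counters standing in for the MAP/UNMAP notifier callbacks. */
typedef struct {
    unsigned map_events;
    unsigned unmap_events;
} ToyNotifier;

/* Drop every cached range. Deliberately emits no UNMAP event. */
static void toy_cache_evacuate(ToyCache *cache)
{
    cache->count = 0;
}

/* Record one range in the cache. */
static void toy_cache_insert(ToyCache *cache, ToyMap m)
{
    assert(cache->count < TOY_MAX_MAPPINGS);
    cache->entries[cache->count++] = m;
}

/*
 * Replay: evacuate the cache, then walk the "guest page table" (here a
 * plain array) and notify each existing mapping as MAP while re-caching it.
 */
static void toy_replay(ToyCache *cache, const ToyMap *guest, size_t n,
                       ToyNotifier *notifier)
{
    toy_cache_evacuate(cache);
    for (size_t i = 0; i < n; i++) {
        toy_cache_insert(cache, guest[i]);
        notifier->map_events++;
    }
}
```

Calling toy_replay() twice on the same guest[] array leaves unmap_events at
zero and the cache holding exactly the guest mappings, which is the behaviour
the dirty sync case needs: existing host mappings are never torn down.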
Thanks
Zhenzhong