From: Stefan Hajnoczi
Subject: Re: RFC: use VFIO over a UNIX domain socket to implement device offloading
Date: Mon, 15 Jun 2020 11:49:29 +0100
On Tue, Jun 09, 2020 at 11:25:41PM -0700, John G Johnson wrote:
> > On Jun 2, 2020, at 8:06 AM, Alex Williamson <alex.williamson@redhat.com>
> > wrote:
> >
> > On Wed, 20 May 2020 17:45:13 -0700
> > John G Johnson <john.g.johnson@oracle.com> wrote:
> >
> >>> I'm confused by VFIO_USER_ADD_MEMORY_REGION vs VFIO_USER_IOMMU_MAP_DMA.
> >>> The former seems intended to provide the server with access to the
> >>> entire GPA space, while the latter indicates an IOVA to GPA mapping of
> >>> those regions. Doesn't this break the basic isolation of a vIOMMU?
> >>> This essentially says to me "here's all the guest memory, but please
> >>> only access these regions for which we're providing DMA mappings".
> >>> That invites abuse.
> >>>
> >>
> >> The purpose behind separating QEMU into multiple processes is
> >> to provide an additional layer protection for the infrastructure against
> >> a malign guest, not for the guest against itself, so preventing a server
> >> that has been compromised by a guest from accessing all of guest memory
> >> adds no additional benefit. We don’t even have an IOMMU in our current
> >> guest model for this reason.
> >
> > One of the use cases we see a lot with vfio is nested assignment, ie.
> > we assign a device to a VM where the VM includes a vIOMMU, such that
> > the guest OS can then assign the device to userspace within the guest.
> > This is safe to do AND provides isolation within the guest exactly
> > because the device only has access to memory mapped to the device, not
> > the entire guest address space. I don't think it's just the hypervisor
> > you're trying to protect, we can't assume there are always trusted
> > drivers managing the device.
> >
>
> We intend to support an IOMMU. The question seems to be whether
> it’s implemented in the server or client. The current proposal has it
> in the server, ala vhost-user, but we are fine with moving it.
It's challenging to implement a fast and secure IOMMU. The simplest
approach is secure but not fast: add protocol messages for
DMA_READ(iova, length) and DMA_WRITE(iova, buffer, length).
An issue with file descriptor passing is that it's hard to revoke access
once the file descriptor has been passed. memfd supports sealing with
fcntl(F_ADD_SEALS), but sealing cannot revoke existing mmap(PROT_WRITE)
mappings in other processes.
Memory Protection Keys don't seem to be useful here either and their
availability is limited (see pkeys(7)).
One crazy idea is to use KVM as a sandbox for running the device and let
the vIOMMU control the page tables instead of the device (guest). That
way the hardware MMU provides memory translation, but I think this is
impractical because the guest environment is too different from the
Linux userspace environment.
As a starting point adding DMA_READ/DMA_WRITE messages would provide the
functionality and security. Unfortunately it makes DMA expensive and
performance will suffer.
Stefan