Re: [Qemu-ppc] [PATCH qemu 0/3] spapr_pci, vfio: NVIDIA V100 + P9 passthrough
From: David Gibson
Subject: Re: [Qemu-ppc] [PATCH qemu 0/3] spapr_pci, vfio: NVIDIA V100 + P9 passthrough
Date: Thu, 14 Feb 2019 15:59:32 +1100
User-agent: Mutt/1.10.1 (2018-07-13)
On Mon, Feb 11, 2019 at 02:49:49PM +1100, Alexey Kardashevskiy wrote:
>
>
> On 08/02/2019 16:28, David Gibson wrote:
> > On Thu, Feb 07, 2019 at 08:26:20PM -0700, Alex Williamson wrote:
> >> On Fri, 8 Feb 2019 13:29:37 +1100
> >> Alexey Kardashevskiy <address@hidden> wrote:
> >>
> >>> On 08/02/2019 02:18, Alex Williamson wrote:
> >>>> On Thu, 7 Feb 2019 15:43:18 +1100
> >>>> Alexey Kardashevskiy <address@hidden> wrote:
> >>>>
> >>>>> On 07/02/2019 04:22, Daniel Henrique Barboza wrote:
> >>>>>> Based on this series, I've sent a Libvirt patch to allow a QEMU process
> >>>>>> to inherit IPC_LOCK when using VFIO passthrough with the Tesla V100
> >>>>>> GPU:
> >>>>>>
> >>>>>> https://www.redhat.com/archives/libvir-list/2019-February/msg00219.html
> >>>>>>
> >>>>>>
> >>>>>> In that thread, Alex raised concerns about allowing QEMU to freely lock
> >>>>>> all the memory it wants. Is this an issue to be considered in the
> >>>>>> review
> >>>>>> of this series here?
> >>>>>>
> >>>>>> Reading the patches, especially patch 3/3, it seems to me that QEMU is
> >>>>>> going to lock the KVM memory to populate the NUMA node with the memory
> >>>>>> of the GPU itself, so at first glance there is no risk of taking over
> >>>>>> the host RAM.
> >>>>>> Am I missing something?
> >>>>>
> >>>>>
> >>>>> The GPU memory belongs to the device and is not visible to the host as
> >>>>> memory blocks, nor is it covered by page structs; to the host it is more
> >>>>> like MMIO, which is passed through to the guest without the locked-memory
> >>>>> accounting. I'd expect libvirt to keep working as usual, except that:
> >>>>>
> >>>>> when libvirt calculates the amount of memory needed for TCE tables
> >>>>> (which is guestRAM/64k*8), it now needs to use the end of the last GPU
> >>>>> RAM window as the guest RAM size. For example, in QEMU HMP "info mtree
> >>>>> -f":
> >>>>>
> >>>>> FlatView #2
> >>>>> AS "memory", root: system
> >>>>> AS "cpu-memory-0", root: system
> >>>>> Root memory region: system
> >>>>> 0000000000000000-000000007fffffff (prio 0, ram): ppc_spapr.ram
> >>>>> 0000010000000000-0000011fffffffff (prio 0, ram): nvlink2-mr
> >>>>>
> >>>>> So previously the DMA window had to cover 0x7fffffff+1; now it has to
> >>>>> cover 0x11fffffffff+1.
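As a rough illustration of the sizing rule quoted above (a sketch only; the 64KiB IOMMU page size and 8-byte TCE entry size come from the guestRAM/64k*8 rule in the thread, and the window ends come from the mtree dump):

```python
# Illustrative only: TCE table size = (DMA window span / 64KiB) * 8 bytes,
# per the guestRAM/64k*8 rule quoted above.
TCE_PAGE = 64 * 1024          # 64KiB IOMMU page size
TCE_ENTRY = 8                 # bytes per TCE entry

def tce_table_bytes(window_end):
    """Bytes of TCE table needed to map the window [0, window_end)."""
    return (window_end // TCE_PAGE) * TCE_ENTRY

old_end = 0x7fffffff + 1       # end of ppc_spapr.ram in the mtree dump
new_end = 0x11fffffffff + 1    # end of nvlink2-mr in the mtree dump

print(tce_table_bytes(old_end))   # 262144 bytes = 256KiB
print(tce_table_bytes(new_end))   # 150994944 bytes = 144MiB
```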
> >>>>
> >>>> This looks like a chicken-and-egg problem: you're saying libvirt needs
> >>>> to query the mtree to understand the extent of the GPU layout, but we
> >>>> need to specify the locked memory limits in order for QEMU to start. Is
> >>>> libvirt supposed to start the VM with unlimited locked memory and fix
> >>>> it at some indeterminate point in the future? Run a dummy VM with
> >>>> unlimited locked memory in order to determine the limits for the real
> >>>> VM? Neither of these sounds practical. Thanks,
> >>>
> >>>
> >>> QEMU maps GPU RAM at known locations (which depend only on the vPHB's
> >>> index or can be set explicitly), and libvirt knows how many GPUs are
> >>> passed through, so it is quite easy to calculate the required amount
> >>> of memory.
> >>>
> >>> Here is the window start calculation:
> >>> https://github.com/aik/qemu/commit/7073cad3ae7708d657e01672bcf53092808b54fb#diff-662409c2a5a150fe231d07ea8384b920R3812
> >>>
> >>> We do not know the exact GPU RAM window size until QEMU reads it from
> >>> VFIO/nvlink2, but we know that all existing hardware has a window of
> >>> 128GB (the adapters I have access to have only 16/32GB on board).
> >>
> >> So you're asking that libvirt add 128GB per GPU with magic nvlink
> >> properties, which may be 8x what's actually necessary, and how is
> >> libvirt to determine which GPUs to apply this to? Does libvirt need to
> >> sort through device tree properties for this? Thanks,
> >
> > Hm. If the GPU memory is really separate from main RAM, which it
> > sounds like, I don't think it makes sense to account it against the
> > same locked memory limit as regular RAM.
>
> This is accounting for the TCE table that covers GPU RAM, not for the GPU RAM itself.
Ah, ok, that makes sense then
> So I am asking libvirt to add 128GB/64k*8=16MB to the locked_vm. It
> already does so for the guest RAM.
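A quick sanity check of that per-GPU figure (illustrative arithmetic only, using the same guestRAM/64k*8 rule):

```python
# Per-GPU locked_vm increment suggested above: one 128GB GPU RAM window,
# mapped with 64KiB IOMMU pages, 8 bytes per TCE entry.
gpu_window = 128 * 1024**3                     # 128GB maximum window per GPU
per_gpu_tce = gpu_window // (64 * 1024) * 8
print(per_gpu_tce)                             # 16777216 bytes = 16MB
```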
That seems reasonable. IIRC we already have some slop in the amount
of locked vm that libvirt allocates; not sure if it'll be enough as
is.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson