qemu-devel


From: david.dai
Subject: Re: [PATCH] hw/misc: Add a virtual pci device to dynamically attach memory to QEMU
Date: Sat, 9 Oct 2021 17:42:33 +0800

On Thu, Sep 30, 2021 at 12:33:30PM +0200, David Hildenbrand (david@redhat.com) wrote:
> 
> 
> On 30.09.21 11:40, david.dai wrote:
> > On Wed, Sep 29, 2021 at 11:30:53AM +0200, David Hildenbrand (david@redhat.com) wrote:
> > > 
> > > On 27.09.21 14:28, david.dai wrote:
> > > > On Mon, Sep 27, 2021 at 11:07:43AM +0200, David Hildenbrand (david@redhat.com) wrote:
> > > > > 
> > > > > On 27.09.21 10:27, Stefan Hajnoczi wrote:
> > > > > > On Sun, Sep 26, 2021 at 10:16:14AM +0800, David Dai wrote:
> > > > > > > Add a virtual PCI device to QEMU. The device is used to dynamically
> > > > > > > attach memory to a VM, so a driver in the guest can request host
> > > > > > > memory on the fly without help from virtualization management
> > > > > > > software such as libvirt/manager. The attached memory is
> > > > > 
> > > > > We do have virtio-mem to dynamically attach memory to a VM. It could be
> > > > > extended by a mechanism for the VM to request more/less memory; that's
> > > > > already a planned feature. But yeah, virtio-mem memory is exposed as
> > > > > ordinary system RAM, not only via a BAR to be managed mostly by user
> > > > > space.
> > > 
> > > There is a virtio-pmem spec proposal to expose the memory region via a PCI
> > > BAR. We could do something similar for virtio-mem; however, we would have
> > > to wire that new model up differently in QEMU (it's no longer a "memory
> > > device" like a DIMM then).
> > > 
> > > > > 
> > > > 
> > > > I wish virtio-mem could solve our problem, but it is a dynamic allocation
> > > > mechanism for system RAM in virtualization. In heterogeneous computing
> > > > environments, the attached memory usually comes from a computing device
> > > > and should be managed separately. We don't want Linux MM to control it.
> > > 
> > > If that heterogeneous memory has a dedicated node (which usually is
> > > the case, IIRC), and you let the Linux kernel manage it (dax/kmem), you
> > > can bind the memory backend of virtio-mem to that special NUMA node. So
> > > all memory managed by that virtio-mem device would come from that
> > > heterogeneous memory.
> > > 
> > 
> > Yes, CXL type 2 and type 3 devices expose memory to the host as a dedicated
> > node. The node is marked as soft_reserved_memory, and dax/kmem can take over
> > the node to create a dax device. This dax device can be regarded as the
> > memory backend of virtio-mem.
> > 
> > I'm not sure whether a dax device can be opened by multiple VMs or host
> > applications.
> 
> virtio-mem currently relies on having a single sparse memory region (anon
> mmap, mmapped file, mmapped huge pages, mmapped shmem) per VM. Although we
> can share memory with other processes, sharing with other VMs is not intended.
> Instead of actually mmapping parts dynamically (which can be quite
> expensive), virtio-mem relies on punching holes into the backend and
> dynamically allocating memory/file blocks/... on access.
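
To make the "punch holes, allocate on access" idea concrete, here is a minimal
Linux-only sketch on a private anonymous mapping. It is an illustration under
assumptions, not QEMU's actual code path: the function name and sizes are
hypothetical, and it only covers the anonymous-mmap case (a file-backed
backend would punch holes in the file instead of using madvise).

```python
import mmap

PAGE = mmap.PAGESIZE


def touch_then_discard(num_pages=16):
    """Hypothetical demo: sparse region, allocate on access, discard one page."""
    # Reserve a sparse private anonymous region; no memory is backed yet.
    region = mmap.mmap(-1, num_pages * PAGE,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        # First write faults the page in: this is the "allocate on access" step.
        region[0:4] = b"data"
        before = bytes(region[0:4])
        # Discard the first page, which gives its memory back to the host.
        region.madvise(mmap.MADV_DONTNEED, 0, PAGE)
        # The next access faults in a fresh zero-filled page.
        after = bytes(region[0:4])
    finally:
        region.close()
    return before, after


if __name__ == "__main__":
    print(touch_then_discard())
```

The mapping itself never changes size; only the backing pages come and go,
which is why no dynamic mmap/munmap of sub-ranges is needed.
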
> 
> So the easy way to make it work is:
> 
> a) Exposing the CXL memory to the buddy via dax/kmem, resulting in device
> memory getting managed by the buddy on a separate NUMA node.
>

Do you mean the Linux kernel buddy system? How can we guarantee that other
applications don't allocate memory from it?

>
> b) (optional) allocate huge pages on that separate NUMA node.
> c) Use ordinary memory-backend-ram or memory-backend-memfd (for huge pages),
> *binding* the memory backend to that special NUMA node.
>
 
With "-object memory-backend-ram,id=mem0,size=768G" (or memory-backend-memfd),
how do I bind the backend memory to the NUMA node?
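
For reference, a minimal sketch of how such a binding could be expressed,
assuming the QEMU memory backend properties host-nodes= and policy=bind. The
helper name, the host node number 1, and the ids mem0/vmem0 are hypothetical
and only serve as an example.

```python
def virtio_mem_args(host_node, size="768G", mem_id="mem0", dev_id="vmem0"):
    """Build example QEMU arguments for a NUMA-bound virtio-mem backend."""
    # Memory backend bound to one host NUMA node (assumed to be the CXL node).
    backend = (
        f"memory-backend-ram,id={mem_id},size={size},"
        f"host-nodes={host_node},policy=bind"
    )
    # virtio-mem device using that backend; starts with nothing plugged.
    device = f"virtio-mem-pci,id={dev_id},memdev={mem_id},requested-size=0"
    return ["-object", backend, "-device", device]


if __name__ == "__main__":
    print(" ".join(virtio_mem_args(host_node=1)))
```

As far as I know, a complete command line also needs "-m" with a maxmem=
value large enough to cover the device, which is left out here.
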

>
> This will dynamically allocate memory from that special NUMA node, resulting
> in the virtio-mem device completely being backed by that device memory,
> being able to dynamically resize the memory allocation.
> 
> 
> Exposing an actual devdax to the virtio-mem device, shared by multiple VMs
> isn't really what we want and won't work without major design changes. Also,
> I'm not so sure it's a very clean design: exposing memory belonging to other
> VMs to unrelated QEMU processes. This sounds like a serious security hole:
> if you managed to escalate to the QEMU process from inside the VM, you can
> access unrelated VM memory quite happily. You want an abstraction
> in-between, that makes sure each VM/QEMU process only sees private memory:
> for example, the buddy via dax/kmem.
> 
Hi David,

Thanks for your suggestion, and sorry for my delayed reply due to my long
vacation.
How does virtio-mem currently attach memory to the guest dynamically? Via
page faults?

Thanks,
David 


> -- 
> Thanks,
> 
> David / dhildenb
> 
> 




