
Re: [PATCH RFC 00/15] virtio-mem: Expose device memory via separate memslots


From: David Hildenbrand
Subject: Re: [PATCH RFC 00/15] virtio-mem: Expose device memory via separate memslots
Date: Thu, 14 Oct 2021 09:01:59 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.1.0

On 13.10.21 21:03, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> Based-on: 20211011175346.15499-1-david@redhat.com
>>
>> A virtio-mem device is represented by a single large RAM memory region
>> backed by a single large mmap.
>>
>> Right now, we map that complete memory region into guest physical address
>> space, resulting in a very large memory mapping, KVM memory slot, ...
>> although only a small amount of memory might actually be exposed to the VM.
>>
>> For example, when starting a VM with a 1 TiB virtio-mem device that only
>> exposes little device memory (e.g., 1 GiB) towards the VM initially,
>> in order to hotplug more memory later, we waste a lot of memory on metadata
>> for KVM memory slots (> 2 GiB!) and the accompanying bitmaps. Although some
>> optimizations in KVM are being worked on to reduce this metadata overhead
>> on x86-64 in some cases, it remains a problem with nested VMs and there are
>> other reasons why we would want to reduce the total memory slot size to a
>> reasonable minimum.
>>
>> We want to:
>> a) Reduce the metadata overhead, including bitmap sizes inside KVM but also
>>    inside QEMU KVM code where possible.
>> b) Not always expose all device-memory to the VM, to reduce the attack
>>    surface of malicious VMs without using userfaultfd.
>>
>> So instead, expose the RAM memory region not via a single large mapping
>> (consuming one memslot) but via multiple mappings, each consuming one
>> memslot. To do that, we divide the RAM memory region into separate parts
>> via aliases and only map the aliases we actually need into a device
>> container. We have to make sure that QEMU won't silently merge the memory
>> sections corresponding to the aliases (and thereby also the memslots),
>> otherwise we lose atomic updates with KVM and vhost-user, which we deeply
>> care about when adding/removing memory. Further, to get memslot accounting
>> right, such merging is better avoided.
>>
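
(As a rough illustration of the aliasing idea only: the helper below, its
name, and the 1 GiB VIRTIO_MEM_MEMSLOT_SIZE constant are assumptions made
for this sketch and are not taken from the series. Using QEMU's existing
memory API, carving the big RAM region into per-memslot aliases could look
roughly like this:)

#include "qemu/osdep.h"
#include "qemu/units.h"
#include "exec/memory.h"

/* Illustrative only: assumed per-memslot granularity. */
#define VIRTIO_MEM_MEMSLOT_SIZE (1 * GiB)

/*
 * Carve the big RAM region into fixed-size aliases. The aliases are only
 * mapped into the device container (and thereby only consume a memslot)
 * once they are actually needed; see the second sketch further below.
 */
static void create_memslot_aliases(Object *owner, MemoryRegion *ram_mr,
                                   MemoryRegion *aliases, int nb_aliases)
{
    const uint64_t size = memory_region_size(ram_mr);
    int i;

    for (i = 0; i < nb_aliases; i++) {
        const uint64_t offset = (uint64_t)i * VIRTIO_MEM_MEMSLOT_SIZE;
        const uint64_t len = MIN((uint64_t)VIRTIO_MEM_MEMSLOT_SIZE,
                                 size - offset);
        g_autofree char *name = g_strdup_printf("virtio-mem-memslot-%d", i);

        /* Each alias covers [offset, offset + len) of the big RAM region. */
        memory_region_init_alias(&aliases[i], owner, name, ram_mr, offset, len);
    }
}
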
>> Within the memslots, virtio-mem can (un)plug memory in smaller granularity
>> dynamically. So memslots are a pure optimization to tackle a) and b) above.
>>
>> Right now, memslots are mapped once they fall into the usable device region
>> (which currently grows/shrinks on demand, either when requesting to
>> hotplug more memory or during/after reboots). In the future, with
>> VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we'll be able to (un)map aliases even
>> more dynamically when (un)plugging device blocks.
>>
>>
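
(Continuing the illustrative sketch from above, again with assumed names
not taken from the series: mapping and unmapping the aliases based on the
current usable region size could then look roughly like this:)

/*
 * Map exactly those aliases that intersect the current usable device
 * region into the device container and unmap all others. Each mapped
 * alias shows up as a separate memory section and therefore as a
 * separate memslot.
 */
static void update_memslot_mappings(MemoryRegion *container,
                                    MemoryRegion *aliases, int nb_aliases,
                                    uint64_t usable_region_size)
{
    int i;

    for (i = 0; i < nb_aliases; i++) {
        const uint64_t offset = (uint64_t)i * VIRTIO_MEM_MEMSLOT_SIZE;
        const bool should_map = offset < usable_region_size;

        if (should_map && !memory_region_is_mapped(&aliases[i])) {
            memory_region_add_subregion(container, offset, &aliases[i]);
        } else if (!should_map && memory_region_is_mapped(&aliases[i])) {
            memory_region_del_subregion(container, &aliases[i]);
        }
    }
}
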
>> Adding a 500 GiB virtio-mem device and not hotplugging any memory results in:
>>     0000000140000000-000001047fffffff (prio 0, i/o): device-memory
>>       0000000140000000-0000007e3fffffff (prio 0, i/o): virtio-mem-memslots
>>
>> Requesting the VM to consume 2 GiB results in (note: the usable region size
>> is bigger than 2 GiB, so 3 * 1 GiB memslots are required):
>>     0000000140000000-000001047fffffff (prio 0, i/o): device-memory
>>       0000000140000000-0000007e3fffffff (prio 0, i/o): virtio-mem-memslots
>>         0000000140000000-000000017fffffff (prio 0, ram): alias virtio-mem-memslot-0 @mem0 0000000000000000-000000003fffffff
>>         0000000180000000-00000001bfffffff (prio 0, ram): alias virtio-mem-memslot-1 @mem0 0000000040000000-000000007fffffff
>>         00000001c0000000-00000001ffffffff (prio 0, ram): alias virtio-mem-memslot-2 @mem0 0000000080000000-00000000bfffffff
> 
> I've got a vague memory that there were some devices that didn't like
> doing split IO across a memory region (or something) - some virtio
> devices?  Do you know if that's still true and if that causes a problem?

Interesting point! I am not aware of any such issues, and I'd be
surprised if we'd still have such buggy devices, because the layout
virtio-mem now creates is just very similar to the layout we'll
automatically create with ordinary DIMMs.

If we hotplug DIMMs, they will end up consecutive in guest physical
address space, but each has its own memory region and requires its own
memory slot. So, very similar to what a virtio-mem device does now.

Maybe the catch is that it's hard to cross memory regions that are, e.g.,
128 MiB aligned, because ordinary allocations (e.g., via the buddy
allocator in Linux, which supports <= 4 MiB pages) won't cross these blocks.

-- 
Thanks,

David / dhildenb



