From: Maciej S. Szmigiero
Subject: Re: [PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)
Date: Tue, 20 Jun 2023 22:13:10 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0

On 19.06.2023 17:58, David Hildenbrand wrote:
[...]

Sorry for the late reply!

Still trying to make up my mind what the right way forward with this is.


This usage is still problematic I suspect (well, and a layer violation 
regarding the machine). The machine hotplug handler is supposed to call the 
pre_plug/plug/unplug hooks as response to pre_plug/plug/unplug notifications 
from the core. See how we handle virtio-mem/virtio-pmem/nvdimms as an example.

We assume that when memory_device_pre_plug() gets called, that the device is 
not realized yet, but once it gets plugged, that it already is realized, and 
that the device will actually vanish (get unrealized) when unplugging the 
device.
Otherwise memory device logic like in get_plugged_memory_size() stops working.

get_plugged_memory_size() just calls the get_plugged_size() method on every
realized TYPE_MEMORY_DEVICE.

While this now always returns the whole backing memory size (once the
backend gets plugged), I don't see a reason why this method could not be
overridden in hv-balloon to return just the currently hot-added size.

By the way, this function seems to be used just for reporting stats via QMP.
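
Roughly, such an override could look like the sketch below - purely illustrative,
assuming a hypothetical used_size field in the hv-balloon state that tracks the
hot-added amount (the MemoryDeviceClass hook itself is the existing QEMU one):

/*
 * Illustrative only: report just the currently hot-added size instead of
 * the whole backing memdev size.  "used_size" is an assumed field of the
 * hv-balloon state; the MemoryDeviceClass hook is the existing one.
 */
static uint64_t hv_balloon_md_get_plugged_size(const MemoryDeviceState *md,
                                               Error **errp)
{
    const HvBalloon *balloon = HV_BALLOON(md);

    return balloon->used_size;  /* bytes currently exposed to the guest */
}

static void hv_balloon_class_init(ObjectClass *klass, void *data)
{
    MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(klass);

    mdc->get_plugged_size = hv_balloon_md_get_plugged_size;
    /* ... get_addr, get_memory_region, fill_device_info, ... */
}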

memory_device_build_list() is another example, used for 
memory_device_get_free_addr().

I don't see it calling the get_plugged_size() method; I can see it only using
(indirectly) the get_addr() method.

You'd be blocking memory address ranges with an unplugged-but-realized memory device.

Memory device code expects that realized memory devices are plugged and vice versa.

Which QEMU code do you mean specifically? Maybe it just needs a trivial
change.

Before the driver hot-adds the first chunk of memory it does not use any
part of the address space.

After that, it has to reserve address space for the whole backing memory
device, both so that no other devices will claim parts of it and because a
TYPE_MEMORY_DEVICE (currently) can have just a single range.

This address space is released when the VM is restarted.
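
As a rough sketch of that flow (the memory_device_*() helpers are the existing
QEMU ones; the wrapper name, the HvBalloon/guest_align parameters and passing
the guest alignment via the pre_plug alignment argument are just illustrative):

/*
 * Illustrative sketch: reserve the whole backing region in guest physical
 * address space only once the guest's alignment requirement is known,
 * i.e. when the first chunk is about to be hot-added.
 */
static void hv_balloon_reserve_range(HvBalloon *balloon, MachineState *ms,
                                     uint64_t guest_align, Error **errp)
{
    Error *local_err = NULL;

    /* alignment requested by the guest via the DM protocol */
    memory_device_pre_plug(MEMORY_DEVICE(balloon), ms, &guest_align,
                           &local_err);
    if (local_err) {
        error_propagate(errp, local_err);
        return;
    }
    memory_device_plug(MEMORY_DEVICE(balloon), ms);
}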



As an example, see device_set_realized() on the pre_plug+realize+plug 
interaction.

IIRC, you're reusing the already-realized hv-balloon device here, correct?

Yes - in this version of the driver.

The previous version used separate virtual DIMM devices instead, but you
recommended against that approach.


Yes. My recommendation was to make the hv-balloon device a memory device and 
use a single memory region, which you did (and I think it's much better).

It's now all about when we (un)plug the memory device itself -- and how.


Why can't you call the pre_plug/plug/unplug functions from the machine 
pre_plug/plug/unplug hooks -- exactly once for the memory device when plugging 
the hv-balloon device?

Is it to support the !memdev case, or why is this plugging/unplugging in
our_range_plugged_new()/our_range_plugged_free() required?

At least for three (four) reasons:
1a) At hv-balloon plug time the device doesn't yet know the guest
alignment requirements - or whether the guest supports memory hot add at
all - that's something the device will learn only once the guest connects
to the protocol.

Understood, so you want to at least expose the memory dynamically to the VM 
(map the MR on demand).

That could be done fairly easily using a memory region container, like
virtio-mem is planning on using [1].

[1] https://lkml.kernel.org/r/20230616092654.175518-14-david@redhat.com

Thanks for the pointer to your series - I've looked at it and it seems
to me that while it allows multiple memory subregions, each backed by
a separate memslot, it still needs a single big main region for
the particular TYPE_MEMORY_DEVICE - am I right?

1b) For the same reason the memory region has to be unplugged at VM
reset time - the new guest might have stricter alignment requirements.

Alignment is certainly interesting, but is it a real problem?

By default (no other memory devices) you get an address that's aligned to 1
GiB. And, in fact, you can simply always request a 1 GiB alignment for the
device, independent of the guest requirement.

Would the guest requirement be even stricter than that (e.g., 2 GiB)?

The protocol allows up to 32 GiB alignment, so we cannot simply
hardcode the alignment to 1 GiB, especially since this is Windows
we're talking about (so this parameter is subject to unpredictable
changes).

In theory, when using a memory region container (again [1]) into which you 
dynamically map the RAM region, you can do this alignment internally.

So it might be an option to use a memory region container and dynamically map 
into that one as you please (it just has to have a fixed size).

Still, demand-allocating just the right memory region (with the right
alignment) seems to me like a cleaner solution than allocating a huge
worst-case memory region upfront and then trying to carve out the right
part of it.


By the way, the memory region *can't* be unplugged yet at VMBus device
reset time - Windows keeps on using it until the system is restarted,
even after disconnecting from the VMBus.

Yes, similar to virtio-mem -- we can only e.g. do it at system reset time.


2) The !memdev case, when the driver is just used for Windows-native
ballooning and stats reporting.

So we'd want support for a memory device that doesn't expose any memory -- in the 
current configuration. Should be doable (NULL returned as device memory region 
-> skip pre_plug/plug/unplug and teach the other code to just ignore this 
device). It would be easier if we could decide at runtime that this device is not 
a memory device ...

But let's first figure out if that's the right approach.
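
One possible shape of the "NULL device memory region" part, just as a sketch
(hostmem is an assumed field name; the get_memory_region hook is the existing
one, and the generic code would still have to learn to skip such devices):

/*
 * Sketch of the !memdev case: no backing memdev means no device memory
 * region, so generic memory device code would have to ignore this device.
 */
static MemoryRegion *hv_balloon_md_get_memory_region(MemoryDeviceState *md,
                                                     Error **errp)
{
    HvBalloon *balloon = HV_BALLOON(md);

    if (!balloon->hostmem) {
        return NULL;  /* ballooning/stats only, no memory to hot-add */
    }
    return host_memory_backend_get_memory(balloon->hostmem);
}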



3) This will hopefully allow sharing the backing memory device between
virtio-mem and hv-balloon in the future - Linux guests will connect to
the former interface while Windows guests will connect to the latter.


I've been told that the virtio-mem driver for Windows will show up polished in 
the near future ... we'll see :)

The virtio-mem driver for Windows is still at version 1 at [1], so I guess
it's going to take a while for it to become production-ready - due, in part,
to the difficulty of acquiring stability guarantees for the required Windows
kernel MM APIs.

Anyhow, I consider that a secondary requirement. (virtio-mem is not compatible 
with shared memdevs)



Supporting the !memdev case is interesting: you essentially want to plug a 
memory device without a device region (or with an empty stub). I guess we 
should get that figured out somehow.


That's why the previous version of this driver used a parent VMBus
device (non-TYPE_MEMORY_DEVICE) which then had virtual DIMMs as its
children carrying memory for hot add.

So we have the following two options, I think:

(1) Make the hv-balloon device a memory device,

It's already a TYPE_MEMORY_DEVICE, so no change required here.

plug/unplug it from the machine code when plug/unplug'ing the hv-balloon device.

I don't quite follow that - the device is not designed to be hot-pluggable
(I'm not sure that Windows would even allow hot-plugging a DM protocol
device).

It is rather designed to be instantiated at the QEMU command line for the
lifetime of the VM.

We'd use a memory region container as the device memory region (like [1]) and
would have to handle the !memdev case (I can help with that). Into that
container, you can map the RAM memory region on demand (and eventually even
use multiple memslots like [1]).

(2) Use a single virtual DIMM and (un)plug that on demand. Let the machine code 
handle (un)plugging of the device.


(1) feels cleanest to me, although it will require a bit more work.


I also think approach (1) makes more sense as it avoids memslot metadata
overhead for not-yet-hot-added parts of the memory backing device.

I'm not sure what you mean by the !memdev case being problematic here - it
works in the current driver shape, so why would adding potential memory
subregions (used in the memdev case) change that?

Thanks,
Maciej

[1]: https://github.com/virtio-win/kvm-guest-drivers-windows/pull/794



