From: David Hildenbrand
Subject: Re: [PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)
Date: Thu, 22 Jun 2023 14:52:13 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0
On 22.06.23 14:14, Maciej S. Szmigiero wrote:
> On 22.06.2023 14:06, David Hildenbrand wrote:
>> On 22.06.23 13:17, Maciej S. Szmigiero wrote:
>>> On 22.06.2023 13:15, David Hildenbrand wrote:
>>>> On 22.06.23 13:12, Maciej S. Szmigiero wrote:
>>>>> On 22.06.2023 13:01, David Hildenbrand wrote:
>>>>> [...]
>>>>>> (1) We'd use a memory region container as device memory region
>>>>>> (like [1]) and would have to handle the !memdev case (I can help
>>>>>> with that). Into that, you can map the RAM memory region on demand
>>>>>> (and eventually even using multiple slots like [1]).
>>>>>>
>>>>>> (2) Use a single virtual DIMM and (un)plug that on demand. Let the
>>>>>> machine code handle (un)plugging of the device.
>>>>>>
>>>>>> (1) feels cleanest to me, although it will require a bit more work.
>>>>>
>>>>> I also think approach (1) makes more sense, as it avoids memslot
>>>>> metadata overhead for not-yet-hot-added parts of the memory backing
>>>>> device.
>>>>>
>>>>> Not sure what you mean by the !memdev case being problematic here -
>>>>> it works in the current driver shape, so why would adding potential
>>>>> memory subregions (used in the memdev case) change that?
>>>>
>>>> I'm thinking about the case where you have a hv-balloon device
>>>> without a memdev. Without "-m X,maxmem=y" we don't currently expect
>>>> to have memory devices around (and especially not them getting
>>>> (un)plugged). But why should we "force" setting the "maxmem" option?
>>>
>>> I guess it's only a small change to QEMU to allow having a hv-balloon
>>> device (without a memdev) even when there's no "maxmem" option given
>>> on the QEMU command line.
>>
>> I hope I'll find some time soonish to prototype what I have in mind,
>> to see if it could be made working.
>
> Okay, so I'll wait for your prototype before commencing further work on
> the next version of this driver.

About to have something simplistic running -- I think. I want to test
with a Linux VM, but I don't seem to get it working (also without my
changes).
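As an aside, the memslot tradeoff behind approach (1) can be sketched with a toy model (plain Python, not QEMU code; the class and method names are made up for illustration): a container spanning the device's maximum size, with backing-RAM subregions mapped in on demand, so memslot bookkeeping only exists for ranges that were actually hot-added.

```python
# Toy model of approach (1): a device-memory container spanning the
# maximum size, with backing-RAM subregions mapped in on demand.
# Not QEMU code -- all names here are invented for illustration.

class DeviceMemoryContainer:
    def __init__(self, max_size, slot_size):
        self.max_size = max_size
        self.slot_size = slot_size
        self.mapped = []          # (offset, size) per mapped subregion

    def hot_add(self, size):
        """Map just enough additional subregions to cover `size` more bytes."""
        offset = sum(s for _, s in self.mapped)
        if offset + size > self.max_size:
            raise ValueError("exceeds device maximum")
        while size > 0:
            chunk = min(size, self.slot_size)
            self.mapped.append((offset, chunk))  # one memslot per subregion
            offset += chunk
            size -= chunk

    def memslots_in_use(self):
        # Only hot-added ranges consume memslot metadata; a single flat
        # mapping of the whole device would pay for max_size up front.
        return len(self.mapped)

GiB = 1024 ** 3
dev = DeviceMemoryContainer(max_size=32 * GiB, slot_size=2 * GiB)
dev.hot_add(3 * GiB)              # guest has requested 3 GiB so far
print(dev.memslots_in_use())      # -> 2; the remaining 29 GiB cost nothing yet
```

A single virtual DIMM per approach (2) would instead map (and account for) the region in one piece at plug time.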
#!/bin/bash
build/qemu-system-x86_64 \
    --enable-kvm \
    -m 4G,maxmem=36G \
    -cpu host,hv-syndbg=on,hv-synic,hv-relaxed,hv-vpindex \
    -smp 16 \
    -nographic \
    -nodefaults \
    -net nic -net user \
    -chardev stdio,nosignal,id=serial \
    -hda Fedora-Cloud-Base-37-1.7.x86_64.qcow2 \
    -cdrom /home/dhildenb/git/cloud-init/cloud-init.iso \
    -device isa-serial,chardev=serial \
    -chardev socket,id=monitor,path=/var/tmp/mon_src,server,nowait \
    -mon chardev=monitor,mode=readline \
    -device vmbus-bridge \
    -object memory-backend-ram,size=2G,id=mem0 \
    -device hv-balloon,id=hv1,memdev=mem0

[root@vm-0 ~]# uname -r
6.3.5-100.fc37.x86_64
[root@vm-0 ~]# modprobe hv_balloon
modprobe: ERROR: could not insert 'hv_balloon': No such device

Any magic flag I am missing? Or is there something preventing this from
working with Linux VMs?

Maciej S. Szmigiero wrote:

I haven't tested the driver with Linux guests in a long time (as it is
targeting Windows), but I think you need to disable the KVM PV interface
for the Hyper-V one to be detected by Linux. Something like adding
"kvm=off" to "-cpu", then checking in dmesg whether the detected
hypervisor is now Hyper-V. Also, you need to disable S4 in the guest for
the hot-add capability to work (I'm adding "-global
ICH9-LPC.disable_s4=1" with the q35 machine for this).
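For reference, those suggestions might translate into a command line like the following. This is a hedged sketch only: it assumes switching to the q35 machine type and keeps just the hv-balloon-relevant options from the earlier script, with the rest unchanged.

```shell
# Hypothetical variant of the earlier Linux-guest command line, applying
# the advice above: kvm=off hides the KVM PV interface so Linux detects
# Hyper-V, and disabling S4 (q35 machine assumed) enables hot-add.
build/qemu-system-x86_64 \
    --enable-kvm \
    -machine q35 \
    -m 4G,maxmem=36G \
    -cpu host,kvm=off,hv-syndbg=on,hv-synic,hv-relaxed,hv-vpindex \
    -global ICH9-LPC.disable_s4=1 \
    -smp 16 \
    -device vmbus-bridge \
    -object memory-backend-ram,size=2G,id=mem0 \
    -device hv-balloon,id=hv1,memdev=mem0
    # (disk, serial, and monitor options as in the original script)
```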
I would also suggest adding "--trace 'hv_balloon_*' --trace
'memory_device_*'" to the QEMU command line to see what's happening.

David Hildenbrand wrote:

The VM is not happy:

[    1.908595] BUG: kernel NULL pointer dereference, address: 0000000000000007
[    1.908837] #PF: supervisor read access in kernel mode
[    1.908837] #PF: error_code(0x0000) - not-present page
[    1.908837] PGD 0 P4D 0
[    1.908837] Oops: 0000 [#1] PREEMPT SMP NOPTI
[    1.908837] CPU: 13 PID: 492 Comm: (udev-worker) Not tainted 6.3.5-100.fc37.x86_64 #1
[    1.908837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-p4
[    1.908837] RIP: 0010:acpi_ns_lookup+0x8f/0x4c0
[    1.908837] Code: 8b 3d f5 eb 1c 03 83 05 52 ec 1c 03 01 48 85 ff 0f 84 51 03 00 00 44 89 c3 4c 89 cb
[    1.908837] RSP: 0018:ffff95b680ad7950 EFLAGS: 00010286
[    1.908837] RAX: ffff95b680ad79e0 RBX: 0000000000000002 RCX: 0000000000000003
[    1.908837] RDX: 0000000000000000 RSI: ffff8a0283a3c558 RDI: ffffffffa4b376e0
[    1.908837] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000
[    1.908837] R10: ffff8a02811034ec R11: 0000000000000000 R12: ffffffffffffffff
[    1.908837] R13: ffff8a02811034e8 R14: ffff8a02811034e8 R15: 0000000000000000
[    1.908837] FS:  00007f3bb2e7d0c0(0000) GS:ffff8a02bbd40000(0000) knlGS:0000000000000000
[    1.908837] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.908837] CR2: 0000000000000007 CR3: 0000000100a58002 CR4: 0000000000770ee0
[    1.908837] PKRU: 55555554
[    1.908837] Call Trace:
[    1.908837]  <TASK>
[    1.908837]  ? __die+0x23/0x70
[    1.908837]  ? page_fault_oops+0x171/0x4e0
[    1.908837]  ? prepare_alloc_pages.constprop.0+0xf6/0x1a0
[    1.908837]  ? exc_page_fault+0x74/0x170
[    1.908837]  ? asm_exc_page_fault+0x26/0x30
[    1.908837]  ? acpi_ns_lookup+0x8f/0x4c0
[    1.908837]  acpi_ns_get_node_unlocked+0xdd/0x110
[    1.908837]  ? down_timeout+0x3e/0x60
[    1.908837]  ? acpi_ns_get_node+0x3e/0x60
[    1.908837]  acpi_ns_get_node+0x3e/0x60
[    1.908837]  acpi_ns_evaluate+0x1cb/0x2d0
[    1.908837]  acpi_ut_evaluate_object+0x68/0x1c0
[    1.908837]  acpi_rs_get_method_data+0x37/0x80
[    1.908837]  ? __pfx_vmbus_walk_resources+0x10/0x10 [hv_vmbus]
[    1.908837]  acpi_walk_resources+0x91/0xe0
[    1.908837]  vmbus_acpi_add+0x87/0x170 [hv_vmbus]
[    1.908837]  acpi_device_probe+0x47/0x160
[    1.908837]  really_probe+0x19f/0x400
[    1.908837]  ? __pfx___driver_attach+0x10/0x10
[    1.908837]  __driver_probe_device+0x78/0x160
[    1.908837]  driver_probe_device+0x1f/0x90
[    1.908837]  __driver_attach+0xd2/0x1c0
[    1.908837]  bus_for_each_dev+0x85/0xd0
[    1.908837]  bus_add_driver+0x116/0x220
[    1.908837]  driver_register+0x59/0x100
[    1.908837]  ? __pfx_init_module+0x10/0x10 [hv_vmbus]
[    1.908837]  hv_acpi_init+0x39/0xff0 [hv_vmbus]
[    1.908837]  ? __pfx_init_module+0x10/0x10 [hv_vmbus]
[    1.908837]  do_one_initcall+0x5a/0x240
[    1.908837]  do_init_module+0x4a/0x210
[    1.908837]  __do_sys_init_module+0x17f/0x1b0
[    1.908837]  do_syscall_64+0x5c/0x90
[    1.908837]  ? handle_mm_fault+0x11e/0x310
[    1.908837]  ? do_user_addr_fault+0x1e0/0x720
[    1.908837]  ? exc_page_fault+0x74/0x170
[    1.908837]  entry_SYSCALL_64_after_hwframe+0x72/0xdc

I guess *few* people run Linux with QEMU's Hyper-V interface
implementation...

Maybe I'll have to install a Windows guest :/

Maciej S. Szmigiero wrote:

I think that makes more sense, since we're targeting Windows anyway.

David Hildenbrand wrote:

Having installed a fairly recent Win10 and running with master + your
patches, I still can't get it to work. Windows is stuck booting (before
the little circle starts turning). Removing the hv-balloon device makes
it work again (well, at least the circle spins again; my Windows
installation now seems to be broken and I have to reinstall... Windows).
Do you have a working cmdline for Windows I can try?

Maciej S. Szmigiero wrote:

Do you get any tracing output from the driver when it is loaded, or does
its mere presence make your Windows installation fail to boot?
I was testing using:

qemu-system-x86_64 -machine q35,accel=kvm \
    -cpu host,vmx,-mpx,-pmu,check,hv-crash,hv-relaxed,hv-time,hv-vapic,hv-spinlocks=0x1fff,hv-
^ That magic collection of flags seems to make Win10 happy.

Master + your patches (qemu):

(qemu) balloon 3000
(qemu) info balloon
balloon: actual=3000
(qemu) balloon 8000
(qemu) info balloon
balloon: actual=6144

As the VM has 4G and the memdev 2G, that's expected (although I'd have
expected an error when trying to set a target above what's possible).

On my prototype:

(qemu) balloon 3000
(qemu) info balloon
balloon: actual=3000
(qemu) balloon 8000
(qemu) info balloon
balloon: actual=6144

Further (with some adjustments from me):

(qemu) info memory-devices
Memory device [hv-balloon]: "hv1"
  memaddr: 0x800000000
  max-size: 2147483648
  memdev: /objects/mem0

(qemu) info mtree
...
address-space: device-memory
  0000000180000000-000000097fffffff (prio 0, i/o): device-memory
    0000000800000000-000000087fffffff (prio 0, i/o): hv-balloon
      0000000800000000-000000087fffffff (prio 0, ram): mem0
...

(qemu) info memory_size_summary
base memory: 4294967296
plugged memory: 2147483648

Without a memdev:

(qemu) info memory-devices
Memory device [hv-balloon]: "hv1"
  max-size: 0

I noticed that sometimes balloon requests get essentially ignored; even
retrying to set a target doesn't change anything. Only after some time
does it eventually work:

(qemu) balloon 3000
(qemu) info balloon
balloon: actual=4096
...
(qemu) info balloon
balloon: actual=4096
(qemu) balloon 4000
(qemu) info balloon
balloon: actual=4096
...
(qemu) balloon 3000
(qemu) info balloon
balloon: actual=4096
(qemu) balloon 5000
(qemu) balloon 2000
(qemu) info balloon
balloon: actual=3152
(qemu) info balloon
balloon: actual=2000

Maybe the VM needs some time after start-up to actually process requests.
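The cap observed above (a target of 8000 settling at actual=6144) follows from base RAM plus the memdev size; a quick back-of-the-envelope model in plain Python (this mirrors the observed numbers, it is not the driver's actual logic):

```python
# Sketch of why "balloon 8000" settles at actual=6144 in the logs above:
# the reachable maximum is base RAM plus the hot-addable memdev size.
# Illustrative only -- not the hv-balloon driver's real code path.

def effective_actual(target_mib, base_mib, memdev_mib):
    """Clamp a balloon target to what the configuration can provide."""
    return min(target_mib, base_mib + memdev_mib)

# VM with -m 4G (4096 MiB) and a 2G memory backend (2048 MiB):
print(effective_actual(3000, 4096, 2048))  # -> 3000 (within range)
print(effective_actual(8000, 4096, 2048))  # -> 6144 (capped at 4096 + 2048)
```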
Anyhow, the prototype lives at:
    https://github.com/davidhildenbrand/qemu/tree/hv-balloon

Right now it's on top of:
    https://github.com/davidhildenbrand/qemu/tree/virtio-mem-memslots

Patches/commits:
1) memory-device: Support empty memory devices
2) memory-device: Drop size alignment check
   -> Two added patches
3) error: define g_autoptr() cleanup function for the Error type
4) Add Hyper-V Dynamic Memory Protocol definitions
5) Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)
   -> Your unmodified patches
6) ea0e4775ca tmp
   -> Me going wild

The alignment thingy is indeed ugly, but unless we want to fail hot-add
for devices with crazy alignment requirements, it's probably best the way
it is on that branch.

You should definitely split up patch #3 into self-contained logical
chunks (e.g., split off the whole hot-add part, common code changes,
...).

Also, I'm not sure the whole dynamic OurRangePlugged allocation thingy is
really required; it looks like that can be optimized. I have some other
cleanups in mind, but that is best kept for later, once the patches are
in an easier-to-review-sized shape.

-- 
Cheers,

David / dhildenb