|
From: | Hailiang Zhang |
Subject: | Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT) |
Date: | Wed, 2 Mar 2016 21:01:58 +0800 |
User-agent: | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 |
On 2016/3/1 20:25, Dr. David Alan Gilbert wrote:
* Hailiang Zhang (address@hidden) wrote:On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:* Hailiang Zhang (address@hidden) wrote:On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:* Dr. David Alan Gilbert (address@hidden) wrote:* zhanghailiang (address@hidden) wrote:From: root <address@hidden> This is the 15th version of COLO (Still only support periodic checkpoint). Here is only COLO frame part, you can get the whole codes from github: https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode There are little changes for this series except the network releated part.I was looking at the time the guest is paused during COLO and was surprised to find one of the larger chunks was the time to reset the guest before loading each checkpoint; I've traced it part way, the biggest contributors for my test VM seem to be: 3.8ms pcibus_reset: VGA 1.8ms pcibus_reset: virtio-net-pci 1.5ms pcibus_reset: virtio-blk-pci 1.5ms qemu_devices_reset: piix4_reset 1.1ms pcibus_reset: piix3-ide 1.1ms pcibus_reset: virtio-rng-pci I've not looked deeper yet, but some of these are very silly; I'm running with -nographic so why it's taking 3.8ms to reset VGA is going to be interesting. Also, my only block device is the virtio-blk, so while I understand the standard PC machine has the IDE controller, why it takes it over a ms to reset an unused device.OK, so I've dug a bit deeper, and it appears that it's the changes in PCI bars that actually take the time; every time we do a reset we reset all the BARs, this causes it to do a pci_update_mappings and end up doing a memory_region_del_subregion. Then we load the config space of the PCI device as we do the vmstate_load, and this recreates all the mappings again. I'm not sure what the fix is, but that sounds like it would speed up the checkpoints usefully if we can avoid the map/remap when they're the same.Interesting, and thanks for your report. We already known qemu_system_reset() is a time-consuming function, we shouldn't call it here, but if we didn't do that, there will be a bug, which we have reported before in the previous COLO series, the bellow is the copy of the related patch comment:Paolo suggested one fix, see the patch below; I'm not sure if it's safe (in particular if the guest changed a bar and the device code tried to access the memory while loading the state???) - but it does seem to work and shaves ~10ms off the reset/load times:
Nice work, i also tested it, and it is a good improvement, I'm wondering if it is safe here, it should be safe to apply to qemu_system_reset() independently (I tested it too, it will shaves about 5ms off). Hailiang
Dave commit 7570b2984143860005ad9fe79f5394c75f294328 Author: Dr. David Alan Gilbert <address@hidden> Date: Tue Mar 1 12:08:14 2016 +0000 COLO: Lock memory map around reset/load Changing the memory map appears to be expensive; we see this partiuclarly when on loading a checkpoint we: a) reset the devices This causes PCI bars to be reset b) Loading the device states This causes the PCI bars to be reloaded. Turning this all into a single memory_region_transaction saves ~10ms/checkpoint. TBD: What happens if the device code accesses the RAM during loading the checkpoint? Signed-off-by: Dr. David Alan Gilbert <address@hidden> Suggested-by: Paolo Bonzini <address@hidden> diff --git a/migration/colo.c b/migration/colo.c index 45c3432..c44fb2a 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -22,6 +22,7 @@ #include "net/colo-proxy.h" #include "net/net.h" #include "block/block_int.h" +#include "exec/memory.h" static bool vmstate_loading; @@ -934,6 +935,7 @@ void *colo_process_incoming_thread(void *opaque) stage_time_start = qemu_clock_get_us(QEMU_CLOCK_HOST); qemu_mutex_lock_iothread(); + memory_region_transaction_begin(); qemu_system_reset(VMRESET_SILENT); stage_time_end = qemu_clock_get_us(QEMU_CLOCK_HOST); timed_average_account(&mis->colo_state.time_reset, @@ -947,6 +949,7 @@ void *colo_process_incoming_thread(void *opaque) stage_time_end - stage_time_start); stage_time_start = stage_time_end; ret = qemu_load_device_state(fb); + memory_region_transaction_commit(); if (ret < 0) { error_report("COLO: load device state failed\n"); vmstate_loading = false; -- Dr. David Alan Gilbert / address@hidden / Manchester, UK .
[Prev in Thread] | Current Thread | [Next in Thread] |