Re: [PATCH v5 0/4] migration: UFFD write-tracking migration/snapshots
From: Peter Xu
Date: Tue, 8 Dec 2020 13:24:53 -0500
On Fri, Dec 04, 2020 at 12:30:59PM +0300, Andrey Gruzdev wrote:
> This patch series is a kind of 'rethinking' of the ideas Denis Plotnikov
> implemented in his series '[PATCH v0 0/4] migration: add background snapshot'.
>
> Currently the only way to make an (external) live VM snapshot is to use the
> existing dirty page logging migration mechanism. The main problem is that it
> tends to produce a lot of page duplicates while the running VM keeps updating
> already saved pages. As a result, the vmstate image is commonly several times
> bigger than the non-zero part of the virtual machine's RSS. The time required
> to converge RAM migration and the size of the snapshot image depend heavily on
> the guest memory write rate, sometimes resulting in unacceptably long snapshot
> creation time and huge image size.
>
> This series proposes a way to solve the aforementioned problems. This is done
> by using a different RAM migration mechanism based on the UFFD write-protection
> management introduced in the v5.7 kernel. The migration strategy is to 'freeze'
> guest RAM content using write-protection and iteratively release protection
> for memory ranges that have already been saved to the migration stream.
> At the same time we read in pending UFFD write fault events and save those
> pages out-of-order with higher priority.
>
> How to use:
> 1. Enable write-tracking migration capability
> virsh qemu-monitor-command <domain> --hmp migrate_set_capability track-writes-ram on
>
> 2. Start the external migration to a file
> virsh qemu-monitor-command <domain> --hmp migrate exec:'cat > ./vm_state'
>
> 3. Wait for the migration to finish and check that it has reached the
> 'completed' state.
>
> Changes v4->v5:
>
> 1. Refactored util/userfaultfd.c code to support features required by
>    postcopy.
> 2. Introduced checks for host kernel and guest memory backend compatibility
>    to the 'background-snapshot' branch in migrate_caps_check().
> 3. Switched to using trace_xxx instead of info_report()/error_report() for
>    cases where the error message must be hidden (probing UFFD-IO) or where
>    the info would litter the output if it went to stderr.
> 4. Added RCU_READ_LOCK_GUARDs to the code dealing with the RAM block list.
> 5. Added memory_region_ref() for each RAM block being write-protected.
> 6. Reused qemu_ram_block_from_host() instead of a custom RAM block lookup
>    routine.
> 7. Stopped using the specific hwaddr/ram_addr_t types in favour of
>    void */uint64_t.
> 8. Dropped the 'linear-scan-rate-limiting' patch for now. The reason is that
>    the chosen criterion for high-latency fault detection (i.e. the timestamp
>    of the UFFD event fetch) is not representative enough for this task.
>    At the moment it looks somewhat like a premature optimization effort.
> 9. Dropped some unnecessary/unused code.
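For readers skimming the thread, the strategy from the cover letter (freeze RAM
with write protection, release protection as ranges are saved, and save faulting
pages out of order with higher priority) can be illustrated with a toy model.
This is only a sketch under stated assumptions, not QEMU's implementation: it
uses no real userfaultfd calls, the names snapshot(), guest_writes and
save_and_release() are hypothetical, and a blocked guest write is modelled as a
queued event rather than a stalled vCPU thread.

```python
from collections import deque


def snapshot(ram, guest_writes):
    """Toy model of a write-protect-based snapshot (not QEMU code).

    ram          -- list of page contents at the moment the snapshot starts
    guest_writes -- dict: step number -> list of (page, new_value) writes the
                    guest attempts just before that step; writes scheduled
                    after the snapshot has finished are ignored
    Returns (stream, ram): the saved (page, content) pairs in save order and
    the guest RAM as it looks once the snapshot has finished.
    """
    ram = list(ram)
    protected = [True] * len(ram)   # 'freeze': every page starts write-protected
    stream = []                     # stands in for the migration stream
    faults = deque()                # pending write-fault events

    def save_and_release(page):
        stream.append((page, ram[page]))  # content is still the frozen one
        protected[page] = False           # guest may now write freely

    def guest_write(page, value):
        if protected[page]:
            faults.append((page, value))  # write is held until the page is saved
        else:
            ram[page] = value

    cursor = 0
    step = 0
    while cursor < len(ram) or faults:
        for page, value in guest_writes.get(step, []):
            guest_write(page, value)
        step += 1
        if faults:
            # Faulting pages are saved out of order, ahead of the linear scan.
            page, value = faults.popleft()
            if protected[page]:
                save_and_release(page)
            ram[page] = value             # the held write now completes
            continue
        if protected[cursor]:
            save_and_release(cursor)      # linear scan saves the next page
        cursor += 1
    return stream, ram


stream, ram_after = snapshot(
    ["a", "b", "c", "d"],
    {0: [(2, "C")], 1: [(0, "A")], 2: [(2, "CC")]},
)
```

In this model every page lands in the stream exactly once and with its
pre-snapshot content, no matter how often the guest rewrites it afterwards,
which is the property that avoids the page duplicates of dirty-page logging.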
I went over the series and it looks nice!
There're a few todos for this series, so I added them into the wiki page (I
created a "feature" section for migration todo and put live snapshot there):
https://wiki.qemu.org/ToDo/LiveMigration#Features
Anyone, feel free to add to it.
Thanks,
--
Peter Xu