Re: [PATCH] migration: add FEATURE_SEEKABLE to QIOChannelBlock
From: Peter Xu
Subject: Re: [PATCH] migration: add FEATURE_SEEKABLE to QIOChannelBlock
Date: Fri, 9 May 2025 12:21:26 -0400
On Fri, May 09, 2025 at 02:51:47PM +0200, Marco Cavenati wrote:
> Hello Peter,
>
> On Thursday, May 08, 2025 22:23 CEST, Peter Xu <peterx@redhat.com> wrote:
>
> > > The scenarios where zeroing is not required (incoming migration and
> > > -loadvm) share a common characteristic: the VM has not yet run in the
> > > current QEMU process.
> > > To avoid splitting read_ramblock_mapped_ram(), could we implement
> > > a check to determine if the VM has ever run and decide whether to zero
> > > the memory based on that? Maybe using RunState?
> > >
> > > Then we can add something like this to read_ramblock_mapped_ram()
> > > ...
> > >     clear_bit_idx = 0;
> > >     for (...) {
> > >         // Zero pages
> > >         if (guest_has_ever_run()) {
> > >             unread = TARGET_PAGE_SIZE * (set_bit_idx - clear_bit_idx);
> > >             offset = clear_bit_idx << TARGET_PAGE_BITS;
> > >             host = host_from_ram_block_offset(block, offset);
> > >             if (!host) {...}
> > >             ram_handle_zero(host, unread);
> > >         }
> > >         // Non-zero pages
> > >         clear_bit_idx = find_next_zero_bit(bitmap, num_pages, set_bit_idx + 1);
> > > ...
> > > (Plus trailing zero pages handling)
> >
> > [...]
> >
> > > > >> > In a nutshell, I'm using dirty page tracking to load from the
> > > > >> > snapshot
> > > > >> > only the pages that have been dirtied between two loadvm;
> > > > >> > mapped-ram is required to seek and read only the dirtied pages.
> >
> > I may not have the full picture here, please bear with me if so.
> >
> > It looks to me like the major feature you're proposing here is a group of
> > snapshots in sequence, where only the 1st snapshot contains full RAM data
> > and the rest contain only what was dirtied?
> >
> > From what I can tell, the interface will then be completely different from
> > snapshots - firstly, the VM will be running when taking (at least the 2nd+)
> > snapshots; meanwhile there will be an extra phase after the normal snapshot
> > save, a "keep saving snapshots" window during which the user is free to
> > take a snapshot at any time based on the 1st snapshot. I'm curious what the
> > interface for the feature would be. It seems we'd need a separate command
> > saying "finished storing the current group of snapshots", which should stop
> > the dirty tracking.
>
> My goal is to speed up recurrent snapshot restores of short-lived VMs.
> In my use case I create one snapshot and then restore it thousands of
> times, leaving the VM running just long enough to execute a few
> functions, for example.
> Still, you are right in saying that this is a two-step process.
> What I added (not in this patch, but in a downstream fork atm) are a
> couple of HMP commands:
> - loadvm_for_hotreload: in a nutshell, a loadvm that also starts dirty
> tracking
> - hotreload: again a loadvm, but one that takes advantage of the dirty
> log to selectively restore only the dirty pages
>
> > I'm also curious what the use case is, and I also wonder whether "avoiding
> > memset(0) on a zero page" is important at all here - maybe you could start
> > with something simple (that is, always memset() a zero page when the page
> > has been dirtied)?
>
> My use case is, you guessed it, fuzz testing, aka fuzzing.
> About the zeroing, you are right: optimizing it is not a huge concern for
> my use case, and doing what you suggest is perfectly fine.
>
> Just to be clear, what I describe above is not the content of this patch.
> This patch aims only to take a first step towards adding support for the
> mapped-ram feature to savevm/loadvm snapshots, which is a
> prerequisite for my hotreload feature.
> mapped-ram is currently supported only in (file) migration.
> What's missing from this patch to have it working completely is the
> handling of zero pages. Unlike in migration, with snapshots the pages
> are not all zero prior to the restore and must therefore be handled.
>
> I hope I summarized this in an understandable way; if not, I'll be happy
> to clarify further :)
Yes, thanks.
So you don't really need to take a sequence of snapshots? Hmm, that sounds
like a completely different use case from what I originally thought.
Have you thought of leveraging ignore-shared and MAP_PRIVATE for the major
chunk of guest mem?
Let me explain; it's a very rough idea, but maybe you can take something
useful from it.
So.. if you keep reloading one VM state thousands of times, it would be
better to first have some shmem file (let's imagine one is enough.. you
could have more backends) keeping the major chunk of the RAM image of the
VM that you migrated into a file beforehand.
Say, the major part of guest mem is stored here:
PATH_RAM=/dev/shm/XXX
Then you migrate (with ignore-shared=on) to a file here (NOTE: I _think_
you really can use file migration in this case, with the VM stopped first,
rather than snapshot save/load):
PATH_VM_IMAGE=/tmp/VM_IMAGE_YYY
Then, the two files above should contain all info you need to start a new
VM.
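
To make the save phase concrete, here is a minimal sketch of how the two
files could be produced (the "pc.ram" id, the 4G size and the -machine
memory-backend wiring are made-up examples to be adapted, and IIRC the
capability is spelled x-ignore-shared in the monitor):

  # Run the VM with its RAM backed by the shmem file, shared so that the
  # guest's memory content lands in the page cache of $PATH_RAM:
  $qemu ... \
      -object memory-backend-file,id=pc.ram,size=4G,mem-path=$PATH_RAM,share=on \
      -machine memory-backend=pc.ram

  # Once the guest reaches the state you want to reuse, in the monitor:
  (qemu) stop
  (qemu) migrate_set_capability x-ignore-shared on
  (qemu) migrate file:/tmp/VM_IMAGE_YYY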
When you want to recover that VM state, boot a VM using this cmdline:
$qemu ... \
    -object memory-backend-file,mem-path=$PATH_RAM,share=off \
    -incoming file:$PATH_VM_IMAGE
That'll boot a VM directly on top of the shmem page cache (which is always
present on the host and occupies RAM even outside the VM lifecycle, but
that's part of the design..). Loading the VM image would be lightning fast
because it's tiny when there's almost no RAM inside it. No concern about
mapped-ram at all, as the remaining RAM is trivial enough to just be a
stream.
The important bit is share=off - that will mmap() the VM's main RAM as
MAP_PRIVATE, so it'll do CoW against the "snapshot" you made before:
whenever the guest writes to some pages while fuzzing some functions, the
corresponding shmem page cache is copied over. The shmem page cache itself
should never change its content.
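
For the reload loop itself, a rough sketch of the driver side, assuming one
fresh QEMU process per iteration (same made-up id/size as above; depending
on the runstate you saved, a "cont" in the monitor may be needed after the
incoming migration completes):

  # Each iteration restores the same state; the fuzzing harness is assumed
  # to quit QEMU at the end of each run. Because share=off maps $PATH_RAM
  # with MAP_PRIVATE, guest writes during the run are CoW copies and the
  # shmem file is never modified between iterations.
  for i in $(seq 1000); do
      $qemu ... \
          -object memory-backend-file,id=pc.ram,size=4G,mem-path=$PATH_RAM,share=off \
          -machine memory-backend=pc.ram \
          -incoming file:$PATH_VM_IMAGE
  done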
Does that sound workable to you?
Thanks,
--
Peter Xu