From: Peter Xu
Subject: Re: [PATCH v1 2/2] migration: Do not perform RAMBlock dirty sync during the first iteration
Date: Thu, 14 Nov 2024 17:40:15 -0500

On Thu, Nov 14, 2024 at 10:16:41PM +0100, David Hildenbrand wrote:
> On 14.11.24 20:28, Peter Xu wrote:
> > On Thu, Nov 14, 2024 at 10:02:37AM +0100, David Hildenbrand wrote:
> > > On 13.11.24 21:12, Peter Xu wrote:
> > > > On Wed, Nov 13, 2024 at 07:49:44PM +0100, David Hildenbrand wrote:
> > > > > I think I had precisely that, and I recall you suggested to have it
> > > > > only after the initial sync. Would work for me, but I'd still like to
> > > > > understand why essentially none of the "discard" was effective -- all
> > > > > of guest RAM got touched.
> > > > 
> > > > Yes it'll be interesting to know..
> > > > 
> > > > One thing I'm wildly guessing is dirty_memory_extend(), so maybe after
> > > > the ramblock is created nobody has cleared the "1"s there yet for each
> > > > of the clients, including DIRTY_MEMORY_MIGRATION.  Then it'll be synced
> > > > to the ramblock bmap only in the initial sync, once per qemu lifecycle.
> > > 
> > > 
> > > In ram_block_add() we do the following:
> > > 
> > > cpu_physical_memory_set_dirty_range(new_block->offset,
> > >                               new_block->used_length,
> > >                               DIRTY_CLIENTS_ALL);
> > > 
> > > ramblock_dirty_bitmap_clear_discarded_pages()->...->migration_clear_memory_region_dirty_bitmap_range()->migration_clear_memory_region_dirty_bitmap()
> > > won't end up clearing the bits in the dirty bitmap.
> > > 
> > > First I thought because of:
> > > 
> > > if (!rb->clear_bmap || !clear_bmap_test_and_clear(rb, page)) {
> > >      return;
> > > }
> > > 
> > > But then I realized that even memory_region_clear_dirty_bitmap() will not
> > > clear the ramblock_dirty_bitmap_ bits! It's only concerned about
> > > listener->log_clear() calls.
> > > 
> > > Looking for DIRTY_MEMORY_BLOCK_SIZE users, only
> > > cpu_physical_memory_sync_dirty_bitmap() and
> > > cpu_physical_memory_clear_dirty_range() clear them, and the latter is
> > > only used when resizing RAMBlocks.
> > > 
> > > At first I wondered whether ramblock_dirty_bitmap_clear_discarded_pages()
> > > should also call cpu_physical_memory_clear_dirty_range(), but then I am
> > > not so sure if that is really the right approach.
> > 
> > That actually sounds reasonable to me so far.. What's the concern in your
> > mind?
> 
> I think what I had in mind was that at the point of the initial bitmap sync,
> when we have already set the bmap to all-1s, we could just clear the whole
> ramblock_dirty_bitmap_ + KVM ... bitmaps.
> 
> So, instead of an "initial sync" we might just want to do an "initial
> clearing" of all bitmaps.

Logically most dirty tracking bitmaps should start with all zeros.  Old KVM
kernels are like that; KVM_DIRTY_LOG_INITIALLY_SET is not, but that's a
separate feature.  I still expect it to be pretty common for the rest, e.g.
vhost should have all zeros in its init bitmap even without an initial sync.

> 
> > 
> > > 
> > > 
> > > virtio-balloon calls qemu_guest_free_page_hint(), which calls
> > > 
> > > migration_clear_memory_region_dirty_bitmap_range()
> > > bitmap_clear()
> > > 
> > > So it would maybe have the same issue.
> > 
> > Should virtio-balloon do the same?
> 
> virtio-balloon is more interesting, because I assume here we could run after
> the "initial clearing" and would want to mark it clean everywhere.

Yes, if it does what I mentioned below, IIUC it'll clear all dirty bits
across the whole stack.  Only the ram_list bitmap is currently missing.  IIUC
that could mean it stops working for at least tcg, as tcg solely relies on
that bitmap.. even with kvm some MRs may use it.  Maybe we want to fix that
separately.
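
To spell out what I mean by "the whole stack": roughly something like the
helper below (untested sketch, invented name; it would have to live in
migration/ram.c where those functions are visible, and the signatures are
written from memory), clearing one page range in every layer at once:

/*
 * Untested sketch only -- the helper name is made up.  Clear one page range
 * in every layer of dirty tracking:
 *   - clear_bmap + listener->log_clear() (e.g. KVM), via the migration helper,
 *   - the migration bitmap itself (rb->bmap),
 *   - the ram_list (DIRTY_MEMORY_*) bitmaps, the piece that's currently missed.
 */
static void ramblock_dirty_bitmap_clear_range_all(RAMBlock *rb,
                                                  unsigned long start,
                                                  unsigned long npages)
{
    /* Clears clear_bmap and kicks listener->log_clear() (e.g. KVM). */
    migration_clear_memory_region_dirty_bitmap_range(rb, start, npages);
    /* Clears the migration bitmap itself. */
    bitmap_clear(rb->bmap, start, npages);
    /* Clears the ram_list bitmaps -- the missing piece. */
    cpu_physical_memory_clear_dirty_range(rb->offset + (start << TARGET_PAGE_BITS),
                                          npages << TARGET_PAGE_BITS);
}

qemu_guest_free_page_hint() could then call something like that instead of
doing only the first two steps.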

> 
> > 
> > So I suppose the idea here is some module may want to say "we should ignore
> > these pages in the dirty bitmap", and so far that's only about migration.
> > 
> > Then cpu_physical_memory_clear_dirty_range() does look like the right thing
> > to do, in which case the bmap in ram_list seems to have been overlooked so far.
> > 
> > But of course, cpu_physical_memory_clear_dirty_range() still doesn't cover
> > the migration bitmap itself, which is ramblock->bmap.  So we'll need to
> > switch from migration_clear_memory_region_dirty_bitmap() to use things like
> > cpu_physical_memory_clear_dirty_range(), just to cover ram_list bitmaps.
> > Then keeping the rb->bmap operations like before..
> 
> For virtio-balloon likely yes. Regarding virtio-mem, maybe "initial
> clearing" + only modifying the rb->bmap when processing discards could work
> and would even be more efficient.
> 
> (but I'm confused because we have way too many bitmaps, and maybe the KVM
> one could be problematic without an initial sync ... we'd want an initial
> clearing for that as well ...)

So IMHO most of the bitmaps should be initialized with zeros, not
ones.. like I mentioned above.

The migration bitmap is special, because it's not about dirty tracking
capability / reporting, but about knowing that we need to migrate everything
in the first round.  So setting all ones makes sense for migration only, not
for a reporting facility.  KVM_DIRTY_LOG_INITIALLY_SET exists for its own
reason of speeding up migration start..
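
Put differently (toy illustration only, not actual QEMU code paths):

/* Toy illustration only, not actual QEMU code. */
unsigned long npages = 4096;

/*
 * A dirty *tracking* bitmap answers "what changed since tracking was armed?",
 * so it should start out all zeros (bitmap_new() returns a zeroed bitmap).
 */
unsigned long *tracking_bmap = bitmap_new(npages);

/*
 * The migration bitmap answers "what do I still need to send?", and in the
 * first round that's everything -- hence all ones.
 */
unsigned long *migration_bmap = bitmap_new(npages);
bitmap_set(migration_bmap, 0, npages);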

So now I am wondering whether we should not set all ones in the ram_list
bitmap at all, which corresponds to this change:

===8<===
diff --git a/system/physmem.c b/system/physmem.c
index dc1db3a384..10966fa68c 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1913,10 +1913,6 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
     ram_list.version++;
     qemu_mutex_unlock_ramlist();
 
-    cpu_physical_memory_set_dirty_range(new_block->offset,
-                                        new_block->used_length,
-                                        DIRTY_CLIENTS_ALL);
-
     if (new_block->host) {
         qemu_ram_setup_dump(new_block->host, new_block->max_length);
         qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_HUGEPAGE);
===8<===

I'm wondering whether the above could fix the virtio-mem regression after
Hyman's current patch is applied.

That said, IMHO virtio-mem should still use the same helper as
virtio-balloon, as I discussed previously, so as to reset the bitmaps for the
whole stack (which always seems to be the right thing to do, so that no layer
is missed)?
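
For virtio-mem that would roughly mean its discard-replay callback calls the
same helper instead of only touching rb->bmap directly.  Again just a sketch
with made-up names (the real callback and offset math in ram.c differ), on
top of the ramblock_dirty_bitmap_clear_range_all() idea sketched above:

/*
 * Sketch only: a discard-replay callback that clears a discarded section in
 * all layers (listener/KVM, rb->bmap, ram_list) with a single helper call.
 */
static int dirty_bitmap_clear_discarded_section(MemoryRegionSection *section,
                                                void *opaque)
{
    RAMBlock *rb = section->mr->ram_block;
    unsigned long start = section->offset_within_region >> TARGET_PAGE_BITS;
    unsigned long npages = int128_get64(section->size) >> TARGET_PAGE_BITS;

    /* One call clears listener/KVM state, rb->bmap and the ram_list bitmap. */
    ramblock_dirty_bitmap_clear_range_all(rb, start, npages);
    return 0;
}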

Hence:

  (1) one patch to virtio-balloon covering the ram_list bitmap (which could be
      a real fix for virtio-balloon on e.g. tcg?);
  (2) one patch to virtio-mem reusing that virtio-balloon helper, just as a
      cleanup, so that all the bitmaps are covered;
  (3) one patch like the above to avoid setting all ones in the ram_list
      bitmap at all, also as a cleanup;
  (4) then finally remove the sync() in SETUP, which is this patch.

IIUC, after all these changes are applied, it'll work for all cases.

Thanks,

-- 
Peter Xu



