[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 1/5] migration: Fix possible deadloop of ram save process
From: |
Peter Xu |
Subject: |
Re: [PATCH 1/5] migration: Fix possible deadloop of ram save process |
Date: |
Thu, 22 Sep 2022 11:25:38 -0400 |
On Thu, Sep 22, 2022 at 03:49:38PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > When starting ram saving procedure (especially at the completion phase),
> > always set last_seen_block to non-NULL to make sure we can always correctly
> > detect the case where "we've migrated all the dirty pages".
> >
> > Then we'll guarantee both last_seen_block and pss.block will be valid
> > always before the loop starts.
> >
> > See the comment in the code for some details.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
>
> Yeh I guess it can currently only happen during restart?
There're only two places to clear last_seen_block:
ram_state_reset[2683] rs->last_seen_block = NULL;
ram_postcopy_send_discard_bitmap[2876] rs->last_seen_block = NULL;
Where for the reset case:
ram_state_init[2994] ram_state_reset(*rsp);
ram_state_resume_prepare[3110] ram_state_reset(rs);
ram_save_iterate[3271] ram_state_reset(rs);
So I think it can at least happen in two places, either (1) postcopy just
started (assume when postcopy starts accidentally when all dirty pages were
migrated?), or (2) postcopy recover from failure.
In my case I triggered this deadloop when I was debugging the other bug
fixed by the next patch where it was postcopy recovery (on tls), but only
once.. So currently I'm still not 100% sure whether this is the same
problem, but logically it could trigger.
I also remember I used to hit very rare deadloops before too, maybe they're
the same thing because I did test recovery a lot.
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Thanks!
--
Peter Xu
- [PATCH 0/5] migration: Bug fixes (prepare for preempt-full), Peter Xu, 2022/09/20
- [PATCH 4/5] migration: Disallow postcopy preempt to be used with compress, Peter Xu, 2022/09/20
- [PATCH 5/5] migration: Use non-atomic ops for clear log bitmap, Peter Xu, 2022/09/20
- [PATCH 3/5] migration: Disallow xbzrle with postcopy, Peter Xu, 2022/09/20
- [PATCH 2/5] migration: Fix race on qemu_file_shutdown(), Peter Xu, 2022/09/20
- [PATCH 1/5] migration: Fix possible deadloop of ram save process, Peter Xu, 2022/09/20