qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/5] migration: Fix possible deadloop of ram save process


From: Dr. David Alan Gilbert
Subject: Re: [PATCH 1/5] migration: Fix possible deadloop of ram save process
Date: Thu, 22 Sep 2022 17:41:30 +0100
User-agent: Mutt/2.2.7 (2022-08-07)

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Sep 22, 2022 at 03:49:38PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > When starting ram saving procedure (especially at the completion phase),
> > > always set last_seen_block to non-NULL to make sure we can always 
> > > correctly
> > > detect the case where "we've migrated all the dirty pages".
> > > 
> > > Then we'll guarantee both last_seen_block and pss.block will be valid
> > > always before the loop starts.
> > > 
> > > See the comment in the code for some details.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > 
> > Yeh I guess it can currently only happen during restart?
> 
> There're only two places to clear last_seen_block:
> 
> ram_state_reset[2683]          rs->last_seen_block = NULL;
> ram_postcopy_send_discard_bitmap[2876] rs->last_seen_block = NULL;
> 
> Where for the reset case:
> 
> ram_state_init[2994]           ram_state_reset(*rsp);
> ram_state_resume_prepare[3110] ram_state_reset(rs);
> ram_save_iterate[3271]         ram_state_reset(rs);
> 
> So I think it can at least happen in two places, either (1) postcopy just
> started (assume when postcopy starts accidentally when all dirty pages were
> migrated?), or (2) postcopy recover from failure.

Oh, (1) is a more general problem then; yeh.

> In my case I triggered this deadloop when I was debugging the other bug
> fixed by the next patch where it was postcopy recovery (on tls), but only
> once..  So currently I'm still not 100% sure whether this is the same
> problem, but logically it could trigger.
> 
> I also remember I used to hit very rare deadloops before too, maybe they're
> the same thing because I did test recovery a lot.

Note; 'deadlock' not 'deadloop'.

Dave

> > 
> > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Thanks!
> 
> -- 
> Peter Xu
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




reply via email to

[Prev in Thread] Current Thread [Next in Thread]