Re: [PATCH 1/5] migration: Fix possible deadloop of ram save process

qemu-devel

From:	Peter Xu
Subject:	Re: [PATCH 1/5] migration: Fix possible deadloop of ram save process
Date:	Thu, 22 Sep 2022 11:25:38 -0400

On Thu, Sep 22, 2022 at 03:49:38PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > When starting ram saving procedure (especially at the completion phase),
> > always set last_seen_block to non-NULL to make sure we can always correctly
> > detect the case where "we've migrated all the dirty pages".
> > 
> > Then we'll guarantee both last_seen_block and pss.block will be valid
> > always before the loop starts.
> > 
> > See the comment in the code for some details.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> Yeh I guess it can currently only happen during restart?

There are only two places that clear last_seen_block:

ram_state_reset[2683]          rs->last_seen_block = NULL;
ram_postcopy_send_discard_bitmap[2876] rs->last_seen_block = NULL;

For the reset case, the callers are:

ram_state_init[2994]           ram_state_reset(*rsp);
ram_state_resume_prepare[3110] ram_state_reset(rs);
ram_save_iterate[3271]         ram_state_reset(rs);

So I think it can happen in at least two cases: either (1) postcopy has
just started (assuming postcopy happens to start right when all dirty pages
have been migrated?), or (2) postcopy is recovering from a failure.

In my case I triggered this deadloop while debugging the other bug (fixed
by the next patch), which involved postcopy recovery (on tls), but only
once.  So I'm still not 100% sure whether this is the same problem, but
logically it could trigger.
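
To illustrate why it could logically trigger, here is a minimal standalone
sketch of the search loop.  It is not the QEMU code: Block, State and
search_dirty() are hypothetical stand-ins, each block has one dirty flag
instead of a bitmap, and max_steps exists only so the sketch can report the
spin instead of hanging.  The assumption is that the "been once around and
found nothing" exit needs the current block to equal last_seen_block, which
can never hold while last_seen_block is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real RAM block/state types. */
typedef struct Block {
    struct Block *next;       /* NULL at the end of the list */
    bool dirty;               /* one flag instead of a per-page bitmap */
} Block;

typedef struct {
    Block *first;             /* head of the block list */
    Block *last_seen_block;   /* where the previous search stopped */
} State;

/*
 * Walk the list looking for a dirty block.  Returns the number of steps
 * taken, or max_steps if the exit condition was never reached.
 * *found is set to the dirty block, or NULL if none was found.
 */
static int search_dirty(State *s, bool apply_fix, int max_steps, Block **found)
{
    assert(s->first != NULL);
    *found = NULL;

    if (apply_fix && !s->last_seen_block) {
        /* The idea of the patch: never start with a NULL last_seen_block. */
        s->last_seen_block = s->first;
    }

    Block *cur = s->last_seen_block ? s->last_seen_block : s->first;
    bool complete_round = false;
    int steps = 0;

    while (steps < max_steps) {
        steps++;
        if (cur->dirty) {
            *found = cur;
            return steps;
        }
        /* "Been once around, found nothing": needs cur == last_seen_block. */
        if (complete_round && cur == s->last_seen_block) {
            return steps;
        }
        cur = cur->next;
        if (!cur) {
            cur = s->first;           /* wrap around to the head */
            complete_round = true;
        }
    }
    return max_steps;   /* without the cap this would spin forever */
}
```

With a list of clean blocks and last_seen_block == NULL, the call without
the fix only returns because of the cap, while the call with the fix stops
once it has wrapped back to the first block.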

I also remember hitting very rare deadloops before; maybe they're the same
thing, since I did test recovery a lot.

> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks!

-- 
Peter Xu
