Re: QEMU migration-test CI intermittent failure
From: Peter Xu
Subject: Re: QEMU migration-test CI intermittent failure
Date: Fri, 15 Sep 2023 12:55:22 -0400
On Fri, Sep 15, 2023 at 12:28:36PM -0400, Peter Xu wrote:
> On Thu, Sep 14, 2023 at 10:56:23PM -0300, Fabiano Rosas wrote:
> > Peter Xu <peterx@redhat.com> writes:
> >
> > > On Thu, Sep 14, 2023 at 07:54:17PM -0300, Fabiano Rosas wrote:
> > >> Fabiano Rosas <farosas@suse.de> writes:
> > >>
> > >> > Peter Xu <peterx@redhat.com> writes:
> > >> >
> > >> >> On Thu, Sep 14, 2023 at 12:57:08PM -0300, Fabiano Rosas wrote:
> > >>> I managed to reproduce it. It's not the return path error. In hindsight
> > >>> that's obvious because that error happens in the 'recovery' test and this
> > >>> one in the 'plain' one. Sorry about the noise.
> > >> >>
> > >> >> No worry. It's good to finally identify that.
> > >> >>
> > >> >>>
> > >> >>> This one reproduced with just 4 iterations of preempt/plain. I'll
> > >> >>> investigate.
> > >> >
> > >> > It seems that we're getting a tcp disconnect (ECONNRESET) when doing
> > >> > that shutdown() on postcopy_qemufile_src. The one from commit
> > >> > 6621883f93 ("migration: Fix potential race on postcopy_qemufile_src").
> > >> >
> > >> > I'm trying to determine why that happens when other times it just
> > >> > returns 0 as expected.
> > >> >
> > >> > Could this mean that we're kicking the dest too soon while it is still
> > >> > receiving valid data?
> > >>
> > >> Looking a bit more into this, what's happening is that
> > >> postcopy_ram_incoming_cleanup() is shutting the postcopy_qemufile_dst
> > >> while ram_load_postcopy() is still running.
> > >>
> > >> The postcopy_ram_listen_thread() function waits for the
> > >> main_thread_load_event, but that only works when not using preempt. With
> > >> the preempt thread, the event is set right away and we proceed to do the
> > >> cleanup without waiting.
> > >>
> > >> So the assumption of commit 6621883f93 that the incoming side knows when
> > >> it has finished migrating is wrong IMO. Without the EOS we're relying on
> > >> the chance that the shutdown() happens after the last recvmsg has
> > >> returned and not during it.
> > >>
> > >> Peter, what do you think?
> > >
> > > That's a good point.
> > >
> > > One way to verify that (sorry, I still cannot reproduce it myself, which
> > > is weird... it seems load won't really help reproduce this) is to
> > > let the main thread wait for all requested pages to arrive:
> > >
> > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > index 29aea9456d..df055c51ea 100644
> > > --- a/migration/postcopy-ram.c
> > > +++ b/migration/postcopy-ram.c
> > > @@ -597,6 +597,12 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
> > >     trace_postcopy_ram_incoming_cleanup_entry();
> > >
> > >     if (mis->preempt_thread_status == PREEMPT_THREAD_CREATED) {
> > > +        /*
> > > +         * NOTE! it's possible that the preempt thread is still handling
> > > +         * the last pages to arrive which were requested by faults.
> > > +         * Making sure nothing is left behind.
> > > +         */
> > > +        while (qatomic_read(&mis->page_requested_count));
> > >         /* Notify the fast load thread to quit */
> > >         mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
> > >         if (mis->postcopy_qemufile_dst) {
> > >
> > > If that can work solidly, we can figure out a better way than a dead loop
> > > here.
> >
> > Yep, 2000+ iterations so far and no error.
>
> Definitely good news.. thanks.
>
> >
> > Should we add something that makes ram_load_postcopy return once it's
> > finished? Then this code could just set PREEMPT_THREAD_QUIT and join the
> > preempt thread.
>
> Yeah it'll be nice to have that. We used to have an EOS message at the end
> for the preempt channel, but that used to cause another race (which IIRC I
> could easily reproduce at that time) and we removed it after
> 6621883f9398bc3 for 7.2+.
>
> Now, instead of fiddling with the protocol again, we do have a way to detect
> that, but it needs some trickery around counting page_requested_count and
> also providing a signal mechanism.
>
> I've drafted one solution and attached it here as a complete patch; feel
> free to try it. If you have anything better, that'll be even nicer; just let
> me know your thoughts.
>
> ---8<---
> From 384aff2264650872e15916dcfeaec593e5e9b781 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Fri, 15 Sep 2023 12:11:34 -0400
> Subject: [PATCH] migration: Fix race that dest preempt thread close too early
>
> We hit an intermittent CI failure in migration-test on the unit test
> preempt/plain:
>
> qemu-system-x86_64: Unable to read from socket: Connection reset by peer
> Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc current = 4f hit_edge = 1
> **
> ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0)
> (test program exited with status code -6)
>
> Fabiano debugged it and found that the preempt thread can quit even
> without receiving all the pages, which can leave the guest missing pages
> and corrupt guest memory.
>
> To make sure the preempt thread has finished receiving all the pages, we
> can rely on page_requested_count reaching zero, because the preempt channel
> only receives requested page faults. Note that not all faulted pages need
> to be sent via the preempt channel/thread; e.g. when a requested page has
> just been queued into the background main channel for migration, the src
> qemu will still send it via the background channel.
>
> Here, instead of spinning on the count, we add a condvar so the main
> thread can wait on it if that unusual case happens, without burning CPU
> for no good reason; the window is short, so even spinning would probably
> be fine, but it's better not to.
>
> The condvar is only used when that special case triggers. A memory
> ordering trick (pairing against the preempt thread status field) is needed
> to guarantee the signal is never missed, so the main thread always gets a
> kick when the case triggers.
>
> Closes: https://gitlab.com/qemu-project/qemu/-/issues/1886
> Debugged-by: Fabiano Rosas <farosas@suse.de>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> migration/migration.h | 13 ++++++++++++-
> migration/migration.c | 1 +
> migration/postcopy-ram.c | 35 +++++++++++++++++++++++++++++++++++
> 3 files changed, 48 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.h b/migration/migration.h
> index c390500604..cdaa10d515 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -196,7 +196,10 @@ struct MigrationIncomingState {
>
> /* A tree of pages that we requested to the source VM */
> GTree *page_requested;
> - /* For debugging purpose only, but would be nice to keep */
> + /*
> + * For postcopy only, count the number of requested page faults that
> + * still haven't been resolved.
> + */
> int page_requested_count;
> /*
> * The mutex helps to maintain the requested pages that we sent to the
> @@ -210,6 +213,14 @@ struct MigrationIncomingState {
> * contains valid information.
> */
> QemuMutex page_request_mutex;
> + /*
> + * If postcopy preempt is enabled, there is a chance that the main
> + * thread finished loading its data before the preempt channel has
> + * finished loading the urgent pages. If that happens, the two threads
> + * will use this condvar to synchronize, so the main thread will always
> + * wait until all pages are received.
> + */
> + QemuCond page_request_cond;
>
> /*
> * Number of devices that have yet to approve switchover. When this reaches
> diff --git a/migration/migration.c b/migration/migration.c
> index d61e572742..e86b3f7368 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -153,6 +153,7 @@ void migration_object_init(void)
> qemu_sem_init(&current_incoming->postcopy_qemufile_dst_done, 0);
>
> qemu_mutex_init(&current_incoming->page_request_mutex);
> + qemu_cond_init(&current_incoming->page_request_cond);
> current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
>
> migration_object_check(current_migration, &error_fatal);
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 29aea9456d..dbf02cd3ed 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -599,6 +599,30 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
> if (mis->preempt_thread_status == PREEMPT_THREAD_CREATED) {
> /* Notify the fast load thread to quit */
> mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
> + /*
> + * Update preempt_thread_status before reading the count. Note: the
> + * mutex lock only provides ACQUIRE semantics, and it doesn't stop
> + * this write from being reordered after reading the count.
> + */
> + smp_mb();
> + /*
> + * It's possible that the preempt thread is still handling the last
> + * pages to arrive which were requested by guest page faults.
> + * Make sure nothing is left behind by waiting on the condvar if
> + * that unlikely case happens.
> + */
> + WITH_QEMU_LOCK_GUARD(&mis->page_request_mutex) {
> + if (qatomic_read(&mis->page_requested_count)) {
> + /*
> + * It is guaranteed to receive a signal later, because the
> + * count>0 now, so it's destined to be decreased to zero
> + * very soon by the preempt thread.
> + */
> + qemu_cond_wait(&mis->page_request_cond,
> + &mis->page_request_mutex);
> + }
> + }
> + /* Notify the fast load thread to quit */
> if (mis->postcopy_qemufile_dst) {
> qemu_file_shutdown(mis->postcopy_qemufile_dst);
> }
> @@ -1279,6 +1303,17 @@ static int qemu_ufd_copy_ioctl(MigrationIncomingState *mis, void *host_addr,
> g_tree_remove(mis->page_requested, host_addr);
> mis->page_requested_count--;
> trace_postcopy_page_req_del(host_addr, mis->page_requested_count);
> + /* Order the update of count and read of preempt status */
> + smp_mb();
> + if (qatomic_read(&mis->preempt_thread_status) ==
> + PREEMPT_THREAD_QUIT) {
> + /*
> + * This probably means the main thread is waiting for us.
> + * Notify that we've finished receiving the last requested
> + * page.
> + */
> + qemu_cond_signal(&mis->page_request_cond);
> + }
> }
> qemu_mutex_unlock(&mis->page_request_mutex);
> mark_postcopy_blocktime_end((uintptr_t)host_addr);
> --
> 2.41.0
I just made some changes on top of this patch.
One major change is to double check that we only kick the main thread if
it's the last page, since I think there can be more than one page in flight.
While at it, I cleaned up the accesses to both the preempt status field and
the counter, to keep them consistent (never use atomic ops for the status,
since it's defined as volatile anyway; always use qatomic for the counter).
---8<---
From bfdd1f872fbeafe26ebad280a6d9e5a88de621de Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Fri, 15 Sep 2023 12:45:01 -0400
Subject: [PATCH] fixup! migration: Fix race that dest preempt thread close too early
Three more small changes:
- Always use qatomic_* for page_requested_count accesses
- Only kick the main thread when we're sure it's the last page
- Drop qatomic_read() when reading the preempt status, to match its other
  use sites; not needed since it's defined as volatile already
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 2 +-
migration/postcopy-ram.c | 7 ++++---
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index e86b3f7368..3ee1e6b0d6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -368,7 +368,7 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis,
* things like g_tree_lookup() will return TRUE (1) when found.
*/
g_tree_insert(mis->page_requested, aligned, (gpointer)1);
- mis->page_requested_count++;
+ qatomic_inc(&mis->page_requested_count);
trace_postcopy_page_req_add(aligned, mis->page_requested_count);
}
}
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index dbf02cd3ed..5408e028c6 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1301,12 +1301,13 @@ static int qemu_ufd_copy_ioctl(MigrationIncomingState *mis, void *host_addr,
*/
if (g_tree_lookup(mis->page_requested, host_addr)) {
g_tree_remove(mis->page_requested, host_addr);
- mis->page_requested_count--;
+ int left_pages = qatomic_dec_fetch(&mis->page_requested_count);
+
trace_postcopy_page_req_del(host_addr, mis->page_requested_count);
/* Order the update of count and read of preempt status */
smp_mb();
- if (qatomic_read(&mis->preempt_thread_status) ==
- PREEMPT_THREAD_QUIT) {
+ if (mis->preempt_thread_status == PREEMPT_THREAD_QUIT &&
+ left_pages == 0) {
/*
* This probably means the main thread is waiting for us.
* Notify that we've finished receiving the last requested
--
2.41.0
--
Peter Xu