[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: QEMU migration-test CI intermittent failure
From: |
Fabiano Rosas |
Subject: |
Re: QEMU migration-test CI intermittent failure |
Date: |
Mon, 18 Sep 2023 11:15:57 -0300 |
Peter Xu <peterx@redhat.com> writes:
> On Fri, Sep 15, 2023 at 12:28:36PM -0400, Peter Xu wrote:
>> On Thu, Sep 14, 2023 at 10:56:23PM -0300, Fabiano Rosas wrote:
>> > Peter Xu <peterx@redhat.com> writes:
>> >
>> > > On Thu, Sep 14, 2023 at 07:54:17PM -0300, Fabiano Rosas wrote:
>> > >> Fabiano Rosas <farosas@suse.de> writes:
>> > >>
>> > >> > Peter Xu <peterx@redhat.com> writes:
>> > >> >
>> > >> >> On Thu, Sep 14, 2023 at 12:57:08PM -0300, Fabiano Rosas wrote:
>> > >> >>> I managed to reproduce it. It's not the return path error. In
>> > >> >>> hindsight
>> > >> >>> that's obvious because that error happens in the 'recovery' test
>> > >> >>> and this
>> > >> >>> one in the 'plain' one. Sorry about the noise.
>> > >> >>
>> > >> >> No worry. It's good to finally identify that.
>> > >> >>
>> > >> >>>
>> > >> >>> This one reproduced with just 4 iterations of preempt/plain. I'll
>> > >> >>> investigate.
>> > >> >
>> > >> > It seems that we're getting a tcp disconnect (ECONNRESET) on when
>> > >> > doing
>> > >> > that shutdown() on postcopy_qemufile_src. The one from commit
>> > >> > 6621883f93
>> > >> > ("migration: Fix potential race on postcopy_qemufile_src").
>> > >> >
>> > >> > I'm trying to determine why that happens when other times it just
>> > >> > returns 0 as expected.
>> > >> >
>> > >> > Could this mean that we're kicking the dest too soon while it is still
>> > >> > receiving valid data?
>> > >>
>> > >> Looking a bit more into this, what's happening is that
>> > >> postcopy_ram_incoming_cleanup() is shutting the postcopy_qemufile_dst
>> > >> while ram_load_postcopy() is still running.
>> > >>
>> > >> The postcopy_ram_listen_thread() function waits for the
>> > >> main_thread_load_event, but that only works when not using preempt. With
>> > >> the preempt thread, the event is set right away and we proceed to do the
>> > >> cleanup without waiting.
>> > >>
>> > >> So the assumption of commit 6621883f93 that the incoming side knows when
>> > >> it has finished migrating is wrong IMO. Without the EOS we're relying on
>> > >> the chance that the shutdown() happens after the last recvmsg has
>> > >> returned and not during it.
>> > >>
>> > >> Peter, what do you think?
>> > >
>> > > That's a good point.
>> > >
>> > > One thing to verify that (sorry, I still cannot reproduce it myself,
>> > > which
>> > > is so weirdly... it seems loads won't really help reproducing this) is to
>> > > let the main thread wait for all requested pages to arrive:
>> > >
>> > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> > > index 29aea9456d..df055c51ea 100644
>> > > --- a/migration/postcopy-ram.c
>> > > +++ b/migration/postcopy-ram.c
>> > > @@ -597,6 +597,12 @@ int
>> > > postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>> > > trace_postcopy_ram_incoming_cleanup_entry();
>> > >
>> > > if (mis->preempt_thread_status == PREEMPT_THREAD_CREATED) {
>> > > + /*
>> > > + * NOTE! it's possible that the preempt thread is still
>> > > handling
>> > > + * the last pages to arrive which were requested by faults.
>> > > Making
>> > > + * sure nothing is left behind.
>> > > + */
>> > > + while (qatomic_read(&mis->page_requested_count));
>> > > /* Notify the fast load thread to quit */
>> > > mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
>> > > if (mis->postcopy_qemufile_dst) {
>> > >
>> > > If that can work solidly, we can figure out a better way than a dead loop
>> > > here.
>> >
>> > Yep, 2000+ iterations so far and no error.
>>
>> Definitely good news.. thanks.
>>
>> >
>> > Should we add something that makes ram_load_postcopy return once it's
>> > finished? Then this code could just set PREEMPT_THREAD_QUIT and join the
>> > preempt thread.
>>
>> Yeah it'll be nice to have that. We used to have an EOS message at the end
>> for the preempt channel, but that used to cause another race (which IIRC I
>> could easily reproduce at that time) and we removed it after
>> 6621883f9398bc3 for 7.2+.
>>
>> Now instead of fiddling with the protocol again, we do have a way to detect
>> that, but it may need some trick on counting on page_requested_count and
>> also provide a signal mechanism.
>>
>> I've drafted one solution here and attached here as a complete patch, feel
>> free to try. If you have anything better, that'll be even nicer; just let
>> me know your thoughts.
>>
>> ---8<---
>> From 384aff2264650872e15916dcfeaec593e5e9b781 Mon Sep 17 00:00:00 2001
>> From: Peter Xu <peterx@redhat.com>
>> Date: Fri, 15 Sep 2023 12:11:34 -0400
>> Subject: [PATCH] migration: Fix race that dest preempt thread close too early
>>
>> We hit intermit CI issue on failing at migration-test over the unit test
>> preempt/plain:
>>
>> qemu-system-x86_64: Unable to read from socket: Connection reset by peer
>> Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc
>> current = 4f hit_edge = 1
>> **
>> ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion
>> failed: (bad == 0)
>> (test program exited with status code -6)
>>
>> Fabiano debugged into it and found that the preempt thread can quit even
>> without receiving all the pages, which can cause guest not receiving all
>> the pages and corrupt the guest memory.
>>
>> To make sure preempt thread finished receiving all the pages, we can rely
>> on the page_requested_count being zero because preempt channel will only
>> receive requested page faults. Note, not all the faulted pages are required
>> to be sent via the preempt channel/thread; imagine the case when a
>> requested page is just queued into the background main channel for
>> migration, the src qemu will just still send it via the background channel.
>>
>> Here instead of spinning over reading the count, we add a condvar so the
>> main thread can wait on it if that unusual case happened, without burning
>> the cpu for no good reason, even if the duration is short; so even if we
>> spin in this rare case is probably fine. It's just better to not do so.
>>
>> The condvar is only used when that special case is triggered. Some memory
>> ordering trick is needed to guarantee it from happening (against the
>> preempt thread status field), so the main thread will always get a kick
>> when that triggers correctly.
>>
>> Closes: https://gitlab.com/qemu-project/qemu/-/issues/1886
>> Debugged-by: Fabiano Rosas <farosas@suse.de>
>> Signed-off-by: Peter Xu <peterx@redhat.com>
>> ---
>> migration/migration.h | 13 ++++++++++++-
>> migration/migration.c | 1 +
>> migration/postcopy-ram.c | 35 +++++++++++++++++++++++++++++++++++
>> 3 files changed, 48 insertions(+), 1 deletion(-)
>>
>> diff --git a/migration/migration.h b/migration/migration.h
>> index c390500604..cdaa10d515 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -196,7 +196,10 @@ struct MigrationIncomingState {
>>
>> /* A tree of pages that we requested to the source VM */
>> GTree *page_requested;
>> - /* For debugging purpose only, but would be nice to keep */
>> + /*
>> + * For postcopy only, count the number of requested page faults that
>> + * still haven't been resolved.
>> + */
>> int page_requested_count;
>> /*
>> * The mutex helps to maintain the requested pages that we sent to the
>> @@ -210,6 +213,14 @@ struct MigrationIncomingState {
>> * contains valid information.
>> */
>> QemuMutex page_request_mutex;
>> + /*
>> + * If postcopy preempt is enabled, there is a chance that the main
>> + * thread finished loading its data before the preempt channel has
>> + * finished loading the urgent pages. If that happens, the two threads
>> + * will use this condvar to synchronize, so the main thread will always
>> + * wait until all pages received.
>> + */
>> + QemuCond page_request_cond;
>>
>> /*
>> * Number of devices that have yet to approve switchover. When this
>> reaches
>> diff --git a/migration/migration.c b/migration/migration.c
>> index d61e572742..e86b3f7368 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -153,6 +153,7 @@ void migration_object_init(void)
>> qemu_sem_init(¤t_incoming->postcopy_qemufile_dst_done, 0);
>>
>> qemu_mutex_init(¤t_incoming->page_request_mutex);
>> + qemu_cond_init(¤t_incoming->page_request_cond);
>> current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
>>
>> migration_object_check(current_migration, &error_fatal);
>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> index 29aea9456d..dbf02cd3ed 100644
>> --- a/migration/postcopy-ram.c
>> +++ b/migration/postcopy-ram.c
>> @@ -599,6 +599,30 @@ int
>> postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>> if (mis->preempt_thread_status == PREEMPT_THREAD_CREATED) {
>> /* Notify the fast load thread to quit */
>> mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
>> + /*
>> + * Update preempt_thread_status before reading count. Note: mutex
>> + * lock only provide ACQUIRE semantic, and it doesn't stops this
>> + * write to be reordered after reading the count.
>> + */
>> + smp_mb();
>> + /*
>> + * It's possible that the preempt thread is still handling the last
>> + * pages to arrive which were requested by guest page faults.
>> + * Making sure nothing is left behind by waiting on the condvar if
>> + * that unlikely case happened.
>> + */
>> + WITH_QEMU_LOCK_GUARD(&mis->page_request_mutex) {
>> + if (qatomic_read(&mis->page_requested_count)) {
>> + /*
>> + * It is guaranteed to receive a signal later, because the
>> + * count>0 now, so it's destined to be decreased to zero
>> + * very soon by the preempt thread.
>> + */
>> + qemu_cond_wait(&mis->page_request_cond,
>> + &mis->page_request_mutex);
>> + }
>> + }
>> + /* Notify the fast load thread to quit */
>> if (mis->postcopy_qemufile_dst) {
>> qemu_file_shutdown(mis->postcopy_qemufile_dst);
>> }
>> @@ -1279,6 +1303,17 @@ static int qemu_ufd_copy_ioctl(MigrationIncomingState
>> *mis, void *host_addr,
>> g_tree_remove(mis->page_requested, host_addr);
>> mis->page_requested_count--;
>> trace_postcopy_page_req_del(host_addr,
>> mis->page_requested_count);
>> + /* Order the update of count and read of preempt status */
>> + smp_mb();
>> + if (qatomic_read(&mis->preempt_thread_status) ==
>> + PREEMPT_THREAD_QUIT) {
>> + /*
>> + * This probably means the main thread is waiting for us.
>> + * Notify that we've finished receiving the last requested
>> + * page.
>> + */
>> + qemu_cond_signal(&mis->page_request_cond);
>> + }
>> }
>> qemu_mutex_unlock(&mis->page_request_mutex);
>> mark_postcopy_blocktime_end((uintptr_t)host_addr);
>> --
>> 2.41.0
>
> I just made some changes on top of this patch..
>
> One major thing is to double check we only kick main thread only if it's
> the last page, as I think there can be >1 pages on the fly..
>
> Since at it, I cleaned up a bit on accessing either the preempt status
> field or the counter, to make it always consistent (always not use atomic
> ops for status as it's defined as volatile anyway; meanwhile always use
> qatomic for the counter).
>
> ---8<---
> From bfdd1f872fbeafe26ebad280a6d9e5a88de621de Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Fri, 15 Sep 2023 12:45:01 -0400
> Subject: [PATCH] fixup! migration: Fix race that dest preempt thread close too
> early
>
> Three more small changes:
>
> - Always use qatomic_* for page_requested_count accesses
> - only kick main thread when we're sure it's the last page
> - drop qatomic_read() on reading preempt status, to make it match use
> cases elsewhere, not needed when defined as volatile already
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> migration/migration.c | 2 +-
> migration/postcopy-ram.c | 7 ++++---
> 2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index e86b3f7368..3ee1e6b0d6 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -368,7 +368,7 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis,
> * things like g_tree_lookup() will return TRUE (1) when found.
> */
> g_tree_insert(mis->page_requested, aligned, (gpointer)1);
> - mis->page_requested_count++;
> + qatomic_inc(&mis->page_requested_count);
> trace_postcopy_page_req_add(aligned, mis->page_requested_count);
> }
> }
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index dbf02cd3ed..5408e028c6 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -1301,12 +1301,13 @@ static int qemu_ufd_copy_ioctl(MigrationIncomingState
> *mis, void *host_addr,
> */
> if (g_tree_lookup(mis->page_requested, host_addr)) {
> g_tree_remove(mis->page_requested, host_addr);
> - mis->page_requested_count--;
> + int left_pages = qatomic_dec_fetch(&mis->page_requested_count);
> +
> trace_postcopy_page_req_del(host_addr,
> mis->page_requested_count);
> /* Order the update of count and read of preempt status */
> smp_mb();
> - if (qatomic_read(&mis->preempt_thread_status) ==
> - PREEMPT_THREAD_QUIT) {
> + if (mis->preempt_thread_status == PREEMPT_THREAD_QUIT &&
> + left_pages == 0) {
> /*
> * This probably means the main thread is waiting for us.
> * Notify that we've finished receiving the last requested
> --
> 2.41.0
This fixes the issue. It looks ok to me. Do you want me to send a separate
patch with it?
- Re: QEMU migration-test CI intermittent failure, (continued)
- Re: QEMU migration-test CI intermittent failure, Fabiano Rosas, 2023/09/14
- Re: QEMU migration-test CI intermittent failure, Peter Xu, 2023/09/14
- Re: QEMU migration-test CI intermittent failure, Fabiano Rosas, 2023/09/14
- Re: QEMU migration-test CI intermittent failure, Peter Xu, 2023/09/14
- Re: QEMU migration-test CI intermittent failure, Fabiano Rosas, 2023/09/14
- Re: QEMU migration-test CI intermittent failure, Fabiano Rosas, 2023/09/14
- Re: QEMU migration-test CI intermittent failure, Peter Xu, 2023/09/14
- Re: QEMU migration-test CI intermittent failure, Fabiano Rosas, 2023/09/14
- Re: QEMU migration-test CI intermittent failure, Peter Xu, 2023/09/15
- Re: QEMU migration-test CI intermittent failure, Peter Xu, 2023/09/15
- Re: QEMU migration-test CI intermittent failure,
Fabiano Rosas <=
- Re: QEMU migration-test CI intermittent failure, Peter Xu, 2023/09/18