qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 3/3] migration: fix migrate_cancel multifd migra


From: Juan Quintela
Subject: Re: [Qemu-devel] [PATCH 3/3] migration: fix migrate_cancel multifd migration leads destination hung forever
Date: Wed, 24 Jul 2019 11:01:07 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)

Ivan Ren <address@hidden> wrote:
> When migrate_cancel a multifd migration, if run sequence like this:
>
>         [source]                              [destination]
>
> multifd_send_sync_main[finish]
>                                     multifd_recv_thread wait &p->sem_sync
> shutdown to_dst_file
>                                     detect error from_src_file
> send  RAM_SAVE_FLAG_EOS[fail]       [no chance to run multifd_recv_sync_main]
>                                     multifd_load_cleanup
>                                     join multifd receive thread forever
>
> will lead destination qemu hung at following stack:
>
> pthread_join
> qemu_thread_join
> multifd_load_cleanup
> process_incoming_migration_co
> coroutine_trampoline
>
> Signed-off-by: Ivan Ren <address@hidden>

I think this one is not enough.  We need to set some error code, or
disable the running bit at that point.

> ---
>  migration/ram.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index e4eb9c441f..504c8ccb03 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1291,6 +1291,11 @@ int multifd_load_cleanup(Error **errp)
>          MultiFDRecvParams *p = &multifd_recv_state->params[i];
>  
>          if (p->running) {
> +            /*
> +             * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
> +             * however try to wakeup it without harm in cleanup phase.
> +             */
> +            qemu_sem_post(&p->sem_sync);
>              qemu_thread_join(&p->thread);
>          }
>          object_unref(OBJECT(p->c));

Let's see where we wait for p->sem_sync:


static void *multifd_recv_thread(void *opaque)
{
    ....
    while (true) {
        uint32_t used;
        uint32_t flags;

        ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
                                       p->packet_len, &local_err);
        .....

        if (flags & MULTIFD_FLAG_SYNC) {
            qemu_sem_post(&multifd_recv_state->sem_sync);
            qemu_sem_wait(&p->sem_sync);
        }
    }
    if (local_err) {
        multifd_recv_terminate_threads(local_err);
    }
    qemu_mutex_lock(&p->mutex);
    p->running = false;
    qemu_mutex_unlock(&p->mutex);

    rcu_unregister_thread();
    trace_multifd_recv_thread_end(p->id, p->num_packets, p->num_pages);

    return NULL;
}

If we just post it there, we get out of the wait (that bit is ok), but
then we go back to the beggining of the bucle, we (probably) got one
error on the qui_channel_read_all_eof(), and we go back to
multifd_recv_terminate_threads(), or wait there.

I think that it is better to *also* set an p->quit variable there, and
not even try to receive anything for that channel?

I will send a patch later.

Good catch.

Later, Juan.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]