qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 10/10] tests/migration-test: Add a test for postcopy hangs


From: Fabiano Rosas
Subject: Re: [PATCH v3 10/10] tests/migration-test: Add a test for postcopy hangs during RECOVER
Date: Thu, 05 Oct 2023 10:37:56 -0300

Fabiano Rosas <farosas@suse.de> writes:

> Peter Xu <peterx@redhat.com> writes:
>
>> From: Fabiano Rosas <farosas@suse.de>
>>
>> To do so, create two paired sockets, but make them not providing real data.
>> Feed those fake sockets to src/dst QEMUs for recovery to let them go into
>> RECOVER stage without going out.  Test that we can always kick it out and
>> recover again with the right ports.
>>
>> This patch is based on Fabiano's version here:
>>
>> https://lore.kernel.org/r/877cowmdu0.fsf@suse.de
>>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> [peterx: write commit message, remove case 1, fix bugs, and more]
>> Signed-off-by: Peter Xu <peterx@redhat.com>
>> ---
>>  tests/qtest/migration-test.c | 94 ++++++++++++++++++++++++++++++++++++
>>  1 file changed, 94 insertions(+)
>>
>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> index 46f1c275a2..fb7a3765e4 100644
>> --- a/tests/qtest/migration-test.c
>> +++ b/tests/qtest/migration-test.c
>> @@ -729,6 +729,7 @@ typedef struct {
>>      /* Postcopy specific fields */
>>      void *postcopy_data;
>>      bool postcopy_preempt;
>> +    bool postcopy_recovery_test_fail;
>>  } MigrateCommon;
>>  
>>  static int test_migrate_start(QTestState **from, QTestState **to,
>> @@ -1381,6 +1382,78 @@ static void test_postcopy_preempt_tls_psk(void)
>>  }
>>  #endif
>>  
>> +static void wait_for_postcopy_status(QTestState *one, const char *status)
>> +{
>> +    wait_for_migration_status(one, status,
>> +                              (const char * []) { "failed", "active",
>> +                                                  "completed", NULL });
>> +}
>> +
>> +static void postcopy_recover_fail(QTestState *from, QTestState *to)
>> +{
>> +    int ret, pair1[2], pair2[2];
>> +    char c;
>> +
>> +    /* Create two unrelated socketpairs */
>> +    ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair1);
>> +    g_assert_cmpint(ret, ==, 0);
>> +
>> +    ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair2);
>> +    g_assert_cmpint(ret, ==, 0);
>> +
>> +    /*
>> +     * Give the guests unpaired ends of the sockets, so they'll all blocked
>> +     * at reading.  This mimics a wrong channel established.
>> +     */
>> +    qtest_qmp_fds_assert_success(from, &pair1[0], 1,
>> +                                 "{ 'execute': 'getfd',"
>> +                                 "  'arguments': { 'fdname': 'fd-mig' }}");
>> +    qtest_qmp_fds_assert_success(to, &pair2[0], 1,
>> +                                 "{ 'execute': 'getfd',"
>> +                                 "  'arguments': { 'fdname': 'fd-mig' }}");
>> +
>> +    /*
>> +     * Write the 1st byte as QEMU_VM_COMMAND (0x8) for the dest socket, to
>> +     * emulate the 1st byte of a real recovery, but stops from there to
>> +     * keep dest QEMU in RECOVER.  This is needed so that we can kick off
>> +     * the recover process on dest QEMU (by triggering the G_IO_IN event).
>> +     *
>> +     * NOTE: this trick is not needed on src QEMUs, because src doesn't
>> +     * rely on an pre-existing G_IO_IN event, so it will always trigger the
>> +     * upcoming recovery anyway even if it can read nothing.
>> +     */
>> +#define QEMU_VM_COMMAND              0x08
>> +    c = QEMU_VM_COMMAND;
>> +    ret = send(pair2[1], &c, 1, 0);
>> +    g_assert_cmpint(ret, ==, 1);
>> +
>> +    migrate_recover(to, "fd:fd-mig");
>> +    migrate_qmp(from, "fd:fd-mig", "{'resume': true}");
>> +
>> +    /*
>> +     * Make sure both QEMU instances will go into RECOVER stage, then test
>> +     * kicking them out using migrate-pause.
>> +     */
>> +    wait_for_postcopy_status(from, "postcopy-recover");
>> +    wait_for_postcopy_status(to, "postcopy-recover");
>
> Is this wait out of place? I think we're trying to resume too fast after
> migrate_recover():
>
> # {
> #     "error": {
> #         "class": "GenericError",
> #         "desc": "Cannot resume if there is no paused migration"
> #     }
> # }
>

Ugh, sorry about the long lines:

{
    "error": {
        "class": "GenericError",
        "desc": "Cannot resume if there is no paused migration"
    }
}




reply via email to

[Prev in Thread] Current Thread [Next in Thread]