qemu-devel

Re: [PATCH 12/42] migration-test: Enable back ignore-shared test


From: Juan Quintela
Subject: Re: [PATCH 12/42] migration-test: Enable back ignore-shared test
Date: Wed, 21 Jun 2023 23:53:41 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Peter Xu <peterx@redhat.com> wrote:
> On Wed, Jun 21, 2023 at 09:38:08PM +0200, Juan Quintela wrote:
>> Peter Xu <peterx@redhat.com> wrote:
>> > On Fri, Jun 09, 2023 at 12:49:13AM +0200, Juan Quintela wrote:
>> >> It failed on aarch64 tcg, let's see if that is still the case.
>> >> 
>> >> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> >
>> > According to the history:
>> >
>> > https://lore.kernel.org/all/20190305180635.GA3803@work-vm/
>> >
>> > It's never enabled, and not sure whether Yury followed it up.  Juan: have
>> > you tried it out on aarch64 before enabling it again?  I assume we rely on
>> > the previous patch but that doesn't even sound like aarch64 specific.  I
>> > worry it'll just keep failing on aarch64.
>> 
>> Hi
>> 
>> I am resending this series.
>> 
>> I tested hard this time, on an x86_64 host.
>> Two build directories:
>> - x86_64 (I just build qemu-system-x86_64, kvm)
>> - aarch64 (I just build qemu-system-aarch64, tcg)
>> 
>> Everything is run as:
>> 
>> while true; do $command || break; done
>> 
>> And run this:
>> - x86_64:
>>   * make check (nit: you can't run two make checks on the same
>>     directory)
>>   * 4 ./tests/qtest/migration-test
>>   * 2 ./tests/qtest/migration-test -p /x86_64/migration/multifd/tcp/plain/cancel
>>   * 2 ./tests/qtest/migration-test -p /x86_64/migration/ignore_shared
>> 
>> - aarch64:
>>   The same with s/x86_64/aarch64/
>> 
>> And left it running for 6 hours.  No errors.
>> Machine has enough RAM for running this (128GB) and 18 cores (intel
>> i9900K).
>> Load of the machine while running these tests is around 50 (I really hope
>> that our CI hosts have less load).
>> 
>> I ran master with the same configuration.  In less than 10 minutes I get
>> the dreaded:
>> 
>> # starting QEMU: exec ./qemu-system-aarch64 -qtest 
>> unix:/tmp/qtest-3264370.sock -qtest-log /dev/null -chardev 
>> socket,path=/tmp/qtest-3264370.qmp,id=char0 -mon chardev=char0,mode=control 
>> -display none -accel kvm -accel tcg -machine virt,gic-version=max -name 
>> target,debug-threads=on -m 150M -serial 
>> file:/tmp/migration-test-1A1461/dest_serial -incoming defer -cpu max -kernel 
>> /tmp/migration-test-1A1461/bootsect    -accel qtest
>> Broken pipe
>> ../../../../../mnt/code/qemu/multifd/tests/qtest/libqtest.c:195: kill_qemu() 
>> detected QEMU death from signal 6 (Aborted) (core dumped)
>> Aborted (core dumped)
>> $
>> 
>> On multifd+cancel.
>> 
>> I have never been able to get ignore_shared to fail on my machine.
>> But I didn't test aarch64 TCG this hard in the past, and on x86_64 it
>> has always worked for me.
>
> Thanks a lot, Juan.
>
> Do you mean master is broken with QEMU_TEST_FLAKY_TESTS=1?

Yeap.  I mean multifd+cancel.  That is the reason why we put the FLAKY
part.

> And after the
> whole series applied we cannot trigger issue in the few hours test even
> with it?

Yeap.

> Shall we wait for another 1-2 days to see whether Yury would comment
> (before you repost)?  Otherwise I agree if it survives your few-hours test
> we should give it a try - at least according to Dave's comment before it
> was failing easily, but it is not now on the test bed.

From the v2 series that I am about to post:

    migration-test: Re-enable multifd_cancel test

    Why?
    - migration/multifd: Protect accesses to migration_threads
      this patch fixed the problem about memory corruption
    - migration-test: Move serial to GuestState
      now we are using guest name as serial file name
      In the past there was a conflict between vm "to" and "to2" that used
      the same file name.
    - migration-test: Wait for first target to finish
      Now we wait for vm "to" to finish before launching "to2".  So we
      avoid similar problems in the future.

    Signed-off-by: Juan Quintela <quintela@redhat.com>


> Maybe it's still just hidden, but in that case I also agree enabling it in
> the repo is the simplest way to reproduce the failure again, if we still
> ever want to enable it one day..

We do want to.  If it still fails, we want to know why and fix it.
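
For anyone who wants to repeat this kind of stress run, here is a minimal
sketch of the "while true; do $command || break; done" loop quoted above,
wrapped in a function that also counts passes.  The name run_until_fail and
its pass-limit argument are made up for illustration; they are not part of
the QEMU tree.

```shell
#!/bin/sh
# run_until_fail MAX CMD [ARGS...]
# Hypothetical helper: re-run CMD until it fails, or until MAX passes have
# completed.  MAX=0 means "no limit", like the plain "while true" loop.
run_until_fail() {
    max="$1"
    shift
    passes=0
    while [ "$max" -eq 0 ] || [ "$passes" -lt "$max" ]; do
        # Stop at the first failure and hand back the command's exit status.
        "$@" || {
            status=$?
            echo "failed after $passes passes (exit $status)" >&2
            return "$status"
        }
        passes=$((passes + 1))
    done
    echo "no failure in $passes passes"
}
```

Assuming a build directory that contains the qtest binaries, it could be
used like this (0 means no limit):

    export QEMU_TEST_FLAKY_TESTS=1
    run_until_fail 0 ./tests/qtest/migration-test -p /aarch64/migration/multifd/tcp/plain/cancel
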

Later, Juan.



