
Re: [PULL 00/21] Migration 20230530 patches


From: Juan Quintela
Subject: Re: [PULL 00/21] Migration 20230530 patches
Date: Thu, 01 Jun 2023 13:46:59 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Thu, Jun 01, 2023 at 09:27:09AM +0100, Daniel P. Berrangé wrote:
>> On Wed, May 31, 2023 at 11:03:23PM +0200, Juan Quintela wrote:
>> > Richard Henderson <richard.henderson@linaro.org> wrote:
>> > > On 5/30/23 11:25, Juan Quintela wrote:
>> > >> The following changes since commit 
>> > >> aa9bbd865502ed517624ab6fe7d4b5d89ca95e43:
>> > >>    Merge tag 'pull-ppc-20230528' of https://gitlab.com/danielhb/qemu
>> > >> into staging (2023-05-29 14:31:52 -0700)
>> > >> are available in the Git repository at:
>> > >>    https://gitlab.com/juan.quintela/qemu.git
>> > >> tags/migration-20230530-pull-request
>> > >> for you to fetch changes up to
>> > >> c63c544005e6b1375a9c038f0e0fb8dfb8b249f4:
>> > >>    migration/rdma: Check sooner if we are in postcopy for
>> > >> save_page() (2023-05-30 19:23:50 +0200)
>> > >> ----------------------------------------------------------------
>> > 
>> > Added Markus and Daniel.
>> > 
>> > >> Migration 20230530 Pull request (take 2)
>> > >> Hi
>> > >> Resend last PULL request, this time it compiles when CONFIG_RDMA is
>> > >> not configured in.
>> > >> [take 1]
>> > >> On this PULL request:
>> > >> - Set vmstate migration failure right (vladimir)
>> > >> - Migration QEMUFileHook removal (juan)
>> > >> - Migration Atomic counters (juan)
>> > >> Please apply.
>> > >> ----------------------------------------------------------------
>> > >> Juan Quintela (16):
>> > >>    migration: Don't abuse qemu_file transferred for RDMA
>> > >>    migration/RDMA: It is accounting for zero/normal pages in two places
>> > >>    migration/rdma: Remove QEMUFile parameter when not used
>> > >>    migration/rdma: Don't use imaginary transfers
>> > >>    migration: Remove unused qemu_file_credit_transfer()
>> > >>    migration/rdma: Simplify the function that saves a page
>> > >>    migration: Create migrate_rdma()
>> > >>    migration/rdma: Unfold ram_control_before_iterate()
>> > >>    migration/rdma: Unfold ram_control_after_iterate()
>> > >>    migration/rdma: Remove all uses of RAM_CONTROL_HOOK
>> > >>    migration/rdma: Unfold hook_ram_load()
>> > >>    migration/rdma: Create rdma_control_save_page()
>> > >>    qemu-file: Remove QEMUFileHooks
>> > >>    migration/rdma: Move rdma constants from qemu-file.h to rdma.h
>> > >>    migration/rdma: Remove qemu_ prefix from exported functions
>> > >>    migration/rdma: Check sooner if we are in postcopy for save_page()
>> > >> Vladimir Sementsov-Ogievskiy (5):
>> > >>    runstate: add runstate_get()
>> > >>    migration: never fail in global_state_store()
>> > >>    runstate: drop unused runstate_store()
>> > >>    migration: switch from .vm_was_running to .vm_old_state
>> > >>    migration: restore vmstate on migration failure
>> > >
>> > > Appears to introduce multiple avocado failures:
>> > >
>> > > https://gitlab.com/qemu-project/qemu/-/jobs/4378066518#L286
>> > >
>> > > Test summary:
>> > > tests/avocado/migration.py:X86_64.test_migration_with_exec: ERROR
>> > > tests/avocado/migration.py:X86_64.test_migration_with_tcp_localhost: 
>> > > ERROR
>> > > tests/avocado/migration.py:X86_64.test_migration_with_unix: ERROR
>> > > make: *** [/builds/qemu-project/qemu/tests/Makefile.include:142: 
>> > > check-avocado] Error 1
>> > >
>> > > https://gitlab.com/qemu-project/qemu/-/jobs/4378066523#L387
>> > >
>> > > Test summary:
>> > > tests/avocado/migration.py:X86_64.test_migration_with_tcp_localhost: 
>> > > ERROR
>> > > tests/avocado/migration.py:X86_64.test_migration_with_unix: ERROR
>> > > make: *** [/builds/qemu-project/qemu/tests/Makefile.include:142: 
>> > > check-avocado] Error 1
>> > >
>> > > Also fails QTEST_QEMU_BINARY=./qemu-system-aarch64 
>> > > ./tests/qtest/migration-test
>> > >
>> > > ../src/migration/rdma.c:408:QIO_CHANNEL_RDMA: Object 0xaaaaf7bba680 is
>> > > not an instance of type qio-channel-rdma
>> > 
>> > I am looking at the other errors, but this one is weird.  It is failing
>> > here:
>> > 
>> > #define TYPE_QIO_CHANNEL_RDMA "qio-channel-rdma"
>> > OBJECT_DECLARE_SIMPLE_TYPE(QIOChannelRDMA, QIO_CHANNEL_RDMA)
>> > 
>> > In the OBJECT line.
>> > 
>> > I have no clue what problem are we having here with the object system to
>> > decide at declaration time that a variable is not of the type that we
>> > are declaring.
>> > 
>> > I am missing something obvious here?
>> 
>> I expect somewhere in the code has either corrupted memory, or is
>> using free'd memory. Either way you'll need to get a stack trace
>> to debug this kind of thing
>
> I've replied to the patches pointing out 4 places where the code
> casts to QIOChannelRDMA, without first checking that this is an
> RDMA migration, which look likely to be the cause of this.

Good catch.

I can only say: Ouch.
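
For anyone following along, here is a standalone sketch of what I think is
going on.  It is toy code, not the QEMU implementation: Object here is a
cut-down stand-in, and every toy_/TOY_ name is made up for illustration.
Unlike the real cast, which aborts, the toy version returns NULL so it can
be exercised.  It shows two things: the cast helper is generated by the
declaration macro, so the reported file:line is the declaration line, and
the helper must only be reached after checking that the channel really is
an RDMA one.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for a QOM object: it only carries its type name. */
typedef struct Object {
    const char *type_name;
} Object;

/*
 * Toy checked cast.  Assumption: reports the mismatch and returns NULL,
 * where the real object system would abort.
 */
static Object *toy_cast_assert(Object *obj, const char *type_name,
                               const char *file, int line, const char *func)
{
    if (strcmp(obj->type_name, type_name) != 0) {
        fprintf(stderr, "%s:%d:%s: Object %p is not an instance of type %s\n",
                file, line, func, (void *)obj, type_name);
        return NULL;
    }
    return obj;
}

/*
 * Declares the cast helper MODULE_OBJ_NAME().  __LINE__ expands to the
 * line where this macro is used, so every failed cast is reported at the
 * declaration, not at the caller.
 */
#define TOY_DECLARE_SIMPLE_TYPE(InstanceType, MODULE_OBJ_NAME, TYPENAME) \
    static inline InstanceType *MODULE_OBJ_NAME(void *obj)               \
    {                                                                    \
        return (InstanceType *)toy_cast_assert(obj, TYPENAME, __FILE__,  \
                                               __LINE__, __func__);      \
    }

typedef struct QIOChannelRDMA {
    Object parent;
    int dummy;
} QIOChannelRDMA;

#define TYPE_QIO_CHANNEL_RDMA "qio-channel-rdma"
TOY_DECLARE_SIMPLE_TYPE(QIOChannelRDMA, QIO_CHANNEL_RDMA,
                        TYPE_QIO_CHANNEL_RDMA)

/* Hypothetical guard standing in for "is this an RDMA migration?". */
static bool toy_is_rdma(const Object *ioc)
{
    return strcmp(ioc->type_name, TYPE_QIO_CHANNEL_RDMA) == 0;
}

/*
 * Hypothetical hook called for every channel type.  Returns 0 when there
 * is nothing to do (not RDMA) and 1 when the RDMA path was taken.
 */
static int toy_rdma_hook(Object *ioc)
{
    if (!toy_is_rdma(ioc)) {
        return 0;                 /* check first, never cast blindly */
    }
    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
    return rioc != NULL;
}
```

Calling toy_rdma_hook() on a socket-typed object returns 0 without ever
reaching the cast.  Calling QIO_CHANNEL_RDMA() on it directly prints the
"is not an instance of type" message with the line number of the
TOY_DECLARE_SIMPLE_TYPE line, which matches what we saw at rdma.c:408.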

And why didn't it fail for me?  It passes for me:
- make check (built with every target/device/... that can be compiled on
  Fedora 38)

- I ran migration-test hundreds of times during development, and it never
  failed like that

I am switching to aarch64 TCG as my main test target, because it appears
to find way more bugs in migration-test.

Thanks again.

Later, Juan.



