From: Dr. David Alan Gilbert
Subject: Re: [RFC]migration: stop/start device at the end of live migration concurrently
Date: Mon, 1 Mar 2021 16:02:23 +0000
User-agent: Mutt/2.0.5 (2021-01-21)
* Wangxin (Alexander) (wangxinxin.wang@huawei.com) wrote:
> Hi all,
(copying in Michael, the vhost-user maintainer).
> We found that migration downtime can reach several seconds when live
> migrating a huge VM with 224 vCPUs/180 GiB RAM/16 vhost-user NICs (x32
> queues)/24 vhost-user-blk disks (x4 queues); most of the time is spent
> stopping the devices at the source and starting them at the destination.
I suspect that's more vhost-user devices than anyone else has run on a
single VM!
> Our idea is to stop the devices from multiple threads at the end of
> migration. To be more specific, we create a thread pool at the beginning
> of live migration; when the migration thread invokes the
> virtio_vmstate_change callback to stop or start a device in
> vm_state_notify, it submits a request to the thread pool so the
> callbacks are handled concurrently.
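A minimal sketch of that thread-pool idea, assuming a GLib GThreadPool; the
vmstate_pool_* helpers and the simplified callback signature are invented
for illustration and are not the actual patch:

    /* Sketch only: run vm-state-change callbacks from a worker pool.
     * Real VMChangeStateHandler callbacks also take a RunState argument. */
    #include <glib.h>
    #include <stdbool.h>

    typedef struct VmstateWork {
        void (*cb)(void *opaque, bool running);  /* e.g. virtio_vmstate_change */
        void *opaque;                            /* the device */
        bool running;                            /* false = stop, true = start */
    } VmstateWork;

    static GThreadPool *vmstate_pool;            /* created at migration start */

    static void vmstate_worker(gpointer data, gpointer user_data)
    {
        VmstateWork *w = data;

        w->cb(w->opaque, w->running);            /* run one device's callback */
        g_free(w);
    }

    static void vmstate_pool_init(int nthreads)
    {
        vmstate_pool = g_thread_pool_new(vmstate_worker, NULL, nthreads,
                                         FALSE, NULL);
    }

    /* Called from vm_state_notify() instead of invoking the callback directly. */
    static void vmstate_pool_submit(void (*cb)(void *, bool),
                                    void *opaque, bool running)
    {
        VmstateWork *w = g_new0(VmstateWork, 1);

        w->cb = cb;
        w->opaque = opaque;
        w->running = running;
        g_thread_pool_push(vmstate_pool, w, NULL);
    }

The migration thread would still have to wait for all submitted callbacks to
finish (for example by counting completions, or by draining the pool with
g_thread_pool_free(pool, FALSE, TRUE)) before the stop phase can be
considered done.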
>
> We live migrated the VM and measured the time spent in the different
> stages of stopping/starting the devices.
>
>                                        Original    With concurrent
>                                                    state change
>       disk   get vring base               36ms         18ms
>              disable guest notify         48ms         32ms
>              disable host notify         300ms        120ms
>  Src
>       net    get vring base             1376ms        294ms
>              disable host notify        1011ms        116ms
>              disable guest notify         59ms         40ms
>
>       net    enable guest notify         310ms         97ms
>              set memtable                 48ms         20ms
>              enable host notify         2022ms        114ms
>  Dst
>       disk   enable host notify          312ms         78ms
>              enable guest notify          32ms         23ms
>              set memtable                 16ms         10ms
>
>       Total downtime                    5600ms        962ms
>
> However, there are some side effects:
> 1. When the host notifiers or guest notifiers are disabled concurrently,
> the VM crashes because the same notifier can be disabled from different
> threads. We currently add two separate locks to solve this problem, but
> this is hacky and may introduce other problems.
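A minimal sketch of the kind of serialisation point 1 describes, assuming a
single QemuMutex around notifier teardown; the lock name and the
disable_notify() helper are invented here and are not the two locks used in
the actual hack:

    /* Sketch only: serialise notifier teardown across pool workers. */
    #include "qemu/osdep.h"
    #include "qemu/thread.h"

    static QemuMutex notify_lock;        /* qemu_mutex_init(&notify_lock) once */

    static void disable_notify_locked(void *dev, int queue)
    {
        qemu_mutex_lock(&notify_lock);
        /* Without the lock, two workers could disable the same notifier at
         * the same time, which is what crashed the VM. */
        disable_notify(dev, queue);      /* hypothetical per-queue disable */
        qemu_mutex_unlock(&notify_lock);
    }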
>
> 2. As the QEMU BQL is already held by the migration thread before the
> devices are stopped in migration_completion, there is a deadlock in the
> following scenario:
> migration_thread [thread 1]
>   set_up_multithread
>   ...
>   migration_completion()
>     qemu_mutex_lock_iothread()            # takes the QEMU BQL
>     vm_stop_force_state()
>       ...
>       submit stopping-device request to thread pool
>
>                     thread pool worker [thread 2]
>                       virtio_vmstate_change
>                         virtio_set_status
>                           ...
>                           memory_region_transaction_begin
>                             ...
>                             prepare_mmio_access
>                               qemu_mutex_iothread_locked()  # false in this thread
>                               qemu_mutex_lock_iothread()    # deadlock
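For context, prepare_mmio_access() does roughly the following (paraphrased,
details vary between QEMU versions): qemu_mutex_iothread_locked() only
reports whether the current thread holds the BQL, so a pool worker sees
"not locked" even though the migration thread holds it, and then blocks in
qemu_mutex_lock_iothread() while the migration thread waits for the worker
to finish.

    /* Paraphrase of prepare_mmio_access(); not verbatim from any version. */
    static bool prepare_mmio_access(MemoryRegion *mr)
    {
        bool release_lock = false;

        if (!qemu_mutex_iothread_locked()) {  /* false in the worker thread */
            qemu_mutex_lock_iothread();       /* blocks: migration thread holds it */
            release_lock = true;
        }
        if (mr->flush_coalesced_mmio) {
            qemu_flush_coalesced_mmio_buffer();
        }

        return release_lock;
    }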
>
> For now we add another lock to replace the BQL in this scenario, but we
> think this is not reliable enough: there is a risk that other code paths
> still take the QEMU BQL while the devices are being stopped. My question
> is: how should the conflict with the QEMU BQL be handled properly?
>
> Any advice will be appreciated, thanks.
To me it feels like the other way to do this would be to explicitly split
each of these stages into two: one where it sends the request to the vhost
device and another where it waits for the response from the vhost-user
device (i.e. in the vhost_user case, after the vhost_user_write but before
the vhost_user_read). Instead of parallelising everything in threads, you
would parallelise all of the corresponding operations, so all of the
get_vring_base's happen at the same time.
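A rough sketch of what such a split could look like; the _begin/_end ops
and the fan-out helper below are hypothetical, not existing QEMU API:

    /* Hypothetical two-phase split of a VhostOps call: issue the request
     * for every device first, then collect all the replies. */
    struct VhostOps {
        /* ...existing ops... */

        /* Phase 1: write GET_VRING_BASE to the backend and return without
         * reading the reply (the vhost_user_write() part). */
        int (*vhost_get_vring_base_begin)(struct vhost_dev *dev,
                                          struct vhost_vring_state *ring);
        /* Phase 2: read and process the reply (the vhost_user_read() part). */
        int (*vhost_get_vring_base_end)(struct vhost_dev *dev,
                                        struct vhost_vring_state *ring);
    };

    /* Caller side: fan out all the requests, then wait for all the replies. */
    static int vhost_get_vring_bases_all(struct vhost_dev **devs,
                                         struct vhost_vring_state *rings, int n)
    {
        int i, r;

        for (i = 0; i < n; i++) {
            r = devs[i]->vhost_ops->vhost_get_vring_base_begin(devs[i], &rings[i]);
            if (r < 0) {
                return r;
            }
        }
        for (i = 0; i < n; i++) {
            r = devs[i]->vhost_ops->vhost_get_vring_base_end(devs[i], &rings[i]);
            if (r < 0) {
                return r;
            }
        }
        return 0;
    }

The same two-phase pattern would apply to the other VhostOps calls on the
stop/start path that the measurements above cover.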
Michael: would it make sense to change VhostOps get_vring_base and many of
the others into two-part operations like this?
(or maybe coroutines with a yield in???)
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK