qemu-devel
Re: [PATCH 1/2] migration: Fix rdma migration failed


From: Zhijian Li (Fujitsu)
Subject: Re: [PATCH 1/2] migration: Fix rdma migration failed
Date: Fri, 22 Sep 2023 07:42:46 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0


On 20/09/2023 20:46, Fabiano Rosas wrote:
> Li Zhijian <lizhijian@fujitsu.com> writes:
> 
>> From: Li Zhijian <lizhijian@cn.fujitsu.com>
>>
>> Destination will fail with:
>> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
>>
>> migrate with RDMA is different from tcp. RDMA has its own control
>> message, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
>> RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.
> 
> Yeah, this is really fragile. We need a long term solution to this. Any
> other change to multifd protocol as well as any other change to the
> migration ram handling might hit this issue again.

Yeah, it's a pain point.

Another option is to let the RDMA control handler recognize the
RAM_SAVE_FLAG_MULTIFD_FLUSH message and do nothing with it.
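
To make that option concrete, here is a rough, standalone sketch of the
idea. It is a toy model, not QEMU code: the enum, its values and
toy_handle() are all made up for illustration, and the real handler in
migration/rdma.c is structured differently. It only shows the behaviour I
mean, i.e. a flush marker that shows up inside the
REGISTER_REQUEST..REGISTER_FINISHED window is skipped instead of being
treated as a protocol error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message kinds for this toy model; not QEMU's real values. */
enum toy_msg {
    TOY_REGISTER_REQUEST,
    TOY_REGISTER_FINISHED,
    TOY_MULTIFD_FLUSH,      /* stands in for RAM_SAVE_FLAG_MULTIFD_FLUSH */
    TOY_OTHER,              /* any other unrelated stream content */
};

/*
 * Walk a message sequence the way a registration handler might.
 *
 * Returns 0 if the sequence is accepted, -1 if an unexpected message
 * appears between REGISTER_REQUEST and REGISTER_FINISHED.
 *
 * With tolerate_flush != 0, a flush marker inside that window is
 * recognized and ignored (the option discussed above). With
 * tolerate_flush == 0 it is rejected like any other foreign message.
 */
int toy_handle(const enum toy_msg *msgs, size_t n, int tolerate_flush)
{
    int in_registration = 0;

    assert(msgs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        switch (msgs[i]) {
        case TOY_REGISTER_REQUEST:
            in_registration = 1;
            break;
        case TOY_REGISTER_FINISHED:
            in_registration = 0;
            break;
        case TOY_MULTIFD_FLUSH:
            if (in_registration && !tolerate_flush) {
                return -1;  /* stream disturbed during registration */
            }
            break;          /* recognized: do nothing with it */
        default:
            if (in_registration) {
                return -1;  /* anything else is still an error */
            }
            break;
        }
    }
    return 0;
}
```

In this model the sequence { REQUEST, FLUSH, FINISHED } is rejected when
tolerate_flush is 0 and accepted when it is 1, while other foreign messages
inside the window are rejected either way. The downside is the same
fragility you mention: the RDMA side has to learn about every new flag.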


> 
> Perhaps commit 294e5a4034 ("multifd: Only flush once each full round of
> memory") should simply not have touched the stream at that point, but we
> don't have any explicit safeguards to avoid interleaving flags from
> different layers like that (assuming multifd is at another logical layer
> than the ram handling)
> 
> I don't have any good suggestions at this moment, so for now:
> 
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
