qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 00/52] migration/rdma: Error handling fixes


From: Fabiano Rosas
Subject: Re: [PATCH 00/52] migration/rdma: Error handling fixes
Date: Tue, 19 Sep 2023 15:57:07 -0300

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Tue, Sep 19, 2023 at 12:49:46PM -0400, Peter Xu wrote:
>> On Mon, Sep 18, 2023 at 04:41:14PM +0200, Markus Armbruster wrote:
>> > Oh dear, where to start.  There's so much wrong, and in pretty obvious
>> > ways.  This code should never have passed review.  I'm refraining from
>> > saying more; see the commit messages instead.
>> > 
>> > Issues remaining after this series include:
>> > 
>> > * Terrible error messages
>> > 
>> > * Some error message cascades remain
>> > 
>> > * There is no written contract for QEMUFileHooks, and the
>> >   responsibility for reporting errors is unclear
>> 
>> Even being removed.. because no one is really extending that..
>> 
>> https://lore.kernel.org/all/20230509120700.78359-1-quintela@redhat.com/#t
>
> One day (in another 5-10 years) I still hope we'll get to
> the point where QEMUFile itself is obsolete :-) Getting
> rid of QEMUFileHooks is a great step in that direction.
> Me finishing a old PoC to bring buffering to QIOChannel
> would be another big step.
>

If you need any help with that let me know. I've been tripping over
QEMUFile weirdness on a daily basis.

Just last week I was looking into restricting the usage of
qemu_file_set_error() to qemu-file.c so we can get rid of this situation
where any piece of code that has a pointer to the QEMUFile can put
whatever it wants in f->last_error* and the rest of the code has to
guess when to call qemu_file_get_error().

*last_error actually stores the first error

Moving all the interesting parts into the channel and removing QEMUFile
would of course be the better solution. Multifd already ignores it
completly, so there's probably more code that could be made generic
after that change.

Also, looking at what people do with iovs in the block layer, it seems
the migration code is a little behind.

> The data rate limiting would be the biggest missing piece
> to enable migration/vmstate logic to directly consume
> a QIOChannel.
>
> Eliminating QEMUFile would help to bring Error **errp
> to all the vmstate codepaths.
>
>> > * There seem to be no tests whatsoever
>> 
>> I always see rdma as "odd fixes" stage.. for a long time.  But maybe I was
>> wrong.
>
> In the MAINTAINERS file RDMA still get classified as formally
> supported under the migration maintainers.  I'm not convinced
> that is an accurate description of its status.  I tend to agree
> with you that it is 'odd fixes' at the very best.
>
> Dave Gilbert had previously speculated about whether we should
> even consider deprecating it on the basis that latest non-RDMA
> migration is too much better than in the past, with multifd
> and zerocopy, that RDMA might not even offer a significant
> enough peformance win to justify.
>
>> Copying Zhijian for status of rdma; Zhijian, I saw that you just replied to
>> the hwpoison issue.  Maybe we should have one entry for rdma too, just like
>> colo?
>
> With regards,
> Daniel



reply via email to

[Prev in Thread] Current Thread [Next in Thread]