qemu-s390x
Re: [PATCH] multifd: Copy pages before compressing them with zlib


From: Juan Quintela
Subject: Re: [PATCH] multifd: Copy pages before compressing them with zlib
Date: Mon, 04 Apr 2022 15:55:22 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Mon, Apr 04, 2022 at 12:20:14PM +0100, Dr. David Alan Gilbert wrote:
>> * Ilya Leoshkevich (iii@linux.ibm.com) wrote:
>> > zlib_send_prepare() compresses pages of a running VM. zlib does not
>> > make any thread-safety guarantees with respect to changing deflate()
>> > input concurrently with deflate() [1].
>> > 
>> > One can observe problems due to this with the IBM zEnterprise Data
>> > Compression accelerator capable zlib [2]. When the hardware
>> > acceleration is enabled, migration/multifd/tcp/zlib test fails
>> > intermittently [3] due to sliding window corruption.
>> > 
>> > At the moment this problem occurs only with this accelerator, since
>> > its architecture explicitly discourages concurrent accesses [4]:
>> > 
>> >     Page 26-57, "Other Conditions":
>> > 
>> >     As observed by this CPU, other CPUs, and channel
>> >     programs, references to the parameter block, first,
>> >     second, and third operands may be multiple-access
>> >     references, accesses to these storage locations are
>> >     not necessarily block-concurrent, and the sequence
>> >     of these accesses or references is undefined.
>> > 
>> > Still, it might affect other platforms due to a future zlib update.
>> > Therefore, copy the page being compressed into a private buffer before
>> > passing it to zlib.
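
For reference, the approach the commit message describes boils down to
something like the following sketch; the function and variable names
here are illustrative, not the actual patch:

    /* Sketch of copy-before-compress: the guest page may change under
     * deflate(), so hand zlib a private snapshot instead.  "scratch"
     * is a hypothetical per-channel buffer allocated once at setup. */
    #include <stdint.h>
    #include <string.h>
    #include <zlib.h>

    static int compress_page_copy(z_stream *zs, const uint8_t *guest_page,
                                  uint8_t *scratch, size_t page_size)
    {
        /* Snapshot the (possibly still-changing) guest page so zlib
         * only ever sees stable input, as its API requires. */
        memcpy(scratch, guest_page, page_size);

        zs->next_in = scratch;
        zs->avail_in = (uInt)page_size;
        return deflate(zs, Z_NO_FLUSH);
    }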
>> 
>> While this might work around the problem, your explanation doesn't quite
>> fit with the symptoms; or if it does, then you have a separate problem.
>> 
>> The live migration code relies on the fact that the source is running
>> and changing its memory as the data is transmitted; however, it also
>> relies on the fact that if this happens, the 'dirty' flag is set _after_
>> those changes, causing another round of migration and retransmission of
>> the (now stable) data.
>> 
>> We don't expect the data loaded for the first page write to be
>> correct or consistent - we just rely on the retransmission being
>> correct once the page is stable.
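
In rough C, the invariant being relied on looks like the sketch below;
all of the helpers are hypothetical stand-ins, not QEMU's actual
migration code:

    /* Sketch of the pre-copy dirty-tracking invariant: a guest write
     * that races with transmission re-dirties the page, so the final
     * pass always sends stable data.  All names are hypothetical. */
    typedef struct Page Page;

    Page *next_dirty_page(void);   /* pop a page with its dirty bit set */
    void  clear_dirty(Page *p);    /* clear that page's dirty bit       */
    void  send_page(Page *p);      /* read and transmit its contents    */

    void precopy_pass(void)
    {
        Page *p;

        while ((p = next_dirty_page()) != NULL) {
            /* Clear the dirty bit *before* reading the page, so a
             * guest write racing with send_page() re-dirties it... */
            clear_dirty(p);
            /* ...this read may therefore see torn data, which is fine:
             * the re-set dirty bit forces a retransmission once the
             * page stops changing. */
            send_page(p);
        }
    }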
>> 
>> If your compressor hardware is doing something undefined in the first
>> case, that's fine, as long as it works correctly in the stable case
>> where the data isn't changing.
>> 
>> Adding the extra copy is going to slow everyone else down; and since
>> there's plenty of pthread locking in the multifd code, I expect those
>> threads to get reasonably well-defined ordering and thus be safe from
>> multi-threading problems (please correct us if we've actually done
>> something wrong in the locking there).
>> 
>> IMHO your accelerator, when called through zlib, needs to behave the
>> same as the software implementation would; i.e. if we've got pthread
>> calls in there that enforce ordering, then that should be fine. Your
>> accelerator implementation needs to add a barrier of some kind or an
>> internal copy, not penalise everyone else.
>
> It is reasonable to argue that QEMU is relying on undefined behaviour
> when invoking zlib in this case, so it isn't clear that the accelerator
> implementation should be changed, rather than QEMU being changed to
> follow the zlib API requirements.

It works in all the other cases.  My vote, if we need that, is that we
add a zlib-sync or similar method.
Using zlib already means doing a copy (deflate copies its input into the
sliding window); doing an extra copy will cost too much, in my opinion.
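
Concretely, a zlib-sync method would mean gating the extra copy behind
an opt-in, so only setups that need it pay for it.  A hypothetical
sketch, not an actual QEMU interface:

    /* Hypothetical "zlib-sync" variant: only the opt-in path takes
     * the extra memcpy; the default path keeps feeding guest memory
     * to deflate() directly.  All names here are illustrative. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>
    #include <zlib.h>

    static int zlib_prepare_page(z_stream *zs, uint8_t *guest_page,
                                 uint8_t *scratch, size_t page_size,
                                 bool sync_method)
    {
        if (sync_method) {
            /* zlib-sync: snapshot first, for accelerators that cannot
             * tolerate input changing under deflate(). */
            memcpy(scratch, guest_page, page_size);
            zs->next_in = scratch;
        } else {
            /* Default: no extra copy, relying on dirty-page
             * retransmission as described earlier in the thread. */
            zs->next_in = guest_page;
        }
        zs->avail_in = (uInt)page_size;
        return deflate(zs, Z_NO_FLUSH);
    }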

While we are at it, is there such a requirement for zstd?  In my
testing, zstd was basically always better than zlib (no, I don't
remember the details).

Later, Juan.
