qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] multifd: Copy pages before compressing them with zlib


From: Dr. David Alan Gilbert
Subject: Re: [PATCH] multifd: Copy pages before compressing them with zlib
Date: Tue, 5 Jul 2022 17:33:13 +0100
User-agent: Mutt/2.2.6 (2022-06-05)

* Christian Borntraeger (borntraeger@de.ibm.com) wrote:
> Am 05.07.22 um 18:16 schrieb Dr. David Alan Gilbert:
> > * Peter Maydell (peter.maydell@linaro.org) wrote:
> > > On Mon, 4 Jul 2022 at 17:43, Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> > > > 
> > > > zlib_send_prepare() compresses pages of a running VM. zlib does not
> > > > make any thread-safety guarantees with respect to changing deflate()
> > > > input concurrently with deflate() [1].
> > > > 
> > > > One can observe problems due to this with the IBM zEnterprise Data
> > > > Compression accelerator capable zlib [2]. When the hardware
> > > > acceleration is enabled, migration/multifd/tcp/plain/zlib test fails
> > > > intermittently [3] due to sliding window corruption. The accelerator's
> > > > architecture explicitly discourages concurrent accesses [4]:
> > > > 
> > > >      Page 26-57, "Other Conditions":
> > > > 
> > > >      As observed by this CPU, other CPUs, and channel
> > > >      programs, references to the parameter block, first,
> > > >      second, and third operands may be multiple-access
> > > >      references, accesses to these storage locations are
> > > >      not necessarily block-concurrent, and the sequence
> > > >      of these accesses or references is undefined.
> > > > 
> > > > Mark Adler pointed out that vanilla zlib performs double fetches under
> > > > certain circumstances as well [5], therefore we need to copy data
> > > > before passing it to deflate().
> > > > 
> > > > [1] https://zlib.net/manual.html
> > > > [2] https://github.com/madler/zlib/pull/410
> > > > [3] 
> > > > https://lists.nongnu.org/archive/html/qemu-devel/2022-03/msg03988.html
> > > > [4] http://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf
> > > > [5] https://gitlab.com/qemu-project/qemu/-/issues/1099
> > > 
> > > Is this [5] the wrong link? It's to our issue tracker, not zlib's
> > > or a zlib mailing list thread, and it doesn't contain any messages
> > > from Mark Adler.
> > 
> > Looking at Mark's message, I'm not seeing that it was cc'd to the lists.
> > I did however ask him to update zlib's docs to describe the requirement.
> 
> 
> Can you maybe forward the message here?

Lets see, it looks OK from the content, here's a copy of my reply with
the thread in it.  I've add Mark to the cc here so he knows.

Dave

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
* Mark Adler (zlib@madler.net) wrote:
> Dave,
> 
> d), which should also result in an invalid check value (CRC-32 or Adler-32). 
> I suppose you could call that b).
> 
> To get c), the input data would need to be read exactly once. However there 
> is a case in deflate when writing a stored block where the input is accessed 
> twice — once to copy to the output, and then a second time to fill in the 
> sliding window. If the data changes between those two, then the sliding 
> window does not reflect the data written, which can propagate to incorrect 
> matches downstream of the modified data.
> 
> That is the only place I see that. The impact would usually be c), but if you 
> are trying to compress incompressible data followed by compressible data, you 
> will have stored blocks followed by dynamic blocks with matches to the 
> incorrect data. Your testing would likely not expose that. (I tried to 
> compile the linked test, but went down a rat hole to find include files and 
> gave up.)

OK - thanks for your clarification!
I've created:
  https://gitlab.com/qemu-project/qemu/-/issues/1099

as a reminder we need to fix this in qemu somewhere.

Could you please add a note to the zlib docs somewhere to make this
explicit.

Thanks again,

Dave

> Mark
> 
> 
> > On Jun 30, 2022, at 9:26 AM, Dr. David Alan Gilbert <dgilbert@redhat.com> 
> > wrote:
> > 
> > * Mark Adler (zlib@madler.net <mailto:zlib@madler.net>) wrote:
> >> Ilya,
> >> 
> >> What exactly do you mean by “concurrently”? What is an example of this?
> > 
> > In qemu's live migration we have a thread that shuffles the contents of
> > guest memory out over the network. The guest is still
> > running at the time and changing the contents of the memory we're
> > saving.
> > Fortunately we keep a 'modified' flag so that if the guest does modify
> > it while we're saving, we know about it and will send the block again.
> > 
> > Zlib (and zstd) have recently been forcibly inserted into this; so zlib
> > may be compressing a page of memory that changes.
> > 
> >> If you mean modifying the data provided to deflate() before deflate() has 
> >> returned, then that is certainly not safe.
> > 
> > So a question is what does 'not safe' mean:
> > a) It explodes and segs
> > b) It produces an invalid stream
> > c) It produces a valid stream but the data for the modified block may
> > be garbage
> > d) It produces a valid stream but the data for the modified block and
> > other blocks may be garbage.
> > 
> > The qemu live migration code is happy with (c) because it'll retransmit
> > a stable block later. So far with the software zlib libraries we've
> > seen that be fine; I think Ilya is finding something like (b) or (d) on
> > their compression hardware.
> > 
> > Dave
> > 
> >> 
> >> Mark
> >> 
> >> 
> >>> On Jun 22, 2022, at 2:04 AM, Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >>> 
> >>> [resending with a smaller cc: list in order to pass the
> >>> zlib-devel mailing list moderation process]
> >>> 
> >>> Hello zlib developers,
> >>> 
> >>> I've been investigating a problem in the QEMU test suite on IBM Z [1]
> >>> [2] in connection with the IBM Z compression accelerator patch [3].
> >>> 
> >>> The problem is that a QEMU thread compresses data that is being
> >>> modified by another QEMU thread. zlib manual [4] does not state that
> >>> this is safe, however, the current stable zlib in fact tolerates it.
> >>> 
> >>> The accelerator, however, does not: not only what it compresses ends up
> >>> being unpredictable - QEMU actually resolves this just fine -
> >>> but the accelerator's internal state also ends up being corrupted.
> >>> 
> >>> I have a design question in connection to this: does zlib guarantee
> >>> that modifying deflate() input concurrently with deflate() is safe?
> >>> Or does it reserve the right to change this in the future versions?
> >>> 
> >>> Cc:ing zlib-ng folks for their opinion as well.
> >>> 
> >>> [1] https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg06841.html
> >>> [2] https://lists.gnu.org/archive/html/qemu-devel/2022-04/msg00329.html
> >>> [3] https://github.com/madler/zlib/pull/410
> >>> [4] https://zlib.net/manual.html
> >>> 
> >>> Best regards,
> >>> Ilya
> >> 
> > -- 
> > Dr. David Alan Gilbert / dgilbert@redhat.com <mailto:dgilbert@redhat.com> / 
> > Manchester, UK
> 
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




reply via email to

[Prev in Thread] Current Thread [Next in Thread]