Re: [PATCH] multifd: Copy pages before compressing them with zlib


From: Dr. David Alan Gilbert
Subject: Re: [PATCH] multifd: Copy pages before compressing them with zlib
Date: Mon, 4 Apr 2022 12:20:14 +0100
User-agent: Mutt/2.1.5 (2021-12-30)

* Ilya Leoshkevich (iii@linux.ibm.com) wrote:
> zlib_send_prepare() compresses pages of a running VM. zlib does not
> make any thread-safety guarantees with respect to changing deflate()
> input concurrently with deflate() [1].
> 
> One can observe problems due to this with a zlib that uses the IBM
> zEnterprise Data Compression accelerator [2]. When the hardware
> acceleration is enabled, the migration/multifd/tcp/zlib test fails
> intermittently [3] due to sliding window corruption.
> 
> At the moment this problem occurs only with this accelerator, since
> its architecture explicitly discourages concurrent accesses [4]:
> 
>     Page 26-57, "Other Conditions":
> 
>     As observed by this CPU, other CPUs, and channel
>     programs, references to the parameter block, first,
>     second, and third operands may be multiple-access
>     references, accesses to these storage locations are
>     not necessarily block-concurrent, and the sequence
>     of these accesses or references is undefined.
> 
> Still, it might affect other platforms due to a future zlib update.
> Therefore, copy the page being compressed into a private buffer before
> passing it to zlib.

While this might work around the problem, your explanation doesn't quite
fit the symptoms; or if it does, then you have a separate problem.

The live migration code relies on the fact that the source is running
and changing its memory as the data is transmitted; however, it also
relies on the fact that if this happens, the 'dirty' flag is set _after_
those changes, causing another round of migration and retransmission of
the (now stable) data.

We don't expect the data loaded while the page is being written to be
correct or consistent; we just rely on the retransmission being correct
once the page is stable.
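
To illustrate the invariant, here is a minimal sketch with made-up
helper names (not QEMU's actual migration loop):

    /* Hypothetical helpers, not the real code.  The guest sets the
     * dirty bit _after_ modifying a page, so any page sent while it
     * was being written is marked dirty again and goes round once
     * more; the final pass therefore sends stable data. */
    while (migration_active()) {
        for (long pfn = next_dirty(0); pfn >= 0; pfn = next_dirty(pfn + 1)) {
            clear_dirty(pfn);    /* clear first, then read the data */
            send_page(pfn);      /* may race with guest writes */
            /* a concurrent guest write re-sets the dirty bit here,
             * so the page is picked up by a later iteration */
        }
    }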

If your compressor hardware is doing something undefined in the first
case, that's fine, as long as it works correctly in the stable case
where the data isn't changing.

Adding the extra copy is going to slow everyone else down; and since
there's plenty of pthread locking in those multifd threads, I'm
expecting them to get reasonably defined ordering and thus be safe from
multithreading problems (please correct us if we've actually done
something wrong in the locking there).

IMHO your accelerator, when called from a zlib call, needs to behave
the same as if it were the software implementation; i.e. if we've got
pthread calls in there that are enforcing ordering, then that should be
fine. Your accelerator implementation needs to add a barrier of some
type or an internal copy, not penalise everyone else.
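
Something along these lines inside the accelerated deflate() would do;
a sketch only, under assumptions (the wrapper name and staging-buffer
bound are made up, this is not the real DFLTCC code):

    #include <string.h>
    #include <zlib.h>

    /* Hypothetical wrapper: stage the caller's input in a private
     * buffer so the hardware never reads memory that may still be
     * changing underneath it. */
    static unsigned char staging[512 * 1024];   /* assumed input bound */

    static int deflate_stable(z_streamp zs, int flush)
    {
        memcpy(staging, zs->next_in, zs->avail_in);
        zs->next_in = staging;          /* compress the stable copy */
        return deflate(zs, flush);
    }

A barrier instead of the copy would avoid the extra memory traffic, but
the copy is the simplest way to give the hardware a stable view.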

Dave



> 
> [1] https://zlib.net/manual.html
> [2] https://github.com/madler/zlib/pull/410
> [3] https://lists.nongnu.org/archive/html/qemu-devel/2022-03/msg03988.html
> [4] http://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf
> 
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---
>  migration/multifd-zlib.c | 35 ++++++++++++++++++++++-------------
>  1 file changed, 22 insertions(+), 13 deletions(-)
> 
> diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
> index 3a7ae44485..b6b22b7d1f 100644
> --- a/migration/multifd-zlib.c
> +++ b/migration/multifd-zlib.c
> @@ -27,6 +27,8 @@ struct zlib_data {
>      uint8_t *zbuff;
>      /* size of compressed buffer */
>      uint32_t zbuff_len;
> +    /* uncompressed buffer */
> +    uint8_t buf[];
>  };
>  
>  /* Multifd zlib compression */
> @@ -43,9 +45,18 @@ struct zlib_data {
>   */
>  static int zlib_send_setup(MultiFDSendParams *p, Error **errp)
>  {
> -    struct zlib_data *z = g_new0(struct zlib_data, 1);
> -    z_stream *zs = &z->zs;
> +    /* This is the maximum size of the compressed buffer */
> +    uint32_t zbuff_len = compressBound(MULTIFD_PACKET_SIZE);
> +    size_t buf_len = qemu_target_page_size();
> +    struct zlib_data *z;
> +    z_stream *zs;
>  
> +    z = g_try_malloc0(sizeof(struct zlib_data) + buf_len + zbuff_len);
> +    if (!z) {
> +        error_setg(errp, "multifd %u: out of memory for zlib_data", p->id);
> +        return -1;
> +    }
> +    zs = &z->zs;
>      zs->zalloc = Z_NULL;
>      zs->zfree = Z_NULL;
>      zs->opaque = Z_NULL;
> @@ -54,15 +65,8 @@ static int zlib_send_setup(MultiFDSendParams *p, Error **errp)
>          error_setg(errp, "multifd %u: deflate init failed", p->id);
>          return -1;
>      }
> -    /* This is the maxium size of the compressed buffer */
> -    z->zbuff_len = compressBound(MULTIFD_PACKET_SIZE);
> -    z->zbuff = g_try_malloc(z->zbuff_len);
> -    if (!z->zbuff) {
> -        deflateEnd(&z->zs);
> -        g_free(z);
> -        error_setg(errp, "multifd %u: out of memory for zbuff", p->id);
> -        return -1;
> -    }
> +    z->zbuff_len = zbuff_len;
> +    z->zbuff = z->buf + buf_len;
>      p->data = z;
>      return 0;
>  }
> @@ -80,7 +84,6 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
>      struct zlib_data *z = p->data;
>  
>      deflateEnd(&z->zs);
> -    g_free(z->zbuff);
>      z->zbuff = NULL;
>      g_free(p->data);
>      p->data = NULL;
> @@ -114,8 +117,14 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
>              flush = Z_SYNC_FLUSH;
>          }
>  
> +        /*
> +         * Since the VM might be running, the page may be changing concurrently
> +         * with compression. zlib does not guarantee that this is safe,
> +         * therefore copy the page before calling deflate().
> +         */
> +        memcpy(z->buf, p->pages->block->host + p->normal[i], page_size);
>          zs->avail_in = page_size;
> -        zs->next_in = p->pages->block->host + p->normal[i];
> +        zs->next_in = z->buf;
>  
>          zs->avail_out = available;
>          zs->next_out = z->zbuff + out_size;
> -- 
> 2.35.1
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



