From: Liu, Yuan1
Subject: RE: [PATCH v5 5/7] migration/multifd: implement initialization of qpl compression
Date: Wed, 20 Mar 2024 16:04:15 +0000

> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Wednesday, March 20, 2024 11:21 PM
> To: Liu, Yuan1 <yuan1.liu@intel.com>
> Cc: peterx@redhat.com; farosas@suse.de; qemu-devel@nongnu.org;
> hao.xiang@bytedance.com; bryan.zhang@bytedance.com; Zou, Nanhai
> <nanhai.zou@intel.com>
> Subject: Re: [PATCH v5 5/7] migration/multifd: implement initialization of
> qpl compression
> 
> On Wed, Mar 20, 2024 at 03:02:59PM +0000, Liu, Yuan1 wrote:
> > > -----Original Message-----
> > > From: Daniel P. Berrangé <berrange@redhat.com>
> > > Sent: Wednesday, March 20, 2024 6:42 PM
> > > To: Liu, Yuan1 <yuan1.liu@intel.com>
> > > Cc: peterx@redhat.com; farosas@suse.de; qemu-devel@nongnu.org;
> > > hao.xiang@bytedance.com; bryan.zhang@bytedance.com; Zou, Nanhai
> > > <nanhai.zou@intel.com>
> > > Subject: Re: [PATCH v5 5/7] migration/multifd: implement initialization of
> > > qpl compression
> > >
> > > On Wed, Mar 20, 2024 at 12:45:25AM +0800, Yuan Liu wrote:
> > > > The qpl initialization includes memory allocation for compressed
> > > > data and the qpl job initialization.
> > > >
> > > > The qpl initialization will check whether the In-Memory Analytics
> > > > Accelerator (IAA) hardware is available; if the platform does not
> > > > have IAA hardware or the IAA hardware is not available, the QPL
> > > > compression initialization will fail.
> > > >
> > > > Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
> > > > Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
> > > > ---
> > > >  migration/multifd-qpl.c | 243 +++++++++++++++++++++++++++++++++++++++-
> > > >  1 file changed, 242 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c
> > > > index 056a68a060..6de65e9da7 100644
> > > > --- a/migration/multifd-qpl.c
> > > > +++ b/migration/multifd-qpl.c
> > > > @@ -9,12 +9,253 @@
> > > >   * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > >   * See the COPYING file in the top-level directory.
> > > >   */
> > > > +
> > > >  #include "qemu/osdep.h"
> > > >  #include "qemu/module.h"
> > > > +#include "qapi/error.h"
> > > > +#include "migration.h"
> > > > +#include "multifd.h"
> > > > +#include "qpl/qpl.h"
> > > > +
> > > > +typedef struct {
> > > > +    qpl_job **job_array;
> > > > +    /* the number of allocated jobs */
> > > > +    uint32_t job_num;
> > > > +    /* the size of data processed by a qpl job */
> > > > +    uint32_t data_size;
> > > > +    /* compressed data buffer */
> > > > +    uint8_t *zbuf;
> > > > +    /* the length of compressed data */
> > > > +    uint32_t *zbuf_hdr;
> > > > +} QplData;
> > > > +
> > > > +static void free_zbuf(QplData *qpl)
> > > > +{
> > > > +    if (qpl->zbuf != NULL) {
> > > > +        munmap(qpl->zbuf, qpl->job_num * qpl->data_size);
> > > > +        qpl->zbuf = NULL;
> > > > +    }
> > > > +    if (qpl->zbuf_hdr != NULL) {
> > > > +        g_free(qpl->zbuf_hdr);
> > > > +        qpl->zbuf_hdr = NULL;
> > > > +    }
> > > > +}
> > > > +
> > > > +static int alloc_zbuf(QplData *qpl, uint8_t chan_id, Error **errp)
> > > > +{
> > > > +    int flags = MAP_PRIVATE | MAP_POPULATE | MAP_ANONYMOUS;
> > > > +    uint32_t size = qpl->job_num * qpl->data_size;
> > > > +    uint8_t *buf;
> > > > +
> > > > +    buf = (uint8_t *) mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
> > > > +    if (buf == MAP_FAILED) {
> > > > +        error_setg(errp, "multifd: %u: alloc_zbuf failed, job num %u, size %u",
> > > > +                   chan_id, qpl->job_num, qpl->data_size);
> > > > +        return -1;
> > > > +    }
> > >
> > > What's the reason for using mmap here, rather than a normal
> > > malloc ?
> >
> > I want to populate the memory accessed by the IAA device in the
> > initialization phase, and then avoid initiating I/O page faults through
> > the IAA device during migration; a large number of I/O page faults is
> > not good for performance.
> 
> Does this mmap actually make a measurable difference ?
> 
> If I've followed the code paths correctly, I think this
> alloc_zbuf method only gets called during initial setup
> of each migration thread.
> 
> So this use of MAP_POPULATE seems to only make a difference
> between faulting in before starting to send data, and faulting
> in on the first bit of data that's sent. I'd be surprised if that's
> noticeable as a difference.

You are right; the performance impact is limited to handling the first page
faults and has little effect on overall live migration performance.

I just ran a simple test: the total live migration time is 2321ms with
g_malloc and 2098ms with mmap. I need more tests to check whether
g_malloc/g_malloc0 is feasible.
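
For reference, here is a minimal sketch of the two allocation variants being
compared; the g_malloc0 path and the explicit touch loop below are assumptions
added for the comparison, not code from the posted patch:

    #include <sys/mman.h>
    #include <unistd.h>
    #include <stdint.h>
    #include <glib.h>

    /* Variant 1: what the patch does - anonymous mmap, pre-faulted at setup
     * time via MAP_POPULATE, so the IAA device does not have to take I/O
     * page faults on first access during migration. */
    static uint8_t *zbuf_alloc_mmap(size_t size)
    {
        int flags = MAP_PRIVATE | MAP_POPULATE | MAP_ANONYMOUS;
        uint8_t *buf = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
        return buf == MAP_FAILED ? NULL : buf;
    }

    /* Variant 2 (hypothetical): g_malloc0 plus an explicit write to every
     * page. A large zeroed allocation may still be backed by untouched
     * zero-fill-on-demand pages, so the touch loop is what would actually
     * force the faults up front. */
    static uint8_t *zbuf_alloc_malloc(size_t size)
    {
        uint8_t *buf = g_malloc0(size);
        size_t page = (size_t)sysconf(_SC_PAGESIZE);

        for (volatile uint8_t *p = buf; p < buf + size; p += page) {
            *p = 0; /* take the write fault now, not during migration */
        }
        return buf;
    }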

> > This problem also occurs at the destination; therefore, I recommend that
> > customers add -mem-prealloc to the destination boot parameters.
> 
> I can understand mem-prealloc making a difference as that guarantees
> all of guest RAM is faulted in.
> 
> 
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-
> https://www.instagram.com/dberrange :|
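
As a reference for the destination-side suggestion above, the launch would look
roughly like the following fragment; the memory size, incoming URI and the rest
of the options are placeholders, not values from this discussion:

    # destination: pre-fault all guest RAM at startup, then wait for the stream
    qemu-system-x86_64 -m 16G -mem-prealloc \
        -incoming tcp:0:4444 ...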

