From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH v2] migration: skip sending ram pages released by virtio-balloon driver.
Date: Tue, 29 Mar 2016 15:28:53 +0300
On Mon, Mar 28, 2016 at 09:46:05AM +0530, Jitendra Kolhe wrote:
> While measuring live migration performance for a qemu/kvm guest, it
> was observed that qemu doesn't maintain any intelligence for guest
> ram pages which are released by the guest balloon driver, and treats
> such pages like any other normal guest ram pages. This has a direct
> impact on overall migration time for a guest which has released
> (ballooned out) memory to the host.
>
> On large systems, where we can configure large guests with 1TB of ram
> and with a considerable amount of memory released by the balloon driver
> to the host, the migration time gets worse.
>
> The solution proposed below is local to qemu (and does not require
> any modification to the Linux kernel or any guest driver). We have
> verified the fix for large guests >=1TB on HPE Superdome X (which can
> support up to 240 cores and 12TB of memory); in a case where 90% of
> memory is released by the balloon driver, the migration time for an
> idle guest reduces to ~600 secs from ~1200 secs.
>
> Detail: During live migration, as part of the 1st iteration,
> ram_save_iterate() -> ram_find_and_save_block() will try to migrate
> ram pages which were released by the virtio-balloon driver as part of
> dynamic memory delete. Even though the pages which are returned to
> the host by the virtio-balloon driver are zero pages, the migration
> algorithm will still end up scanning each entire page via
> ram_find_and_save_block() -> ram_save_page/ram_save_compressed_page
> -> save_zero_page() -> is_zero_range(). We also end up sending some
> control information over the network for these pages during
> migration. This adds to the total migration time.
>
> The proposed fix uses the existing bitmap infrastructure to create
> a virtio-balloon bitmap. Each bit in the bitmap represents a guest ram
> page of size 1UL << VIRTIO_BALLOON_PFN_SHIFT. The bitmap covers the
> entire guest ram, up to the maximum configured memory. Guest ram pages
> claimed by the virtio-balloon driver are represented by 1 in the
> bitmap. During live migration, each guest ram page (host VA offset)
> is checked against the virtio-balloon bitmap; if the bit is set, the
> corresponding ram page is excluded from scanning and from sending
> control information during migration. The bitmap is also migrated to
> the target as part of every ram_save_iterate loop, and after the
> guest is stopped the remaining balloon bitmap is migrated as part of
> the balloon driver save / load interface.
Migrating the bitmap might help a chained migration case
but will slow down the more typical case a bit.
Make it optional?
>
> With the proposed fix, the average migration time for an idle guest
> with 1TB maximum memory and 64 vCpus
> - reduces from ~1200 secs to ~600 secs, with guest memory ballooned
> down to 128GB (~10% of 1TB);
> - reduces from ~1300 secs to ~1200 secs (~7%), with guest memory
> ballooned down to 896GB (~90% of 1TB);
> - with no ballooning configured, we don't expect to see any impact
> on total migration time.
>
> The optimization gets temporarily disabled if a balloon operation is
> in progress. Since the optimization skips scanning and migrating
> control information for ballooned-out pages, we might skip guest ram
> pages in cases where the guest balloon driver has freed a ram page to
> the guest but not yet informed the host/qemu about it
> (VIRTIO_BALLOON_F_MUST_TELL_HOST). In such a case, with the
> optimization, we might skip migrating ram pages which the guest is
> still using. Since this problem is specific to the balloon leak path,
> we can restrict the balloon-operation-in-progress check to only a
> leak operation in progress.
>
> The optimization also gets permanently disabled (for all subsequent
> migrations) in case any migration uses the postcopy capability. In the
> postcopy case the balloon bitmap would need to be sent after vm_stop,
> which has a significant impact on the downtime. Moreover, since the
> applications in the guest won't actually be faulting on ram pages
> which are already ballooned out, the proposed optimization would not
> show any improvement in migration time during postcopy.
I think this optimization can work for postcopy with some
modifications.
For postcopy, when the guest takes a page out of the balloon,
it notifies the host.
This only happens when the MUST_TELL_HOST feature has been
negotiated, but most guests do negotiate it nowadays.
(It also seems that your code assumes MUST_TELL_HOST - I think
you must check, and disable the bitmap management otherwise.)
At that point we can mark the page as migrated, to avoid
requesting it from the source.
> Signed-off-by: Jitendra Kolhe <address@hidden>
> ---
> Changed in v2:
> - Resolved compilation issue for qemu-user binaries in exec.c
> - Localize balloon bitmap test to save_zero_page().
> - Updated version string for newly added migration capability to 2.7.
> - Made minor modifications to patch commit text.
>
> balloon.c | 253 ++++++++++++++++++++++++++++++++++++-
> exec.c | 3 +
> hw/virtio/virtio-balloon.c | 35 ++++-
> include/hw/virtio/virtio-balloon.h | 1 +
> include/migration/migration.h | 1 +
> include/sysemu/balloon.h | 15 ++-
> migration/migration.c | 9 ++
> migration/ram.c | 31 ++++-
> qapi-schema.json | 5 +-
> 9 files changed, 341 insertions(+), 12 deletions(-)
>
> diff --git a/balloon.c b/balloon.c
> index f2ef50c..1c2d228 100644
> --- a/balloon.c
> +++ b/balloon.c
> @@ -33,11 +33,34 @@
> #include "qmp-commands.h"
> #include "qapi/qmp/qerror.h"
> #include "qapi/qmp/qjson.h"
> +#include "exec/ram_addr.h"
> +#include "migration/migration.h"
> +
> +#define BALLOON_BITMAP_DISABLE_FLAG -1UL
> +
> +typedef enum {
> + BALLOON_BITMAP_DISABLE_NONE = 1, /* Enabled */
> + BALLOON_BITMAP_DISABLE_CURRENT,
> + BALLOON_BITMAP_DISABLE_PERMANENT,
> +} BalloonBitmapDisableState;
>
> static QEMUBalloonEvent *balloon_event_fn;
> static QEMUBalloonStatus *balloon_stat_fn;
> +static QEMUBalloonInProgress *balloon_in_progress_fn;
> static void *balloon_opaque;
> static bool balloon_inhibited;
> +static unsigned long balloon_bitmap_pages;
> +static unsigned int balloon_bitmap_pfn_shift;
> +static QemuMutex balloon_bitmap_mutex;
> +static bool balloon_bitmap_xfered;
> +static unsigned long balloon_min_bitmap_offset;
> +static unsigned long balloon_max_bitmap_offset;
> +static BalloonBitmapDisableState balloon_bitmap_disable_state;
> +
> +static struct BitmapRcu {
> + struct rcu_head rcu;
> + unsigned long *bmap;
> +} *balloon_bitmap_rcu;
>
> bool qemu_balloon_is_inhibited(void)
> {
> @@ -49,6 +72,21 @@ void qemu_balloon_inhibit(bool state)
> balloon_inhibited = state;
> }
>
> +void qemu_mutex_lock_balloon_bitmap(void)
> +{
> + qemu_mutex_lock(&balloon_bitmap_mutex);
> +}
> +
> +void qemu_mutex_unlock_balloon_bitmap(void)
> +{
> + qemu_mutex_unlock(&balloon_bitmap_mutex);
> +}
> +
> +void qemu_balloon_reset_bitmap_data(void)
> +{
> + balloon_bitmap_xfered = false;
> +}
> +
> static bool have_balloon(Error **errp)
> {
> if (kvm_enabled() && !kvm_has_sync_mmu()) {
> @@ -65,9 +103,12 @@ static bool have_balloon(Error **errp)
> }
>
> int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> - QEMUBalloonStatus *stat_func, void *opaque)
> + QEMUBalloonStatus *stat_func,
> + QEMUBalloonInProgress *in_progress_func,
> + void *opaque, int pfn_shift)
> {
> - if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
> + if (balloon_event_fn || balloon_stat_fn ||
> + balloon_in_progress_fn || balloon_opaque) {
> /* We're already registered one balloon handler. How many can
> * a guest really have?
> */
> @@ -75,17 +116,39 @@ int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> }
> balloon_event_fn = event_func;
> balloon_stat_fn = stat_func;
> + balloon_in_progress_fn = in_progress_func;
> balloon_opaque = opaque;
> +
> + qemu_mutex_init(&balloon_bitmap_mutex);
> + balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_NONE;
> + balloon_bitmap_pfn_shift = pfn_shift;
> + balloon_bitmap_pages = (last_ram_offset() >> balloon_bitmap_pfn_shift);
> + balloon_bitmap_rcu = g_new0(struct BitmapRcu, 1);
> + balloon_bitmap_rcu->bmap = bitmap_new(balloon_bitmap_pages);
> + bitmap_clear(balloon_bitmap_rcu->bmap, 0, balloon_bitmap_pages);
> +
> return 0;
> }
>
> +static void balloon_bitmap_free(struct BitmapRcu *bmap)
> +{
> + g_free(bmap->bmap);
> + g_free(bmap);
> +}
> +
> void qemu_remove_balloon_handler(void *opaque)
> {
> + struct BitmapRcu *bitmap = balloon_bitmap_rcu;
> if (balloon_opaque != opaque) {
> return;
> }
> + atomic_rcu_set(&balloon_bitmap_rcu, NULL);
> + if (bitmap) {
> + call_rcu(bitmap, balloon_bitmap_free, rcu);
> + }
> balloon_event_fn = NULL;
> balloon_stat_fn = NULL;
> + balloon_in_progress_fn = NULL;
> balloon_opaque = NULL;
> }
>
> @@ -116,3 +179,189 @@ void qmp_balloon(int64_t target, Error **errp)
> trace_balloon_event(balloon_opaque, target);
> balloon_event_fn(balloon_opaque, target);
> }
> +
> +/* Handle Ram hotplug case, only called in case old < new */
> +int qemu_balloon_bitmap_extend(ram_addr_t old, ram_addr_t new)
> +{
> + struct BitmapRcu *old_bitmap = balloon_bitmap_rcu, *bitmap;
> + unsigned long old_offset, new_offset;
> +
> + if (!balloon_bitmap_rcu) {
> + return -1;
> + }
> +
> + old_offset = (old >> balloon_bitmap_pfn_shift);
> + new_offset = (new >> balloon_bitmap_pfn_shift);
> +
> + bitmap = g_new(struct BitmapRcu, 1);
> + bitmap->bmap = bitmap_new(new_offset);
> +
> + qemu_mutex_lock_balloon_bitmap();
> + bitmap_clear(bitmap->bmap, 0,
> + balloon_bitmap_pages + new_offset - old_offset);
> + bitmap_copy(bitmap->bmap, old_bitmap->bmap, old_offset);
> +
> + atomic_rcu_set(&balloon_bitmap_rcu, bitmap);
> + balloon_bitmap_pages += new_offset - old_offset;
> + qemu_mutex_unlock_balloon_bitmap();
> + call_rcu(old_bitmap, balloon_bitmap_free, rcu);
> +
> + return 0;
> +}
> +
> +/* Should be called with balloon bitmap mutex lock held */
> +int qemu_balloon_bitmap_update(ram_addr_t addr, int deflate)
> +{
> + unsigned long *bitmap;
> + unsigned long offset = 0;
> +
> + if (!balloon_bitmap_rcu) {
> + return -1;
> + }
> + offset = (addr >> balloon_bitmap_pfn_shift);
> + if (balloon_bitmap_xfered) {
> + if (offset < balloon_min_bitmap_offset) {
> + balloon_min_bitmap_offset = offset;
> + }
> + if (offset > balloon_max_bitmap_offset) {
> + balloon_max_bitmap_offset = offset;
> + }
> + }
> +
> + rcu_read_lock();
> + bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
> + if (deflate == 0) {
> + set_bit(offset, bitmap);
> + } else {
> + clear_bit(offset, bitmap);
> + }
> + rcu_read_unlock();
> + return 0;
> +}
> +
> +void qemu_balloon_bitmap_setup(void)
> +{
> + if (migrate_postcopy_ram()) {
> + balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_PERMANENT;
> + } else if ((!balloon_bitmap_rcu || !migrate_skip_balloon()) &&
> + (balloon_bitmap_disable_state !=
> + BALLOON_BITMAP_DISABLE_PERMANENT)) {
> + balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_CURRENT;
> + }
> +}
> +
> +int qemu_balloon_bitmap_test(RAMBlock *rb, ram_addr_t addr)
> +{
> + unsigned long *bitmap;
> + ram_addr_t base;
> + unsigned long nr = 0;
> + int ret = 0;
> +
> + if (balloon_bitmap_disable_state == BALLOON_BITMAP_DISABLE_CURRENT ||
> + balloon_bitmap_disable_state == BALLOON_BITMAP_DISABLE_PERMANENT) {
> + return 0;
> + }
> + balloon_in_progress_fn(balloon_opaque, &ret);
> + if (ret == 1) {
> + return 0;
> + }
> +
> + rcu_read_lock();
> + bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
> + base = rb->offset >> balloon_bitmap_pfn_shift;
> + nr = base + (addr >> balloon_bitmap_pfn_shift);
> + if (test_bit(nr, bitmap)) {
> + ret = 1;
> + }
> + rcu_read_unlock();
> + return ret;
> +}
> +
> +int qemu_balloon_bitmap_save(QEMUFile *f)
> +{
> + unsigned long *bitmap;
> + unsigned long offset = 0, next = 0, len = 0;
> + unsigned long tmpoffset = 0, tmplimit = 0;
> +
> + if (balloon_bitmap_disable_state == BALLOON_BITMAP_DISABLE_PERMANENT) {
> + qemu_put_be64(f, BALLOON_BITMAP_DISABLE_FLAG);
> + return 0;
> + }
> +
> + qemu_mutex_lock_balloon_bitmap();
> + if (balloon_bitmap_xfered) {
> + tmpoffset = balloon_min_bitmap_offset;
> + tmplimit = balloon_max_bitmap_offset;
> + } else {
> + balloon_bitmap_xfered = true;
> + tmpoffset = offset;
> + tmplimit = balloon_bitmap_pages;
> + }
> +
> + balloon_min_bitmap_offset = balloon_bitmap_pages;
> + balloon_max_bitmap_offset = 0;
> +
> + qemu_put_be64(f, balloon_bitmap_pages);
> + qemu_put_be64(f, tmpoffset);
> + qemu_put_be64(f, tmplimit);
> + rcu_read_lock();
> + bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
> + while (tmpoffset < tmplimit) {
> + unsigned long next_set_bit, start_set_bit;
> + next_set_bit = find_next_bit(bitmap, balloon_bitmap_pages, tmpoffset);
> + start_set_bit = next_set_bit;
> + if (next_set_bit == balloon_bitmap_pages) {
> + len = 0;
> + next = start_set_bit;
> + qemu_put_be64(f, next);
> + qemu_put_be64(f, len);
> + break;
> + }
> + next_set_bit = find_next_zero_bit(bitmap,
> + balloon_bitmap_pages,
> + ++next_set_bit);
> + len = (next_set_bit - start_set_bit);
> + next = start_set_bit;
> + qemu_put_be64(f, next);
> + qemu_put_be64(f, len);
> + tmpoffset = next + len;
> + }
> + rcu_read_unlock();
> + qemu_mutex_unlock_balloon_bitmap();
> + return 0;
> +}
> +
> +int qemu_balloon_bitmap_load(QEMUFile *f)
> +{
> + unsigned long *bitmap;
> + unsigned long next = 0, len = 0;
> + unsigned long tmpoffset = 0, tmplimit = 0;
> +
> + if (!balloon_bitmap_rcu) {
> + return -1;
> + }
> +
> + qemu_mutex_lock_balloon_bitmap();
> + balloon_bitmap_pages = qemu_get_be64(f);
> + if (balloon_bitmap_pages == BALLOON_BITMAP_DISABLE_FLAG) {
> + balloon_bitmap_disable_state = BALLOON_BITMAP_DISABLE_PERMANENT;
> + qemu_mutex_unlock_balloon_bitmap();
> + return 0;
> + }
> + tmpoffset = qemu_get_be64(f);
> + tmplimit = qemu_get_be64(f);
> + rcu_read_lock();
> + bitmap = atomic_rcu_read(&balloon_bitmap_rcu)->bmap;
> + while (tmpoffset < tmplimit) {
> + next = qemu_get_be64(f);
> + len = qemu_get_be64(f);
> + if (len == 0) {
> + break;
> + }
> + bitmap_set(bitmap, next, len);
> + tmpoffset = next + len;
> + }
> + rcu_read_unlock();
> + qemu_mutex_unlock_balloon_bitmap();
> + return 0;
> +}
> diff --git a/exec.c b/exec.c
> index f398d21..7a448e5 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -43,6 +43,7 @@
> #else /* !CONFIG_USER_ONLY */
> #include "sysemu/xen-mapcache.h"
> #include "trace.h"
> +#include "sysemu/balloon.h"
> #endif
> #include "exec/cpu-all.h"
> #include "qemu/rcu_queue.h"
> @@ -1610,6 +1611,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
> if (new_ram_size > old_ram_size) {
> migration_bitmap_extend(old_ram_size, new_ram_size);
> dirty_memory_extend(old_ram_size, new_ram_size);
> + qemu_balloon_bitmap_extend(old_ram_size << TARGET_PAGE_BITS,
> + new_ram_size << TARGET_PAGE_BITS);
> }
> /* Keep the list sorted from biggest to smallest block. Unlike QTAILQ,
> * QLIST (which has an RCU-friendly variant) does not have insertion at
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 22ad25c..9f3a4c8 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -27,6 +27,7 @@
> #include "qapi/visitor.h"
> #include "qapi-event.h"
> #include "trace.h"
> +#include "migration/migration.h"
>
> #if defined(__linux__)
> #include <sys/mman.h>
> @@ -214,11 +215,13 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> VirtQueueElement *elem;
> MemoryRegionSection section;
>
> + qemu_mutex_lock_balloon_bitmap();
> for (;;) {
> size_t offset = 0;
> uint32_t pfn;
> elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> if (!elem) {
> + qemu_mutex_unlock_balloon_bitmap();
> return;
> }
>
> @@ -242,6 +245,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> addr = section.offset_within_region;
> balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
> !!(vq == s->dvq));
> + qemu_balloon_bitmap_update(addr, !!(vq == s->dvq));
> memory_region_unref(section.mr);
> }
>
> @@ -249,6 +253,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> virtio_notify(vdev, vq);
> g_free(elem);
> }
> + qemu_mutex_unlock_balloon_bitmap();
> }
>
> static void virtio_balloon_receive_stats(VirtIODevice *vdev, VirtQueue *vq)
> @@ -303,6 +308,16 @@ out:
> }
> }
>
> +static void virtio_balloon_migration_state_changed(Notifier *notifier,
> + void *data)
> +{
> + MigrationState *mig = data;
> +
> + if (migration_has_failed(mig)) {
> + qemu_balloon_reset_bitmap_data();
> + }
> +}
> +
> static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
> {
> VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> @@ -382,6 +397,16 @@ static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
> VIRTIO_BALLOON_PFN_SHIFT);
> }
>
> +static void virtio_balloon_in_progress(void *opaque, int *status)
> +{
> + VirtIOBalloon *dev = VIRTIO_BALLOON(opaque);
> + if (cpu_to_le32(dev->actual) != cpu_to_le32(dev->num_pages)) {
> + *status = 1;
> + return;
> + }
> + *status = 0;
> +}
> +
> static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
> {
> VirtIOBalloon *dev = VIRTIO_BALLOON(opaque);
> @@ -409,6 +434,7 @@ static void virtio_balloon_save_device(VirtIODevice *vdev, QEMUFile *f)
>
> qemu_put_be32(f, s->num_pages);
> qemu_put_be32(f, s->actual);
> + qemu_balloon_bitmap_save(f);
> }
>
> static int virtio_balloon_load(QEMUFile *f, void *opaque, int version_id)
> @@ -426,6 +452,7 @@ static int virtio_balloon_load_device(VirtIODevice *vdev, QEMUFile *f,
>
> s->num_pages = qemu_get_be32(f);
> s->actual = qemu_get_be32(f);
> + qemu_balloon_bitmap_load(f);
> return 0;
> }
>
> @@ -439,7 +466,9 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
> sizeof(struct virtio_balloon_config));
>
> ret = qemu_add_balloon_handler(virtio_balloon_to_target,
> - virtio_balloon_stat, s);
> + virtio_balloon_stat,
> + virtio_balloon_in_progress, s,
> + VIRTIO_BALLOON_PFN_SHIFT);
>
> if (ret < 0) {
> error_setg(errp, "Only one balloon device is supported");
> @@ -453,6 +482,9 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>
> reset_stats(s);
>
> + s->migration_state_notifier.notify = virtio_balloon_migration_state_changed;
> + add_migration_state_change_notifier(&s->migration_state_notifier);
> +
> register_savevm(dev, "virtio-balloon", -1, 1,
> virtio_balloon_save, virtio_balloon_load, s);
> }
> @@ -462,6 +494,7 @@ static void virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
> VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> VirtIOBalloon *s = VIRTIO_BALLOON(dev);
>
> + remove_migration_state_change_notifier(&s->migration_state_notifier);
> balloon_stats_destroy_timer(s);
> qemu_remove_balloon_handler(s);
> unregister_savevm(dev, "virtio-balloon", s);
> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> index 35f62ac..1ded5a9 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -43,6 +43,7 @@ typedef struct VirtIOBalloon {
> int64_t stats_last_update;
> int64_t stats_poll_interval;
> uint32_t host_features;
> + Notifier migration_state_notifier;
> } VirtIOBalloon;
>
> #endif
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index ac2c12c..6c1d1af 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -267,6 +267,7 @@ void migrate_del_blocker(Error *reason);
>
> bool migrate_postcopy_ram(void);
> bool migrate_zero_blocks(void);
> +bool migrate_skip_balloon(void);
>
> bool migrate_auto_converge(void);
>
> diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
> index 3f976b4..5325c38 100644
> --- a/include/sysemu/balloon.h
> +++ b/include/sysemu/balloon.h
> @@ -15,14 +15,27 @@
> #define _QEMU_BALLOON_H
>
> #include "qapi-types.h"
> +#include "migration/qemu-file.h"
>
> typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
> typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
> +typedef void (QEMUBalloonInProgress) (void *opaque, int *status);
>
> int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> - QEMUBalloonStatus *stat_func, void *opaque);
> + QEMUBalloonStatus *stat_func,
> + QEMUBalloonInProgress *progress_func,
> + void *opaque, int pfn_shift);
> void qemu_remove_balloon_handler(void *opaque);
> bool qemu_balloon_is_inhibited(void);
> void qemu_balloon_inhibit(bool state);
> +void qemu_mutex_lock_balloon_bitmap(void);
> +void qemu_mutex_unlock_balloon_bitmap(void);
> +void qemu_balloon_reset_bitmap_data(void);
> +void qemu_balloon_bitmap_setup(void);
> +int qemu_balloon_bitmap_extend(ram_addr_t old, ram_addr_t new);
> +int qemu_balloon_bitmap_update(ram_addr_t addr, int deflate);
> +int qemu_balloon_bitmap_test(RAMBlock *rb, ram_addr_t addr);
> +int qemu_balloon_bitmap_save(QEMUFile *f);
> +int qemu_balloon_bitmap_load(QEMUFile *f);
>
> #endif
> diff --git a/migration/migration.c b/migration/migration.c
> index 034a918..cb86307 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1200,6 +1200,15 @@ int migrate_use_xbzrle(void)
> return s->enabled_capabilities[MIGRATION_CAPABILITY_XBZRLE];
> }
>
> +bool migrate_skip_balloon(void)
> +{
> + MigrationState *s;
> +
> + s = migrate_get_current();
> +
> + return s->enabled_capabilities[MIGRATION_CAPABILITY_SKIP_BALLOON];
> +}
> +
> int64_t migrate_xbzrle_cache_size(void)
> {
> MigrationState *s;
> diff --git a/migration/ram.c b/migration/ram.c
> index 704f6a9..161ab73 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -40,6 +40,7 @@
> #include "trace.h"
> #include "exec/ram_addr.h"
> #include "qemu/rcu_queue.h"
> +#include "sysemu/balloon.h"
>
> #ifdef DEBUG_MIGRATION_RAM
> #define DPRINTF(fmt, ...) \
> @@ -65,6 +66,7 @@ static uint64_t bitmap_sync_count;
> #define RAM_SAVE_FLAG_XBZRLE 0x40
> /* 0x80 is reserved in migration.h start with 0x100 next */
> #define RAM_SAVE_FLAG_COMPRESS_PAGE 0x100
> +#define RAM_SAVE_FLAG_BALLOON 0x200
>
> static const uint8_t ZERO_TARGET_PAGE[TARGET_PAGE_SIZE];
>
> @@ -702,13 +704,17 @@ static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
> {
> int pages = -1;
>
> - if (is_zero_range(p, TARGET_PAGE_SIZE)) {
> - acct_info.dup_pages++;
> - *bytes_transferred += save_page_header(f, block,
> + if (qemu_balloon_bitmap_test(block, offset) != 1) {
> + if (is_zero_range(p, TARGET_PAGE_SIZE)) {
> + acct_info.dup_pages++;
> + *bytes_transferred += save_page_header(f, block,
> offset | RAM_SAVE_FLAG_COMPRESS);
> - qemu_put_byte(f, 0);
> - *bytes_transferred += 1;
> - pages = 1;
> + qemu_put_byte(f, 0);
> + *bytes_transferred += 1;
> + pages = 1;
> + }
> + } else {
> + pages = 0;
> }
>
> return pages;
> @@ -773,7 +779,7 @@ static int ram_save_page(QEMUFile *f, PageSearchStatus *pss,
> * page would be stale
> */
> xbzrle_cache_zero_page(current_addr);
> - } else if (!ram_bulk_stage && migrate_use_xbzrle()) {
> + } else if (pages != 0 && !ram_bulk_stage && migrate_use_xbzrle()) {
> pages = save_xbzrle_page(f, &p, current_addr, block,
> offset, last_stage, bytes_transferred);
> if (!last_stage) {
> @@ -1355,6 +1361,9 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
> }
>
> if (found) {
> + /* skip saving ram host page if the corresponding guest page
> + * is ballooned out
> + */
> pages = ram_save_host_page(ms, f, &pss,
> last_stage, bytes_transferred,
> dirty_ram_abs);
> @@ -1959,6 +1968,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>
> rcu_read_unlock();
>
> + qemu_balloon_bitmap_setup();
> ram_control_before_iterate(f, RAM_CONTROL_SETUP);
> ram_control_after_iterate(f, RAM_CONTROL_SETUP);
>
> @@ -1984,6 +1994,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>
> ram_control_before_iterate(f, RAM_CONTROL_ROUND);
>
> + qemu_put_be64(f, RAM_SAVE_FLAG_BALLOON);
> + qemu_balloon_bitmap_save(f);
> +
> t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> i = 0;
> while ((ret = qemu_file_rate_limit(f)) == 0) {
> @@ -2493,6 +2506,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> }
> break;
>
> + case RAM_SAVE_FLAG_BALLOON:
> + qemu_balloon_bitmap_load(f);
> + break;
> +
> case RAM_SAVE_FLAG_COMPRESS:
> ch = qemu_get_byte(f);
> ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 7f8d799..38163ca 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -544,11 +544,14 @@
> # been migrated, pulling the remaining pages along as needed. NOTE: If
> # the migration fails during postcopy the VM will fail. (since 2.6)
> #
> +# @skip-balloon: Skip scanning ram pages released by virtio-balloon driver.
> +# (since 2.7)
> +#
> # Since: 1.2
> ##
> { 'enum': 'MigrationCapability',
> 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
> - 'compress', 'events', 'postcopy-ram'] }
> + 'compress', 'events', 'postcopy-ram', 'skip-balloon'] }
>
> ##
> # @MigrationCapabilityStatus
> --
> 1.8.3.1