Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase
From: Zhijian Li (Fujitsu)
Subject: Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase
Date: Fri, 15 Sep 2023 03:13:21 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0
On 15/09/2023 04:20, William Roche wrote:
> From: William Roche <william.roche@oracle.com>
>
> A memory page poisoned from the hypervisor level is no longer readable.
> Thus, it is now treated as a zero-page for the ram saving migration phase.
>
> The migration of a VM will crash QEMU when it tries to read the
> memory address space and stumbles on the poisoned page, with a stack
> trace similar to:
>
> Program terminated with signal SIGBUS, Bus error.
> #0 _mm256_loadu_si256
> #1 buffer_zero_avx2
> #2 select_accel_fn
> #3 buffer_is_zero
> #4 save_zero_page_to_file
> #5 save_zero_page
> #6 ram_save_target_page_legacy
> #7 ram_save_host_page
> #8 ram_find_and_save_block
> #9 ram_save_iterate
> #10 qemu_savevm_state_iterate
> #11 migration_iteration_run
> #12 migration_thread
> #13 qemu_thread_start
>
> Fix it by considering poisoned pages as if they were zero-pages for
> the migration copy. This fix also works with underlying large pages,
> taking into account the RAMBlock segment "page-size".
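To make the page-size point concrete, here is a minimal standalone sketch (not code from the patch) of a page-size-aware test: an address counts as poisoned when it lies in the same host page as a recorded poisoned address. It assumes the page size is a power of two, which holds for 4 KiB pages and huge pages alike; the helper name `same_host_page` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from QEMU): return true when `addr` and
 * `poisoned` fall inside the same host page of size `page_size`.
 * Assumes `page_size` is a non-zero power of two (4 KiB, 2 MiB, 1 GiB, ...).
 */
static bool same_host_page(uint64_t addr, uint64_t poisoned, uint64_t page_size)
{
    /* A non-zero power of two has exactly one bit set. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    uint64_t mask = ~(page_size - 1);   /* clears the in-page offset bits */
    return (addr & mask) == (poisoned & mask);
}
```

Aligning both addresses down also matches an address that precedes the recorded one inside the same huge page, which a test of the form `addr - poisoned < page_size` does not. Which form is appropriate depends on whether recorded poisoned addresses are already aligned to the block's page size, something this sketch does not assume.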
>
> Standard migration and compressed transfers are handled by this code.
> RDMA transfer isn't touched.
>
I'm okay with "RDMA isn't touched".
BTW, could you share your reproducer program, or the hack you use to poison
the page, so that I can take a look at the RDMA part later when I'm free?
I'm not sure it's appropriate to ack a part that isn't touched. Anyway:
Acked-by: Li Zhijian <lizhijian@fujitsu.com> # RDMA
> Signed-off-by: William Roche <william.roche@oracle.com>
> ---
>  accel/kvm/kvm-all.c      | 14 ++++++++++++++
>  accel/stubs/kvm-stub.c   |  5 +++++
>  include/sysemu/kvm.h     | 10 ++++++++++
>  migration/ram-compress.c |  3 ++-
>  migration/ram.c          | 23 +++++++++++++++++++++--
>  migration/ram.h          |  2 ++
>  6 files changed, 54 insertions(+), 3 deletions(-)
>
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index ff1578bb32..7fb13c8a56 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -1152,6 +1152,20 @@ static void kvm_unpoison_all(void *param)
>      }
>  }
>
> +bool kvm_hwpoisoned_page(RAMBlock *block, void *offset)
> +{
> +    HWPoisonPage *pg;
> +    ram_addr_t ram_addr = (ram_addr_t) offset;
> +
> +    QLIST_FOREACH(pg, &hwpoison_page_list, list) {
> +        if ((ram_addr >= pg->ram_addr) &&
> +            (ram_addr - pg->ram_addr < block->page_size)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
>  void kvm_hwpoison_page_add(ram_addr_t ram_addr)
>  {
>      HWPoisonPage *page;
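The lookup added above walks the list of recorded poisoned pages and reports a hit when the queried address is at most `page_size - 1` bytes past a recorded address. Below is a minimal standalone sketch of the same logic, using a hand-rolled singly linked list in place of QEMU's `QLIST` macros and `uint64_t` in place of `ram_addr_t`; the names `poison_entry`, `poison_add`, `poison_lookup` and `poison_free` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMU's HWPoisonPage list entry. */
struct poison_entry {
    uint64_t addr;                  /* recorded poisoned address */
    struct poison_entry *next;
};

/* Record a poisoned address at the head of the list; returns the new head. */
static struct poison_entry *poison_add(struct poison_entry *head, uint64_t addr)
{
    struct poison_entry *e = malloc(sizeof(*e));
    assert(e != NULL);
    e->addr = addr;
    e->next = head;
    return e;
}

/*
 * Same test as the patch: addr is poisoned when addr >= e->addr and
 * addr - e->addr < page_size. Checking addr >= e->addr first (&& short-circuits)
 * keeps the unsigned subtraction from wrapping around.
 */
static bool poison_lookup(const struct poison_entry *head, uint64_t addr,
                          uint64_t page_size)
{
    for (const struct poison_entry *e = head; e != NULL; e = e->next) {
        if (addr >= e->addr && addr - e->addr < page_size) {
            return true;
        }
    }
    return false;
}

/* Free every entry in the list. */
static void poison_free(struct poison_entry *head)
{
    while (head != NULL) {
        struct poison_entry *next = head->next;
        free(head);
        head = next;
    }
}
```

Lookup cost is linear in the number of recorded pages, which is presumably acceptable because poisoned pages should be rare.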
> diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
> index 235dc661bc..c0a31611df 100644
> --- a/accel/stubs/kvm-stub.c
> +++ b/accel/stubs/kvm-stub.c
> @@ -133,3 +133,8 @@ uint32_t kvm_dirty_ring_size(void)
>  {
>      return 0;
>  }
> +
> +bool kvm_hwpoisoned_page(RAMBlock *block, void *ram_addr)
> +{
> +    return false;
> +}
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index ee9025f8e9..858688227a 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -570,4 +570,14 @@ bool kvm_arch_cpu_check_are_resettable(void);
>  bool kvm_dirty_ring_enabled(void);
>
>  uint32_t kvm_dirty_ring_size(void);
> +
> +/**
> + * kvm_hwpoisoned_page - indicate if the given page is poisoned
> + * @block: memory block of the given page
> + * @ram_addr: offset of the page
> + *
> + * Returns: true: page is poisoned
> + * false: page not yet poisoned
> + */
> +bool kvm_hwpoisoned_page(RAMBlock *block, void *ram_addr);
>  #endif
> diff --git a/migration/ram-compress.c b/migration/ram-compress.c
> index 06254d8c69..1916ce709d 100644
> --- a/migration/ram-compress.c
> +++ b/migration/ram-compress.c
> @@ -34,6 +34,7 @@
>  #include "qemu/error-report.h"
>  #include "migration.h"
>  #include "options.h"
> +#include "ram.h"
>  #include "io/channel-null.h"
>  #include "exec/target_page.h"
>  #include "exec/ramblock.h"
> @@ -198,7 +199,7 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
>
>      assert(qemu_file_buffer_empty(f));
>
> -    if (buffer_is_zero(p, page_size)) {
> +    if (migration_buffer_is_zero(block, offset, page_size)) {
>          return RES_ZEROPAGE;
>      }
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 9040d66e61..fd337f7e65 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1129,6 +1129,26 @@ void ram_release_page(const char *rbname, uint64_t offset)
>      ram_discard_range(rbname, offset, TARGET_PAGE_SIZE);
>  }
>
> +/**
> + * migration_buffer_is_zero: indicate if the page at the given
> + * location is entirely filled with zero, or is a poisoned page.
> + *
> + * @block: block that contains the page
> + * @offset: offset inside the block for the page
> + * @len: size to consider
> + */
> +bool migration_buffer_is_zero(RAMBlock *block, ram_addr_t offset,
> +                              size_t len)
> +{
> +    uint8_t *p = block->host + offset;
> +
> +    if (kvm_enabled() && kvm_hwpoisoned_page(block, (void *)offset)) {
> +        return true;
> +    }
> +
> +    return buffer_is_zero(p, len);
> +}
> +
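`migration_buffer_is_zero()` above answers "zero" without reading host memory when the page is known to be poisoned, and otherwise falls back to a real zero check. The ordering is the important part, since reading the poisoned page is what raises SIGBUS. A minimal sketch of that short-circuit, assuming a caller-supplied predicate in place of `kvm_enabled() && kvm_hwpoisoned_page(...)` and a plain byte loop in place of QEMU's accelerated `buffer_is_zero()`; all names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate type: does `offset` belong to a poisoned page? */
typedef bool (*poison_pred)(uint64_t offset, void *opaque);

/* Byte-by-byte stand-in for QEMU's vectorized buffer_is_zero(). */
static bool plain_buffer_is_zero(const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Report the page at host + offset as "zero" if it is poisoned or actually
 * all zero bytes. The poison check runs first, so a poisoned page is never
 * dereferenced.
 */
static bool page_is_zero_or_poisoned(const uint8_t *host, uint64_t offset,
                                     size_t len, poison_pred is_poisoned,
                                     void *opaque)
{
    if (is_poisoned != NULL && is_poisoned(offset, opaque)) {
        return true;                /* memory is not touched at all */
    }
    return plain_buffer_is_zero(host + offset, len);
}

/* Example predicate (hypothetical): `opaque` points at one poisoned offset. */
static bool single_poisoned_offset(uint64_t offset, void *opaque)
{
    assert(opaque != NULL);
    return offset == *(const uint64_t *)opaque;
}
```

Passing a `NULL` predicate reduces the helper to a plain zero check, which corresponds to the non-KVM case where the stub always reports "not poisoned".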
>  /**
>   * save_zero_page_to_file: send the zero page to the file
>   *
> @@ -1142,10 +1162,9 @@ void ram_release_page(const char *rbname, uint64_t offset)
>  static int save_zero_page_to_file(PageSearchStatus *pss, QEMUFile *file,
>                                    RAMBlock *block, ram_addr_t offset)
>  {
> -    uint8_t *p = block->host + offset;
>      int len = 0;
>
> -    if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
> +    if (migration_buffer_is_zero(block, offset, TARGET_PAGE_SIZE)) {
>          len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_ZERO);
>          qemu_put_byte(file, 0);
>          len += 1;
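For a page judged zero (which now includes a poisoned page), `save_zero_page_to_file()` sends only a page header whose offset has `RAM_SAVE_FLAG_ZERO` OR-ed into it, followed by a single byte 0, instead of the page contents. That works because page offsets are page aligned, leaving their low bits free for flags. The sketch below models only that idea in a flat buffer; the flag value 0x02, the 8-byte big-endian header, and the names `encode_zero_page`, `decode_header` and `SKETCH_FLAG_ZERO` are assumptions for illustration, and it leaves out the RAMBlock id string that `save_page_header()` may also emit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag value for illustration only; see migration/ram.c for the real one. */
#define SKETCH_FLAG_ZERO 0x02u

/*
 * Hypothetical encoder: write offset|flag as 8 big-endian bytes, then one
 * fill byte (0). Returns the number of bytes written (always 9 here).
 * `offset` must be page aligned so that its low bits are free for flags.
 */
static size_t encode_zero_page(uint8_t out[9], uint64_t offset, uint64_t page_size)
{
    assert((offset & (page_size - 1)) == 0);    /* low bits must be clear */

    uint64_t header = offset | SKETCH_FLAG_ZERO;
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(header >> (56 - 8 * i));    /* most significant byte first */
    }
    out[8] = 0;                                        /* fill byte for the page */
    return 9;
}

/* Hypothetical decoder: split the 8-byte header back into offset and flags. */
static uint64_t decode_header(const uint8_t in[8], uint64_t page_size,
                              uint64_t *flags)
{
    uint64_t header = 0;
    for (int i = 0; i < 8; i++) {
        header = (header << 8) | in[i];
    }
    *flags = header & (page_size - 1);      /* low bits carry the flags */
    return header & ~(page_size - 1);       /* the rest is the page offset */
}
```

On the destination, the fill byte is presumably what a helper such as `ram_handle_compressed(host, ch, size)` (declared in `migration/ram.h`) uses to fill the page, so a poisoned source page simply arrives as a zero page.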
> diff --git a/migration/ram.h b/migration/ram.h
> index 145c915ca7..805ea2a211 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -65,6 +65,8 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
>  void ram_transferred_add(uint64_t bytes);
>  void ram_release_page(const char *rbname, uint64_t offset);
>
> +bool migration_buffer_is_zero(RAMBlock *block, ram_addr_t offset, size_t len);
> +
>  int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
>  bool ramblock_recv_bitmap_test_byte_offset(RAMBlock *rb, uint64_t byte_offset);
>  void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
- [PATCH 0/1] Qemu crashes on VM migration after an handled memory error, William Roche, 2023/09/06
- [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase, William Roche, 2023/09/06
- Re: [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase, Joao Martins, 2023/09/06
- Re: [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase, Peter Xu, 2023/09/06
- Re: [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase, William Roche, 2023/09/06
- Re: [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase, Joao Martins, 2023/09/09
- Re: [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase, Peter Xu, 2023/09/11
- Re: [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase, Peter Xu, 2023/09/12
- [PATCH v2 0/1] Qemu crashes on VM migration after an handled memory error, William Roche, 2023/09/14
- [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase, William Roche, 2023/09/14
- Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase, Zhijian Li (Fujitsu) <=
- Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase, William Roche, 2023/09/15
- Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase, Zhijian Li (Fujitsu), 2023/09/17
- Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase, Zhijian Li (Fujitsu), 2023/09/20
- Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase, William Roche, 2023/09/20
- [PATCH v3 0/1] Qemu crashes on VM migration after an handled memory error, William Roche, 2023/09/20
- [PATCH v3 1/1] migration: skip poisoned memory pages on "ram saving" phase, William Roche, 2023/09/20
- Re: [PATCH v2 0/1] Qemu crashes on VM migration after an handled memory error, Peter Xu, 2023/09/14