From: William Roche
Subject: Re: [PATCH v2 2/7] system/physmem: poisoned memory discard on reboot
Date: Tue, 12 Nov 2024 19:17:31 +0100
User-agent: Mozilla Thunderbird

On 11/12/24 12:07, David Hildenbrand wrote:
> On 07.11.24 11:21, William Roche wrote:
>> From: William Roche <william.roche@oracle.com>
>>
>> We take into account the recorded page sizes to repair the
>> memory locations, calling ram_block_discard_range() to punch a hole
>> in the backend file when necessary and regenerate a usable memory.
>> Fall back to unmap/remap the memory location(s) if the kernel doesn't
>> support the madvise calls used by ram_block_discard_range().
>> Hugetlbfs poison case is also taken into account as a hole punch
>> with fallocate will reload a new page when first touched.
>>
>> Signed-off-by: William Roche <william.roche@oracle.com>
>> ---
>>  system/physmem.c | 50 +++++++++++++++++++++++++++++------------------
>>  1 file changed, 30 insertions(+), 20 deletions(-)
>>
>> diff --git a/system/physmem.c b/system/physmem.c
>> index 750604d47d..dfea120cc5 100644
>> --- a/system/physmem.c
>> +++ b/system/physmem.c
>> @@ -2197,27 +2197,37 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
>>              } else if (xen_enabled()) {
>>                  abort();
>>              } else {
>> -                flags = MAP_FIXED;
>> -                flags |= block->flags & RAM_SHARED ?
>> -                         MAP_SHARED : MAP_PRIVATE;
>> -                flags |= block->flags & RAM_NORESERVE ? MAP_NORESERVE : 0;
>> -                prot = PROT_READ;
>> -                prot |= block->flags & RAM_READONLY ? 0 : PROT_WRITE;
>> -                if (block->fd >= 0) {
>> -                    area = mmap(vaddr, length, prot, flags, block->fd,
>> -                                offset + block->fd_offset);
>> -                } else {
>> -                    flags |= MAP_ANONYMOUS;
>> -                    area = mmap(vaddr, length, prot, flags, -1, 0);
>> -                }
>> -                if (area != vaddr) {
>> -                    error_report("Could not remap addr: "
>> -                                 RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
>> -                                 length, addr);
>> -                    exit(1);
>> +                if (ram_block_discard_range(block, offset + block->fd_offset,
>> +                                            length) != 0) {
>> +                    if (length > TARGET_PAGE_SIZE) {
>> +                        /* punch hole is mandatory on hugetlbfs */
>> +                        error_report("large page recovery failure addr: "
>> +                                     RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
>> +                                     length, addr);
>> +                        exit(1);
>> +                    }
>
> For shared memory we really need it.
>
> Private file-backed is weird ... because we don't know if the shared or
> the private page is problematic ... :(

I agree with you, and we have to decide when we should bail out if ram_block_discard_range() doesn't work. In my view, if the discard doesn't work and we are dealing with file-backed large pages (shared or not), we have to exit, because the fallocate is mandatory. That is the case with hugetlbfs.

In the non-file-backed case, or the file-backed private case with standard pages, I think we can trust the mmap() method to put everything back in place for the VM reset to work as expected. Are there aspects I am missing, for which mmap() plus the remap handler is not sufficient and we should also bail out here?

> Maybe we should just do:
>
> if (block->fd >= 0) {
>     /* mmap(MAP_FIXED) cannot reliably zap our problematic page. */
>     error_report(...);
>     exit(-1);
> }
>
> Or alternatively
>
> if (block->fd >= 0 && qemu_ram_is_shared(block)) {
>     /* mmap() cannot possibly zap our problematic page. */
>     error_report(...);
>     exit(-1);
> } else if (block->fd >= 0) {
>     /*
>      * MAP_PRIVATE file-backed ... mmap() can only zap the private
>      * page, not the shared one ... we don't know which one is
>      * problematic.
>      */
>     warn_report(...);
> }

I also agree that any file-backed shared case should bail out if the discard (fallocate) fails, no matter whether large or standard pages are used.

In the case of file-backed private standard pages, I think that a poison on the private page can be fixed with a new mmap. As I see it, there are 2 cases to consider: either the page was dirty at the moment the poison was seen (so it was a purely private page), or the page was not dirty, in which case the poisoned page can be replaced with a fresh copy of the file content.
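
To illustrate the mapping semantics relied on here, the following is a minimal sketch, not QEMU code. It shows only that mapping a MAP_PRIVATE file-backed page again with MAP_FIXED drops the private dirty copy and exposes the file content again; it does not simulate a hardware poison, and it says nothing about a poison sitting on the shared page cache page. The function name and the scratch file under /tmp are assumptions made for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo helper (not part of QEMU).
 * Returns 1 if remapping a MAP_PRIVATE file-backed page with MAP_FIXED
 * drops the private dirty copy and shows the file content again,
 * 0 if it does not, and -1 if the setup failed.
 */
static int private_remap_restores_file_content(void)
{
    char path[] = "/tmp/remap-demo-XXXXXX";   /* assumed scratch location */
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *buf = NULL, *p = MAP_FAILED, *q;
    int result = -1;
    int fd = mkstemp(path);

    if (fd < 0) {
        return -1;
    }
    unlink(path);                    /* the file lives until fd is closed */

    buf = malloc(page);
    if (!buf) {
        goto out;
    }
    memset(buf, 'A', page);          /* file content: one page of 'A' */
    if (write(fd, buf, page) != (ssize_t)page) {
        goto out;
    }

    p = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        goto out;
    }
    p[0] = 'B';                      /* write fault creates a private copy */
    assert(p[0] == 'B');

    /* Map the same file range again at the same address. */
    q = mmap(p, page, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_FIXED, fd, 0);
    if (q == p) {
        result = (q[0] == 'A');      /* private copy gone, file data is back */
    }
    munmap(p, page);
out:
    free(buf);
    close(fd);
    return result;
}
```

On Linux this is expected to return 1, which matches the "dirty private page" case described above.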

In both cases, I'd say that the remap should clean up the poison. So the condition when the discard fails could be something like:

    if (block->fd >= 0 && (qemu_ram_is_shared(block) ||
                           (length > TARGET_PAGE_SIZE))) {
        /* punch hole is mandatory, mmap() cannot possibly zap our page */
        error_report("%spage recovery failure addr: "
                     RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
                     (length > TARGET_PAGE_SIZE) ? "large " : "",
                     length, addr);
        exit(1);
    }

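
The condition proposed above can be checked in isolation as a small predicate. This is only a sketch under stated assumptions: the enum, the function name and the explicit page_size parameter (standing in for TARGET_PAGE_SIZE) are hypothetical and do not exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of a failed discard; not a QEMU type. */
enum recovery_action {
    RECOVERY_REMAP,   /* fall back to mmap(MAP_FIXED) over the range */
    RECOVERY_FATAL,   /* the hole punch was mandatory: report and exit */
};

/*
 * Decide what to do once the discard (hole punch) has failed.
 *   fd        backing file descriptor, or -1 for anonymous memory
 *   shared    true for a shared mapping
 *   length    size of the poisoned location in bytes
 *   page_size standard page size, stands in for TARGET_PAGE_SIZE
 */
static enum recovery_action
action_after_failed_discard(int fd, bool shared, size_t length,
                            size_t page_size)
{
    assert(page_size > 0 && length >= page_size);

    /* File-backed and either shared or large: mmap() cannot zap the page. */
    if (fd >= 0 && (shared || length > page_size)) {
        return RECOVERY_FATAL;
    }
    /* Anonymous memory, or file-backed private standard pages. */
    return RECOVERY_REMAP;
}
```

With a 4096-byte standard page, a file-backed private 4096-byte location falls back to the remap, while a file-backed shared location or a file-backed 2 MiB location is treated as fatal.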
>> +                    flags = MAP_FIXED;
>> +                    flags |= block->flags & RAM_SHARED ?
>> +                             MAP_SHARED : MAP_PRIVATE;
>> +                    flags |= block->flags & RAM_NORESERVE ? MAP_NORESERVE : 0;
>> +                    prot = PROT_READ;
>> +                    prot |= block->flags & RAM_READONLY ? 0 : PROT_WRITE;
>> +                    if (block->fd >= 0) {
>> +                        area = mmap(vaddr, length, prot, flags, block->fd,
>> +                                    offset + block->fd_offset);
>> +                    } else {
>> +                        flags |= MAP_ANONYMOUS;
>> +                        area = mmap(vaddr, length, prot, flags, -1, 0);
>> +                    }
>> +                    if (area != vaddr) {
>> +                        error_report("Could not remap addr: "
>> +                                     RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
>> +                                     length, addr);
>> +                        exit(1);
>> +                    }
>> +                    memory_try_enable_merging(vaddr, length);
>> +                    qemu_ram_setup_dump(vaddr, length);
>
> Can we factor the mmap hack out into a separate helper function to clean
> this up a bit?

Sure, I'll do that.
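
As a rough idea of what such a helper could look like, here is a standalone sketch that uses only POSIX/Linux calls. It is not the helper from any later version of the patch: struct demo_block is a hypothetical, trimmed-down stand-in for the RAMBlock fields used in the hunk, and demo_remap_fixed is an invented name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical stand-in for the RAMBlock fields used by the remap code. */
struct demo_block {
    int fd;            /* backing file, or -1 for anonymous memory */
    off_t fd_offset;   /* offset of the block within the file */
    bool shared;       /* stands in for RAM_SHARED */
    bool noreserve;    /* stands in for RAM_NORESERVE */
    bool readonly;     /* stands in for RAM_READONLY */
};

/*
 * Replace [vaddr, vaddr + length) with a fresh mapping of the same kind.
 * Returns 0 on success, -1 if mmap() did not map at vaddr.
 */
static int demo_remap_fixed(const struct demo_block *block, void *vaddr,
                            off_t offset, size_t length)
{
    int flags = MAP_FIXED;
    int prot = PROT_READ;
    void *area;

    assert(block != NULL && vaddr != NULL && length > 0);

    flags |= block->shared ? MAP_SHARED : MAP_PRIVATE;
    flags |= block->noreserve ? MAP_NORESERVE : 0;
    prot |= block->readonly ? 0 : PROT_WRITE;

    if (block->fd >= 0) {
        area = mmap(vaddr, length, prot, flags, block->fd,
                    offset + block->fd_offset);
    } else {
        flags |= MAP_ANONYMOUS;
        area = mmap(vaddr, length, prot, flags, -1, 0);
    }
    return area == vaddr ? 0 : -1;
}
```

In this sketch the caller keeps the error reporting and the exit(1), as well as the memory_try_enable_merging() and qemu_ram_setup_dump() calls shown in the hunk, so the helper only has to report whether the fixed mapping succeeded.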