
From: David Hildenbrand
Subject: Re: [PATCH v1 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
Date: Wed, 21 Jun 2023 18:17:37 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0

On 21.06.23 17:55, Peter Xu wrote:
> On Tue, Jun 20, 2023 at 03:03:51PM +0200, David Hildenbrand wrote:
>> ram_block_discard_range() cannot possibly do the right thing in
>> MAP_PRIVATE file mappings in the general case.
>>
>> To achieve the documented semantics, we also have to punch a hole into
>> the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings
>> of such a file.
>>
>> For example, using VM templating -- see commit b17fbbe55cba ("migration:
>> allow private destination ram with x-ignore-shared") -- in combination with
>> any mechanism that relies on discarding of RAM is problematic. This
>> includes:
>> * Postcopy live migration
>> * virtio-balloon inflation/deflation or free-page-reporting
>> * virtio-mem
>>
>> So at least warn that something possibly dangerous is going on when
>> using ram_block_discard_range() in these cases.

> The issue is probably valid.
>
> One thing I worry about is when the user (or, QEMU instance) exclusively
> owns the file and just forgot to attach share=on, where it used to work
> perfectly and will now show this warning.  But I agree maybe it's good to
> remind them to attach share=on.

For memory-backend-memfd, "share=on" is fortunately the default. For memory-backend-file it isn't (and in most cases you do want share=on, for example with hugetlbfs or tmpfs).

Missing the "share=on" for memory-backend-file can have sane use cases, but for the common /dev/shm/ case it even results in undesired double memory consumption (just like memory-backend-memfd,share=off).
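
For reference, an explicit share=on configuration for a file-backed guest might look like the following (the object id, path, and size are made up for illustration):

  qemu-system-x86_64 ... \
      -object memory-backend-file,id=pc.ram,size=4G,mem-path=/dev/hugepages,share=on \
      -machine memory-backend=pc.ram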



> For real private mem users, the warning can be of real help; one should
> probably leverage things like the file snapshots provided by modern file
> systems, so each VM just has its own snapshot of the RAM file to use and
> maps it share=on, I suppose.

Yes, I agree. Although we recently learned that fs-backed VM RAM (SSD) performs poorly and will severely wear your SSD :(


> For the long term, maybe we should simply support private mem here by a
> MADV_DONTNEED.  I assume that's the right semantics for postcopy (we'd
> just need to support MINOR faults, though; MISSING faults definitely will
> stop working.. but the rest of the framework shouldn't need much change),
> and I hope that's also the semantics that balloon/virtio-mem want here.
> Not sure whether/when that's strongly needed, assuming the corner case
> above can still be worked around properly by other means.

I briefly thought about that but came to the conclusion that fixing it is not that easy. So I went with the warning.

As documented, ram_block_discard_range() guarantees two things:

a) Reads return 0 after discarding succeeded
b) Postcopy keeps working, by triggering a fault on the next access
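
For context, a rough sketch of how those two guarantees are provided today -- simplified from softmmu/physmem.c, with error handling, alignment checks, and the hugetlb special cases omitted:

  /* Sketch of ram_block_discard_range(), heavily simplified. */
  if (rb->fd >= 0) {
      /* File-backed: punch a hole so the file itself reads as zeros
       * afterwards (guarantee a). */
      ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      start, length);
  }
  /* Zap the page table entries so the next access faults
   * (guarantee b, which postcopy relies on). */
  ret = madvise(host_startaddr, length, MADV_DONTNEED);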

And if we'd simply want to drop the FALLOC_FL_PUNCH_HOLE:

1) For hugetlb, only newer kernels support MADV_DONTNEED. So there is no way to just discard in a private mapping here that works for kernels we still care about.

2) free-page-reporting wants to read 0's when re-accessing discarded memory. If there is still something there in the file, that won't work (see the demo after this list).

3) Regarding postcopy on MAP_PRIVATE shmem, I am not sure if it will actually do what you want if the pagecache holds a page. Maybe it works, but I am not so sure. Needs investigation.
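
To illustrate 2), here is a minimal standalone demo (not QEMU code; the file name is made up and a 4 KiB page size is assumed) of what MADV_DONTNEED alone does to a MAP_PRIVATE file mapping -- the discarded page is repopulated from the file, not zero-filled:

  /* demo.c -- MADV_DONTNEED on a MAP_PRIVATE file mapping does not
   * guarantee zeros: the next access repopulates from the file.
   * Build: cc -o demo demo.c */
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
      char buf[4096];
      int fd = open("demo-file", O_CREAT | O_RDWR | O_TRUNC, 0600);

      memset(buf, 'A', sizeof(buf));
      write(fd, buf, sizeof(buf));              /* file now holds 'A's */

      char *map = mmap(NULL, sizeof(buf), PROT_READ | PROT_WRITE,
                       MAP_PRIVATE, fd, 0);
      map[0] = 'B';                             /* COW: private copy */
      madvise(map, sizeof(buf), MADV_DONTNEED); /* "discard" */

      /* Prints 'A': the old file content is back, not zeros. */
      printf("after discard: %c\n", map[0]);

      munmap(map, sizeof(buf));
      close(fd);
      unlink("demo-file");
      return 0;
  }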



> For now, a warning looks all sane.


>> Signed-off-by: David Hildenbrand <david@redhat.com>

> Acked-by: Peter Xu <peterx@redhat.com>

Thanks!

--
Cheers,

David / dhildenb



