On Wed, Jun 21, 2023 at 06:17:37PM +0200, David Hildenbrand wrote:
> As documented, ram_block_discard_range() guarantees two things:
> a) Read 0 after discarding succeeded
> b) Make postcopy work by triggering a fault on next access
>
> And if we'd simply want to drop the FALLOC_FL_PUNCH_HOLE:
>
> 1) For hugetlb, only newer kernels support MADV_DONTNEED. So there is no way
> to just discard in a private mapping here that works for kernels we still
> care about.
>
> 2) free-page-reporting wants to read 0's when re-accessing discarded memory.
> If there is still something there in the file, that won't work.

Ah, right. The semantics are indeed slightly different.
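
The difference can be seen in a small standalone test. This is only a sketch, not QEMU code: the helper name read_after_discard() and the use of a memfd as the backing file are assumptions made for illustration. It puts a non-zero byte in the file, dirties a MAP_PRIVATE mapping of it, then discards with MADV_DONTNEED, optionally punching a hole in the file first.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the first byte read from a MAP_PRIVATE
 * file mapping after discarding it, or -1 if a setup step fails.
 * punch_hole != 0 also drops the backing page from the file first.
 */
int read_after_discard(int punch_hole)
{
    const unsigned char file_byte = 0xaa;
    unsigned char *priv;
    int fd, result = -1;
    long psz = sysconf(_SC_PAGESIZE);

    assert(psz > 0);

    fd = memfd_create("discard-demo", 0);
    if (fd < 0)
        return -1;

    /* Leave non-zero data behind in the page cache. */
    if (ftruncate(fd, psz) < 0 || pwrite(fd, &file_byte, 1, 0) != 1)
        goto out_fd;

    priv = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (priv == MAP_FAILED)
        goto out_fd;

    /* Writing triggers copy-on-write: the mapping now has a private page. */
    priv[0] = 0x55;

    if (punch_hole &&
        fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, psz) < 0)
        goto out_map;

    /* Drop the private page; the next access faults again. */
    if (madvise(priv, psz, MADV_DONTNEED) < 0)
        goto out_map;

    result = priv[0];

out_map:
    munmap(priv, psz);
out_fd:
    close(fd);
    return result;
}
```

Without the hole punch, the re-access is served from the page cache and returns the file's data; with it, the re-access reads 0. Calling read_after_discard(0) and read_after_discard(1) from a main() shows the two cases side by side.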
IMHO, ideally we need a zero page installed as private here, ignoring the
page cache underneath and freeing any existing private page. But I just
don't know how to do that easily with the current default mm infrastructure;
otherwise, it seems to me that free-page-reporting over private memory just
won't really work at all.
Maybe UFFDIO_ZEROPAGE would work? We would need uffd registered by default,
though, and that's slightly tricky.
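
For reference, a minimal sketch of what the UFFDIO_ZEROPAGE call sequence looks like. It uses plain anonymous memory, so it only shows the API; it says nothing about whether the same works on a MAP_PRIVATE file or hugetlb mapping, which is the open question here. The function name uffd_zeropage_demo() is hypothetical, and userfaultfd may be unavailable in restricted environments, in which case the function returns -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 0 if the range reads back as zero after
 * UFFDIO_ZEROPAGE, 1 if it does not, and -1 if userfaultfd is not
 * available or any step fails.
 */
int uffd_zeropage_demo(void)
{
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    struct uffdio_register reg = { .mode = UFFDIO_REGISTER_MODE_MISSING };
    struct uffdio_zeropage zp = { .mode = 0 };
    unsigned char *mem;
    int uffd = -1, ret = -1;
    long psz = sysconf(_SC_PAGESIZE);

    assert(psz > 0);

#ifdef UFFD_USER_MODE_ONLY
    /* Newer kernels allow unprivileged use when limited to user faults. */
    uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | UFFD_USER_MODE_ONLY);
#endif
    if (uffd < 0)
        uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC);
    if (uffd < 0)
        return -1;

    if (ioctl(uffd, UFFDIO_API, &api) < 0)
        goto out_fd;

    mem = mmap(NULL, psz, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        goto out_fd;

    mem[0] = 0x55;                      /* allocate a private page */

    reg.range.start = (unsigned long)mem;
    reg.range.len = (unsigned long)psz;
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
        goto out_map;

    /* Discard the private page so the range is "missing" again. */
    if (madvise(mem, psz, MADV_DONTNEED) < 0)
        goto out_map;

    /* Install a zero page explicitly instead of waiting for a fault. */
    zp.range = reg.range;
    if (ioctl(uffd, UFFDIO_ZEROPAGE, &zp) < 0 || zp.zeropage != psz)
        goto out_map;

    ret = (mem[0] == 0) ? 0 : 1;

out_map:
    munmap(mem, psz);
out_fd:
    close(uffd);
    return ret;
}
```

The "registered by default" part is the UFFDIO_REGISTER step: the range has to be registered with a userfaultfd before UFFDIO_ZEROPAGE can be issued on it.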

> 3) Regarding postcopy on MAP_PRIVATE shmem, I am not sure if it will
> actually do what you want if the pagecache holds a page. Maybe it works, but
> I am not so sure. Needs investigation.

For MINOR I think it will. I actually already implemented some of that (I
think all of what is required) in the HGM QEMU RFC series, and smoke-tested
it a bit with the HGM kernel without hitting any known issue so far.
IIUC we can work on MINOR support without HGM; I can separate it out. It's
really a matter of whether it is worth the effort and time.