Re: [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS
From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom
Date: Mon, 15 Jun 2015 11:06:02 +0200

On Mon, Jun 15, 2015 at 09:01:53AM +0200, Christian Borntraeger wrote:
> Am 13.06.2015 um 22:10 schrieb Michael S. Tsirkin:
> > On Fri, Jun 12, 2015 at 01:56:37PM +0200, Christian Borntraeger wrote:
> >> Am 10.06.2015 um 15:13 schrieb Michael S. Tsirkin:
> >>> On Wed, Jun 10, 2015 at 03:02:21PM +0300, Denis V. Lunev wrote:
> >>>> On 09/06/15 13:37, Christian Borntraeger wrote:
> >>>>> Am 09.06.2015 um 12:19 schrieb Denis V. Lunev:
> >>>>>> Excessive virtio_balloon inflation can cause invocation of the OOM-killer
> >>>>>> when Linux is under severe memory pressure. Various mechanisms are
> >>>>>> responsible for correct virtio_balloon memory management. Nevertheless,
> >>>>>> it is often the case that these control tools do not have enough time to
> >>>>>> react to fast-changing memory load. As a result, the OS runs out of memory
> >>>>>> and invokes the OOM-killer. The balancing of memory by use of the virtio
> >>>>>> balloon should not cause the termination of processes while there are
> >>>>>> pages in the balloon. Currently there is no way for the virtio balloon
> >>>>>> driver to free memory at the last moment before some process gets killed
> >>>>>> by the OOM-killer.
> >>>>>>
> >>>>>> This does not introduce a security breach, as the balloon itself runs
> >>>>>> inside the guest OS and works in cooperation with the host. Thus
> >>>>>> improvements on the guest side should be considered normal.
> >>>>>>
> >>>>>> To solve the problem, introduce a virtio_balloon callback which is
> >>>>>> expected to be called from the oom notifier call chain in the
> >>>>>> out_of_memory() function. If the virtio balloon can release some memory,
> >>>>>> the system returns and retries the allocation that forced the
> >>>>>> out-of-memory killer to run.
> >>>>>>
> >>>>>> This behavior should be enabled if and only if the appropriate feature
> >>>>>> bit is set on the device. It is off by default.
> >>>>> The balloon frees pages in this way
> >>>>>
> >>>>> static void balloon_page(void *addr, int deflate)
> >>>>> {
> >>>>> #if defined(__linux__)
> >>>>>     if (!kvm_enabled() || kvm_has_sync_mmu())
> >>>>>         qemu_madvise(addr, TARGET_PAGE_SIZE,
> >>>>>                      deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
> >>>>> #endif
> >>>>> }
> >>>>>
> >>>>> The guest can re-touch that page and get an empty zero page or the old
> >>>>> page back without tampering with host integrity. This should work for
> >>>>> all cases I am aware of (without sync_mmu it's a nop anyway), so why not
> >>>>> enable that by default? Anything that I missed?
> >>>>>
> >>>>> Christian
> >>>>
> >>>> I'd like to do that :) Actually, the original version of the kernel
> >>>> patch enabled this unconditionally. But Michael asked to make
> >>>> it configurable and off by default.
> >>>>
> >>>> Den
> >>>
> >>> That's not the question here. The question is why is it limited by
> >>> kvm_has_sync_mmu.
> >>
> >> Well we have two interesting options here:
> >>
> >> VIRTIO_BALLOON_F_MUST_TELL_HOST and VIRTIO_BALLOON_F_DEFLATE_ON_OOM
> >>
> >> For any sane host with on-demand paging, just re-accessing the page
> >> should simply work. So the common case could be
> >> VIRTIO_BALLOON_F_MUST_TELL_HOST == off
> >
> > Disabling this breaks useful optimizations such as
> > the ability not to migrate memory in the balloon.
>
> memory in the balloon is usually backed by the empty zero page after
> the madvise (MADV_DONTNEED will finally result in zap_pte_range for the
> common case). In an ideal world migration should be able to optimize
> zero pages.
This still involves reading them in as opposed to just skipping them.
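
To illustrate the difference between reading zero pages and skipping them,
here is a minimal sketch. It assumes a hypothetical per-page "in balloon"
bitmap that the host can only maintain if the guest tells it about balloon
changes. scan_ram(), page_in_balloon() and the bitmap layout are made up for
illustration and are not QEMU's migration code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

struct scan_stats {
    size_t pages_read;  /* pages whose contents had to be examined */
    size_t pages_sent;  /* pages with non-zero contents */
};

/* Hypothetical bitmap: bit p set means page p is currently in the balloon. */
static bool page_in_balloon(const uint8_t *bitmap, size_t page)
{
    return (bitmap[page / 8] >> (page % 8)) & 1u;
}

/* Detecting a zero page requires reading the page. */
static bool page_is_zero(const uint8_t *page)
{
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Walk guest RAM. With a balloon bitmap, ballooned pages are skipped
 * without touching their memory. With balloon_bitmap == NULL (the host
 * was never told), every page is read and only then found to be zero.
 */
static struct scan_stats scan_ram(const uint8_t *ram, size_t npages,
                                  const uint8_t *balloon_bitmap)
{
    struct scan_stats st = { 0, 0 };

    assert(ram != NULL);
    for (size_t p = 0; p < npages; p++) {
        if (balloon_bitmap && page_in_balloon(balloon_bitmap, p)) {
            continue;
        }
        st.pages_read++;
        if (!page_is_zero(ram + p * SKETCH_PAGE_SIZE)) {
            st.pages_sent++;
        }
    }
    return st;
}
```

Both variants send the same number of pages; only the number of pages that
have to be read differs.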
>
> >> VIRTIO_BALLOON_F_DEFLATE_ON_OOM == on
> >
> > AFAIK management tools depend on balloon not deflating
> > below host-specified threshold to avoid OOM on the host.
> > So I don't think we can make this a default,
> > management needs to enable this explicitly.
>
> If the ballooning is required to keep the host memory management
> from OOM - iow abusing ballooning as memory hotplug between guests -
> then yes, better let the guest oom - that makes sense.
>
> Now: I think that doing so (not having enough swap in the host if
> all guests deflate) and relying on balloon semantics is fundamentally
> broken. Let me explain this: The problem is that we rely on guest
> cooperation for the host integrity. As I explained, using madvise
> MADV_DONTNEED will replace the current PTEs with invalid/empty PTEs. As
> soon as the guest kernel re-touches the page (e.g. a malicious
> kernel module - not the balloon driver) it will be backed by the VMA's
> default method - so usually with a shared R/O copy of the empty
> zero page. Write accesses will result in a copy-on-write and allocate
> new memory in the host.
> There is nothing we can do in the balloon protocol to protect the host
> against malicious guests allocating all of their maximum memory.
If we want to try and harden the host, we can unmap it so the guest will
crash if it touches pages without deflating.
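
A rough sketch of that idea at the host userspace level, assuming Linux
madvise() and mprotect(): it revokes access with PROT_NONE rather than
actually unmapping. balloon_page_hardened() is a hypothetical helper, not
QEMU's balloon_page(), and how a hypervisor would report the resulting fault
to the guest is outside what this shows.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical hardened variant of the inflate/deflate step.
 * Inflate: drop the backing memory, then revoke all access so that a
 *          touch without a prior deflate faults instead of silently
 *          allocating a fresh page.
 * Deflate: restore read/write access; the page comes back zero-filled
 *          on first touch.
 * Returns 0 on success, -1 on failure (errno set by the failing call).
 */
static int balloon_page_hardened(void *addr, size_t len, int deflate)
{
    /* mprotect() requires a page-aligned start address. */
    assert((uintptr_t)addr % (uintptr_t)sysconf(_SC_PAGESIZE) == 0);

    if (deflate) {
        return mprotect(addr, len, PROT_READ | PROT_WRITE);
    }
    if (madvise(addr, len, MADV_DONTNEED) != 0) {
        return -1;
    }
    return mprotect(addr, len, PROT_NONE);
}
```

With this, the host only hands memory back when the deflate request actually
arrives, which is the opposite of the "just re-touch the page" model.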
> If you need host integrity against guest memory usage, something like
> cgroups_memory or so is probably the only reliable way.
In the original design, protection against a malicious guest is not the
point of the balloon; it's a technology that lets you overcommit
cooperative guests.
> >
> >> Only for the rare case of hypervisors without paging or other memory
> >> related restrictions we have to enable MUST_TELL_HOST.
> >> Now: QEMU knows exactly which case we have, so why not let QEMU tell
> >> the guest what the capabilities are. (e.g. sync_mmu ---> no need to
> >> tell the host).
> >>
> >> I can at least imagine that some admin wants to make the oom case
> >> configurable, but a sane default seems to be to not kill random
> >> guest processes.
> >>
> >> Christian
> >
> >
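
For reference, the two positions in this thread can be sketched as a
feature-bit policy. The bit numbers are the ones from the virtio balloon
header (MUST_TELL_HOST = 0, DEFLATE_ON_OOM = 2). struct host_policy and the
functions below are hypothetical names for illustration, not QEMU or kernel
code. The sketch takes Christian's suggestion for MUST_TELL_HOST (offer it
only when the host cannot simply fault pages back in) and keeps
DEFLATE_ON_OOM as an explicit management opt-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    VIRTIO_BALLOON_F_MUST_TELL_HOST = 0, /* tell host before reusing pages */
    VIRTIO_BALLOON_F_DEFLATE_ON_OOM = 2  /* guest may deflate on OOM */
};

/* Hypothetical description of what the host can do and what was asked for. */
struct host_policy {
    bool demand_paging;        /* re-touching a ballooned page just works */
    bool mgmt_deflate_on_oom;  /* management explicitly enabled it */
};

static uint32_t feature_mask(unsigned bit)
{
    assert(bit < 32);
    return UINT32_C(1) << bit;
}

/* Feature bits the host would offer under the policy sketched above. */
static uint32_t balloon_host_features(struct host_policy p)
{
    uint32_t f = 0;

    if (!p.demand_paging) {
        f |= feature_mask(VIRTIO_BALLOON_F_MUST_TELL_HOST);
    }
    if (p.mgmt_deflate_on_oom) {
        f |= feature_mask(VIRTIO_BALLOON_F_DEFLATE_ON_OOM);
    }
    return f;
}

/* Only features both sides support are in effect. */
static uint32_t negotiate(uint32_t host_features, uint32_t guest_features)
{
    return host_features & guest_features;
}

/* The guest's OOM notifier may release balloon pages only if negotiated. */
static bool guest_may_deflate_on_oom(uint32_t negotiated)
{
    return (negotiated & feature_mask(VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) != 0;
}
```

Note that not offering MUST_TELL_HOST in the demand-paging case gives up the
migration optimization mentioned above, which is the trade-off under
discussion.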