qemu-s390x
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [qemu-s390x] [Qemu-devel] [PATCH for-4.0?] exec: Only count mapped m


From: David Gibson
Subject: Re: [qemu-s390x] [Qemu-devel] [PATCH for-4.0?] exec: Only count mapped memory backends for qemu_getrampagesize()
Date: Thu, 28 Mar 2019 11:27:26 +1100
User-agent: Mutt/1.11.3 (2019-02-01)

On Wed, Mar 27, 2019 at 02:19:41PM +0100, David Hildenbrand wrote:
> On 27.03.19 10:03, David Gibson wrote:
> > On Wed, Mar 27, 2019 at 09:10:01AM +0100, David Hildenbrand wrote:
> >> On 27.03.19 01:12, David Gibson wrote:
> >>> On Tue, Mar 26, 2019 at 06:02:51PM +0100, David Hildenbrand wrote:
> >>>> On 26.03.19 15:08, Igor Mammedov wrote:
> >>>>> On Tue, 26 Mar 2019 14:50:58 +1100
> >>>>> David Gibson <address@hidden> wrote:
> >>>>>
> >>>>>> qemu_getrampagesize() works out the minimum host page size backing any 
> >>>>>> of
> >>>>>> guest RAM.  This is required in a few places, such as for POWER8 PAPR 
> >>>>>> KVM
> >>>>>> guests, because limitations of the hardware virtualization mean the 
> >>>>>> guest
> >>>>>> can't use pagesizes larger than the host pages backing its memory.
> >>>>>>
> >>>>>> However, it currently checks against *every* memory backend, whether 
> >>>>>> or not
> >>>>>> it is actually mapped into guest memory at the moment.  This is 
> >>>>>> incorrect.
> >>>>>>
> >>>>>> This can cause a problem attempting to add memory to a POWER8 pseries 
> >>>>>> KVM
> >>>>>> guest which is configured to allow hugepages in the guest (e.g.
> >>>>>> -machine cap-hpt-max-page-size=16m).  If you attempt to add 
> >>>>>> non-hugepage,
> >>>>>> you can (correctly) create a memory backend, however it (correctly) 
> >>>>>> will
> >>>>>> throw an error when you attempt to map that memory into the guest by
> >>>>>> 'device_add'ing a pc-dimm.
> >>>>>>
> >>>>>> What's not correct is that if you then reset the guest a startup check
> >>>>>> against qemu_getrampagesize() will cause a fatal error because of the 
> >>>>>> new
> >>>>>> memory object, even though it's not mapped into the guest.
> >>>>> I'd say that backend should be remove by mgmt app since device_add 
> >>>>> failed
> >>>>> instead of leaving it to hang around. (but fatal error either not a nice
> >>>>> behavior on QEMU part)
> >>>>
> >>>> Indeed, it should be removed. Depending on the options (huge pages with
> >>>> prealloc?) memory might be consumed for unused memory. Undesired.
> >>>
> >>> Right, but if the guest initiates a reboot before the management gets
> >>> to that, we'll have a crash.
> >>>
> >>
> >> Yes, I agree.
> >>
> >> At least on s390x (extending on what Igor said):
> >>
> >> mc->init() -> s390_memory_init() ->
> >> memory_region_allocate_system_memory() -> host_memory_backend_set_mapped()
> >>
> >>
> >> ac->init_machine() -> kvm_arch_init() ->
> >> kvm_s390_configure_mempath_backing() -> qemu_getrampagesize()
> >>
> >>
> >> And in vl.c
> >>
> >> configure_accelerator(current_machine, argv[0]);
> >> ...
> >> machine_run_board_init()
> >>
> >> So memory is indeed not mapped before calling qemu_getrampagesize().
> >>
> >>
> >> We *could* move the call to kvm_s390_configure_mempath_backing() to
> >> s390_memory_init().
> >>
> >> cap_hpage_1m is not needed before we create VCPUs, so this would work fine.
> >>
> >> We could than eventually make qemu_getrampagesize() asssert if no
> >> backends are mapped at all, to catch other user that rely on this being
> >> correct.
> > 
> > So.. I had a look at the usage in kvm_s390_configure_mempath_backing()
> > and I'm pretty sure it's broken.  It will work in the case where
> > there's only one backend.  And if that's the default -mem-path rather
> > than an explicit memory backend then my patch won't break it any
> > further.
> 
> It works for the current scenarios, where you only have one (maximum
> two) backings of the same kind. Your patch would break that.

Actually it wouldn't.  My patch only affects checking of explicit
backend objects - checking of the base -mem-path implicit backend
remains the same.

> > qemu_getrampagesize() returns the smallest host page size for any
> > memory backend.  That's what matters for pcc KVM (in several cases)
> > because we need certain things to be host-contiguous, not just
> > guest-contiguous.  Bigger host page sizes are fine for that purpose,
> > clearly.
> > 
> > AFAICT on s390 you're looking to determine if any backend is using
> > hugepages, because KVM may not support that.  The minimum host page
> > size isn't adequate to determine that, so qemu_getrampagesize() won't
> > tell you what you need.
> 
> Well, as long as we don't support DIMMS or anything like that it works
> perfectly fine. But yes it is far from beautiful.
> 
> First of all, I'll prepare a patch to do the call from a different
> context. Then we can fine tune to using something else than
> qemu_getrampagesize()
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]