[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v4 16/21] i386: track explicit 'hv-*' features enablement/dis
From: |
Vitaly Kuznetsov |
Subject: |
Re: [PATCH v4 16/21] i386: track explicit 'hv-*' features enablement/disablement |
Date: |
Mon, 01 Mar 2021 17:22:37 +0100 |
Igor Mammedov <imammedo@redhat.com> writes:
> On Wed, 24 Feb 2021 18:00:43 +0100
> Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
>> Igor Mammedov <imammedo@redhat.com> writes:
>>
>> > On Tue, 23 Feb 2021 19:08:42 +0100
>> > Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> >
>> >> Igor Mammedov <imammedo@redhat.com> writes:
>> >>
>> >> > On Tue, 23 Feb 2021 16:46:50 +0100
>> >> > Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> >> >
>> >> >> Igor Mammedov <imammedo@redhat.com> writes:
>> >> >>
>> >> >> > On Mon, 22 Feb 2021 11:20:34 +0100
>> >> >> > Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> >> >> >
>> >> >> >> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>> >> >> >>
>> >> >> >> > Igor Mammedov <imammedo@redhat.com> writes:
>> >> >> >> >
>> >> >> >> >>>
>> >> >> >> >>> We need to distinguish because that would be sane.
>> >> >> >> >>>
>> >> >> >> >>> Enlightened VMCS is an extension to VMX, it can't be used
>> >> >> >> >>> without
>> >> >> >> >>> it. Genuine Hyper-V doesn't have a knob for enabling and
>> >> >> >> >>> disabling it,
>> >> >> >> >> ...
>> >> >> >> >>> That bein said, if
>> >> >> >> >>> guest CPU lacks VMX it is counter-productive to expose EVMCS.
>> >> >> >> >>> However,
>> >> >> >> >>> there is a problem with explicit enablement: what should
>> >> >> >> >>>
>> >> >> >> >>> 'hv-passthrough,hv-evmcs' option do? Just silently drop EVMCS?
>> >> >> >> >>> Doesn't
>> >> >> >> >>> sound sane to me.
>> >> >> >> >> based on above I'd error out is user asks for unsupported option
>> >> >> >> >> i.e. no VMX -> no hv-evmcs - if explicitly asked -> error out
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> > That's what I keep telling you but you don't seem to listen.
>> >> >> >> > 'Scratch
>> >> >> >> > CPU' can't possibly help with this use-case because when you
>> >> >> >> > parse
>> >> >> >> >
>> >> >> >> > 'hv-passthrough,hv-evmcs,vmx=off' you
>> >> >> >> >
>> >> >> >> > 1) "hv-passthrough" -> set EVMCS bit to '1' as it is supported by
>> >> >> >> > the
>> >> >> >> > host.
>> >> >> >> >
>> >> >> >> > 2) 'hv-evmcs' -> keep EVMCS bit '1'
>> >> >> >> >
>> >> >> >> > 3) 'vmx=off' -> you have no idea where EVMCS bit came from.
>> >> >> >> >
>> >> >> >> > We have to remember which options were aquired from the host and
>> >> >> >> > which
>> >> >> >> > were set explicitly by the user.
>> >> >> >>
>> >> >> >> Igor,
>> >> >> >>
>> >> >> >> could you please comment on the above? In case my line of thought is
>> >> >> >> correct, and it is impossible to distinguish between e.g.
>> >> >> >>
>> >> >> >> 'hv-passthrough,hv-evmcs,-vmx'
>> >> >> >> and
>> >> >> >> 'hv-passthrough,-vmx'
>> >> >> >>
>> >> >> >> without a custom parser (written just exactly the way I did in this
>> >> >> >> version, for example) regardless of when 'hv-passthrough' is
>> >> >> >> expanded. E.g. we have the exact same problem with
>> >> >> >> 'hv-default,hv-evmcs,-vmx'. I that case I see no point in
>> >> >> >> discussing
>> >> >> >
>> >> >> > right, if we need to distinguish between explicit and implicit
>> >> >> > hv-evmcs set by
>> >> >> > hv-passthrough custom parser probably the way to go.
>> >> >> >
>> >> >> > However do we need actually need to do it?
>> >> >>
>> >> >> I think we really need that. See below ...
>> >> >>
>> >> >> > I'd treat 'hv-passthrough,-vmx' the same way as
>> >> >> > 'hv-passthrough,hv-evmcs,-vmx'
>> >> >> > and it applies not only hv-evmcs but other features hv-passthrough
>> >> >> > might set
>> >> >> > (i.e. if whatever was [un]set by hv-passthrough in combination with
>> >> >> > other
>> >> >> > features results in invalid config, QEMU shall error out instead of
>> >> >> > magically
>> >> >> > altering host provided hv-passthrough value).
>> >> >> >
>> >> >> > something like:
>> >> >> > 'hv-passthrough,-vmx' when hv-passthrough makes hv-evmcs bit set
>> >> >> > should result in
>> >> >> > error_setg(errp,"'vmx' feature can't be disabled when hv-evmcs is
>> >> >> > enabled,"
>> >> >> > " either enable 'vmx' or disable 'hv-evmcs' along
>> >> >> > with disabling 'vmx'"
>> >> >> >
>> >> >> > making host's features set, *magically* mutable, depending on other
>> >> >> > user provided features
>> >> >> > is a bit confusing. One would never know what hv-passthrough
>> >> >> > actually means, and if
>> >> >> > enabling/disabling 'random' feature changes it.
>> >> >> >
>> >> >> > It's cleaner to do just what user asked (whether implicitly or
>> >> >> > explicitly) and error out
>> >> >> > in case it ends up in nonsense configuration.
>> >> >> >
>> >> >>
>> >> >> I don't seem to agree this is a sane behavior, especially if you
>> >> >> replace
>> >> >> 'hv-passthrough' with 'hv-default' above. Removing 'vmx' from CPU for
>> >> >> Windows guests is common if you'd want to avoid nested configuration:
>> >> >> even without any Hyper-V guests created, Windows itself is a Hyper-V
>> >> >> partition.
>> >> >>
>> >> >> So a sane user will do:
>> >> >>
>> >> >> '-cpu host,hv-default,vmx=off'
>> >> >>
>> >> >> and on Intel he will get an error, and on AMD he won't.
>> >> >>
>> >> >> So what you're suggesting actually defeats the whole purpose of
>> >> >> 'hv-default' as upper-layer tools (think libvirt) will need to know
>> >> >> that
>> >> > I'd assume it would be hard for libvirt to use 'hv-default' from
>> >> > migration
>> >> > point of view. It's semi opaque (one can find out what features it sets
>> >> > indirectly inspecting individual hv_foo features, and mgmt will need to
>> >> > know about them). If it will mutate when other features [un]set, upper
>> >> > layers might need to enumerate all these permutations to know which
>> >> > hosts
>> >> > are compatible or compare host feature sets every time before attempting
>> >> > migration.
>> >>
>> >> That's exactly the opposite of what's the goal here which is: make it
>> >> possible for upper layers to not know anything about Hyper-V
>> >> enlightenments besides 'hv-default'. Migration should work just fine, if
>> >> the rest of guest configuration matches -- then 'hv-default' will create
>> >> the exact same things (e.g. if 'vmx' was disabled on the source it has
>> > ^^^^^
>> > I'm not convinced in that yet (not with current impl. more on that at the
>> > end of reply)
>> >
>> >> to be enabled on the destination, it can't be different)
>> >>
>> >>
>> >> >> Intel configurations for Windows guests are somewhat different. They'll
>> >> >> need to know what 'hv-evmcs' is. We're back to where we've started.
>> >> >
>> >> > we were talking about hv-passthrough, and if host advertises hv-evmcs
>> >> > QEMU should complain if user disabled features it depends on (
>> >> > not silently fixing up configuration error).
>> >> > But the same applies to hv-default.
>> >>
>> >> Let's forget about hv-passthrough completely for a while as this series
>> >> is kind of unrelated to it.
>> >
>> > It adds a lot for unrelated code (not just couple of lines),
>> > I've played with scratch CPU idea, here is demo of it
>> > https://github.com/imammedo/qemu/commit/a4b107d5368ebf72d45082bc8310a6b88a4ba6fb
>> > I didn't rework caps/cpuid querying parts (just hacked around it),
>> > and even without that it saves us ~200LOC (not a small part of which comes
>> > with this series).
>>
>> All your savings come from throwing away custom parsers -- which are not
>> needed at all if we don't distinguish between
>>
>> 'hv-default,hv-evmcs' and 'hv-default'
>>
>> it's just not needed, don't count these patches in. Or, if it is needed,
>> please explain how your scratch CPU is making things different. I guess
>> it is not so we can discuss this outside of this series.
>
> scratch CPU helps with hostpassthrough refactoring which also brings in
> dependency on custom parser.
I think I already said that but let me repeat myself: scratch CPUs don't
help us in any way to distinguish between
'hv-passthrough,hv-evmcs,vmx=off'
and
'hv-passthrough,vmx=off'
and thus are utterly useless (for this particular feature). It's not
'refactoring', it's throwing a feature away for the sake of code
purity.
>
>
>> > I also split horrible hv_cpuid_check_and_set into separate 'set' and
>> > 'check' stages.
>> > Granted it was sort-of pre-existing ugly code, some of your
>> > re-factoring made it a bit better but it's still far from readable.
>> >
>>
>> hv_cpuid_check_and_set() is already there, I'm not at all opposed to
>> making this code even better but I don't see it as a must for this
>> particular feature (hv-default).
> it's not a must have for hv-default especially if you don't touch it.
> (however if you touch it & co, I'd ask to clean it up first)
>
>
>> >> In the previous submission I was setting 'hv-default' based on host
>> >> availability of the feature only. That is: set on Intel, unset on
>> >> AMD. We have to at least preserve that because it would be insane to
>> >> crash on
>> >>
>> >> -cpu host,hv-default
>> >>
>> >> on AMD because AMD doesn't (and never will!) support hv-evmcs, right?
>> >
>> > If QEMU prevents cross arch migration i.e. it's not supported,
>> > then I guess we can make hv-default different depending on AMD or Intel
>> > host.
>> > If not then we might need to be conservative i.e. exclude hv-evmcs from
>> > defaults.
>> >
>>
>> Forget about cross vendor. I want to tie it 1:1 to VMX feature on the
>> guest CPU -- which happens to be only available on Intel. It is
>> absolutely impossible to migrate VMX enabled guest to VMX-disabled
>> destination, with or without evmcs.
> Ok
>
>> >> >> If we are to follow this approach let's just throw away 'hv-evmcs' from
>> >> >> 'hv-default' set, it's going to be much cleaner. But again, I don't
>> >> >> really believe it's the right way to go.
>> >> >
>> >> > if desired behavior, on Intel host for above config, to start without
>> >> > error
>> >> > then indeed defaults should not set 'hv-evmcs' if it results in invalid
>> >> > feature set.
>> >>
>> >> This is problematic as it is still sane for everyone to enable it as it
>> >> gives performance advantage. If we just for a second forget about custom
>> > "
>> > > >> So a sane user will do:
>> > > >>
>> > > >> '-cpu host,hv-default,vmx=off'
>> > "
>> > it's not easy picking defaults.
>> >
>> >> parsers and all that -- which is just an implementation detail, why can't
>> >> we tie 'hv-evmcs' bit in 'hv-default' to 'vxm' 1:1?
>> > migration wise I don't see issues wrt vmx=off turning of hv-evmcs,
>> > however ...
>> >
>> > we were replacing user input fixups with hard errors asking
>> > user to fix CLI and removing custom parsers in favor of generic ones.
>> >
>> > In vmx=off case we would be fixing up what 'hv-default' explicitly set.
>> > Same applies to other hv-foo set by hv-default.
>> >
>> > ex: 'hv-default,hv-dep1=off', will turn off some dependent feature
>> > for other hv feature in hv-default set and it will error out,
>> > same goes on for enabling feature that has dependencies.
>> > Why should we treat hv-evmcs/vmx pair any different?
>> >
>>
>> VMX is not part of Hyper-V enlightenments, is it? It can also be coming
>> from a CPU model:
>>
>> "-cpu MyLovelyModelWithVmx,hv-default"
>>
>> should not throw an error!
>>
>> Again, the goal is for userspace to not know anything besides
>> 'hv-default' for Hyper-V enlightenments.
>>
>>
>> > Granted exiting with error is not the best UX, but at least it says to user
>> > what's wrong with CLI and how to fix it. Also it lets to keep QEMU code
>> > manageable and with consistent behavior.
>>
>> Enabling EVMCS only when guest CPU has VMX is a smart behavior, all
>> users want that. It is very consistent with how genuine Hyper-V behaves.
> it's all good,
> unless VMX is absent for whatever reasons (cpumodel or vmx=off on CLI),
> in this case just error out and say what's wrong instead of trying to fix
> CLI up.
It's easier, of course, but I don't think we should do that. At least
query-cpu-model-expansion type=full
model={"name":"host","props":{"hv-passthrough":true}}
should give upper layer the full list of supported features, including
evmcs, and not just throw an error. Same with 'hv-default'
>
>> >> Again, the end goal is: make it possible for upper layers to now know
>> >> anything about Hyper-V enlightenments other than 'hv-default'.
>> > I'm still doubtful about feasibility of this goal when migration is
>> > considered.
>> > It sure would work if hosts are identical hw/sw wise.
>> > In mixed setup all features, except of hv-evmcs, that included in
>> > 'hv-default',
>> > will error out in case host doesn't support it, which should prevent
>> > incompatible migration so that's also fine.
>> >
>> > But hv-evmcs will silently go away if host doesn't support it,
>> > which is issue when migration happens to/from host that supports it.
>>
>> But how can it go away? In KVM, hv-evmcs support is not conditional,
>> basically, all KVMs which support netsting state migration also support
>> EVMCS.
>
> you added that in kernel between 4.19-4.20 (8cab6507f64),
> so on Intel host flag can change depending on not so old kernel version.
>
> Unless QEMU minimum supported kernel version is 4.20,
> we can't ignore that.
> (downstreams will have to take care of it on its own)
You forget that EVMCS is not a feature in vacuum, it is an extension to
VMX. To migrate a guest which has it enabled, nested state migration has
to support it. Assuming the feature was enabled on the source, you need
at least the following:
commit 8cab6507f64eff0ccfea01fccbc7e3e05e2aaf7e
Author: Vitaly Kuznetsov <vkuznets@redhat.com>
Date: Tue Oct 16 18:50:09 2018 +0200
x86/kvm/nVMX: nested state migration for Enlightened VMCS
which is also in 4.20, I don't see how migration without it can even
succeed.
>
>> >
>> > Maybe to help mgmt to figure out hosts compatibility
>> > 1. it should know about hv-evmcs to query it's status
>> > 2. or default value set by 'hv-default' should be exposed to mgmt
>> > so it could compare whole feature-set in one go without being
>> > aware of individual features.
>> >
>> > Additionally on QEMU side for such conditional features we can
>> > theoretically add a subsection to migration stream when feature
>> > is enabled, that way we at least can prevent 'successful' migration,
>> > when destination value doesn't match. But this might already be
>> > over-engineering on my part.
>>
>> I think you're trying to solve an issue which doesn't exist. In case
>> we're successfully migrating a nested enabled guest, our KVM is modern
>> enough and supports EVMCS (on Intel, of course). Also, nested state
>> (which is part of the migration stream) has EVMCS flag, we can't migrate
>> somewhere where the flag is unsupported.
> that's what I was missing.
> Can you point out to the code that makes sure that migration fails?
> (a comment where hv-evcms is added to default set explaining why it's safe,
> pls)
>
If you look at KVM_SET_NESTED_STATE implementation in KVM (see
kvm_arch_vcpu_ioctl()), it checks for supported flags
arch/x86/kvm/x86.c- if (kvm_state.flags &
arch/x86/kvm/x86.c- ~(KVM_STATE_NESTED_RUN_PENDING |
KVM_STATE_NESTED_GUEST_MODE
arch/x86/kvm/x86.c- | KVM_STATE_NESTED_EVMCS |
KVM_STATE_NESTED_MTF_PENDING
arch/x86/kvm/x86.c- | KVM_STATE_NESTED_GIF_SET))
arch/x86/kvm/x86.c- break;
if KVM_STATE_NESTED_EVMCS is unsupported (KVM before 8cab6507f64) this
condition will fail.
In reality, regardless of evmcs, there are so many bugs in nested
migration prior to 5.11/5.10(maybe) that chances for successful
migration on these kernels are pretty slim. Worst is, the migration
won't fail, you guest (with its nested guests) will just crash. I don't
recommend anyone doing nested migration in production with some older
kernel, it's just not safe.
>
>> Anyway, I feel we're walking in circles. I'm ready to just drop all
>> EVMCS related bits from this seies to get it merged. This will make the
>> part which you hate the most ("custom parsers") go away too. We can discuss
>> EVMCS to death after that and, when we finally decide that user
>> convenience is actually worth something, we can add 'hv-evmcs' to the
>> new
>
> fine with me, it would be even easier to review if it were just a patch that
> would add 'hv-defaults', without non must have refactoring.
> (cleanup/refactoring could be another series)
>
Ok, v5 is on the list.
> If we can guarantee that hv-evcms won't flip on/off on all supported kernels,
> I'm also fine with keeping it in default set and error-ing out if vmx ends up
> in off state.
I still don't see why and how it should (or could) flip.
> However if it changes, we need to expose 'default set' to mgmt somehow,
> so it will know that hosts aren't compatible (instead of finding out it hard
> way
> in form of failed migration (assuming it fails))
The first part of this series does exactly that: exposes Hyper-V
enlightenments to upper layers in QMP. Upper layer can do e.g.
query-cpu-model-expansion type=full
model={"name":"host","props":{"hv-passthrough":true}}
and it will get all the supported features, including evmcs. This is
one of the reasons I insist this should *always* work instead of
throwing an error (with 'hv-default too).
--
Vitaly