[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older c
From: |
Michael S. Tsirkin |
Subject: |
Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types |
Date: |
Wed, 29 Aug 2012 16:18:12 +0300 |
On Wed, Aug 29, 2012 at 09:56:12AM -0300, Eduardo Habkost wrote:
> On Wed, Aug 29, 2012 at 01:06:32PM +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 28, 2012 at 08:50:09PM -0300, Eduardo Habkost wrote:
> > > On Wed, Aug 29, 2012 at 01:25:35AM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, Aug 29, 2012 at 01:21:13AM +0300, Michael S. Tsirkin wrote:
> > > > > On Tue, Aug 28, 2012 at 07:02:42PM -0300, Eduardo Habkost wrote:
> > > > > > On Wed, Aug 29, 2012 at 12:35:28AM +0300, Michael S. Tsirkin wrote:
> > > > > > > On Tue, Aug 28, 2012 at 04:13:38PM -0300, Eduardo Habkost wrote:
> > > > > > > > On Tue, Aug 28, 2012 at 08:43:52PM +0300, Michael S. Tsirkin
> > > > > > > > wrote:
> > > > > > > > > In preparation for adding PV EOI support, disable PV EOI by
> > > > > > > > > default for
> > > > > > > > > 1.1 and older machine types, to avoid CPUID changing during
> > > > > > > > > migration.
> > > > > > > > >
> > > > > > > > > PV EOI can still be enabled/disabled by specifying it
> > > > > > > > > explicitly.
> > > > > > > > > Enable for 1.1
> > > > > > > > > -M pc-1.1 -cpu kvm64,+kvm_pv_eoi
> > > > > > > > > Disable for 1.2
> > > > > > > > > -M pc-1.2 -cpu kvm64,-kvm_pv_eoi
> > > > > > > > >
> > > > > > > >
> > > > > > > > What about users that are already running "qemu-1.1 -M pc-1.1"
> > > > > > > > on a host
> > > > > > > > kernel that supports PV EOI already? They would get PV EOI
> > > > > > > > disabled when
> > > > > > > > migrating to a destination running "qemu-1.2 -M pc-1.1".
> > > > > > > >
> > > > > > > > (On the other hand, people running "qemu-1.1 -M pc-1.1" on a
> > > > > > > > host kernel
> > > > > > > > supporting PV EOI already have migration broken, so there's not
> > > > > > > > much we
> > > > > > > > can do for them)
> > > > > > >
> > > > > > > Exactly.
> > > > > > >
> > > > > > > Talked to Gleb, long term I think we should rework code to make
> > > > > > > it forward-compatible wrt adding new MSRs:
> > > > > > > - source gets list of MSRs to be migrated from KVM and simply
> > > > > > > sends them all
> > > > > > > - send all MSRS in key/value format
> > > > > > > - destination gets list of MSRs to be migrated from KVM and
> > > > > > > only restores the supported ones
> > > > > >
> > > > > > As far as I understand the migration code
> > > > > > requirements/expectations, if
> > > > > > the origin is sending some data, it is because it is part of the
> > > > > > guest-visible machine state that must be kept while migrating.
> > > > > > Because
> > > > > > of that, the destination is not allowed to drop anything it doesn't
> > > > > > know
> > > > > > about.
> > > > >
> > > > >
> > > > > We have a ton of code that reads in values then just
> > > > > ignores them, for compat with old qemu.
> > >
> > > The difference is that you ignore only the values you _know_ to be safe
> > > to be ignored. You can't ignore a value simply because you don't know
> > > what it means, and if it's important or not.
> > > > > This will be exactly such a case:
> > > > > we don't drop anything - protocol does not
> > > > > support this. We read and simply do not tell kvm
> > > > > about it.
> > >
> > > This fits the definition of "dropping", to me.
> > >
> > > > We also have tons of code that sends
> > > > > useless values again for compatibility.
> > >
> > > But these values should be ignored only if the receiver knows exactly
> > > what it means, and knows exactly why/when it can be ignored.
> >
> > Sorry I just do not understand these meta arguments. Do you mean
> > the example below? If yes let us focus on it please.
>
> We have to have these meta arguments, because the problem is about
> features/MSRs that will be introduced in the future. But okay, let's
> follow with the specific argument:
>
> >
> > > > >
> > > > > > At the same time, if it's not part of guest-visible machine
> > > > > > state, it doesn't have to be sent by the migration origin.
> > >
> > > It may be not directly visible, but if it's part of the machine state,
> > > it affects the guest in some way. If it didn't affect the guest in any
> > > way, we wouldn't even have to send it while migrating, in the first
> > > place.
> > >
> > > > > >
> > > > >
> > > > > False, we often send internal device state which is not
> > > > > directly guest visible.
> > > >
> > > > Besides, all MSRs in KVM's MSR list are actually guest visible, even if
> > > > guests normally do not invoke these MSRs - they can.
> > >
> > > The difference is that the machine doesn't guarantee a MSR to be
> > > migrated unless the corresponding feature bit reports the MSR as
> > > supported.
> > >
> > > e.g. using the PV EOI MSR without having the PV EOI CPUID bit set is
> > > undefined behavior. The guest shouldn't do it.
> >
> > So the problematic scenario involves a guest that sets PV EOI when CPUID
> > bit is off on source. Why do we even care what happens on migration?
>
> This is the case where there is _no_ problem. But the migration
> destination can't know that, how can it decide if it's safe to drop the
> MSR value or not?
I looked and it seems always safe to drop MSRs that we have.
Any counter examples.
> >
> > > On the other hand, if the PV EOI CPUID bit is set, the MSR should be
> > > always kept, no matter what.
> > >
> > > This means that we need to migrate the MSR only if the corresponding
> > > CPUID bit is enabled.
> > >
> >
> > I do not follow. Why does it mean that?
> > It seems completely safe to migrate this MSR no matter what.
>
> It is completely safe to migrate the MSR, but it is completely unsafe to
> _ignore_ the incoming MSR value. That's the problem.
Normally CPUID will tell you if such important MSR is available.
So we can check that at destination.
> >
> > > >
> > > > > > On the other hand, a mode of operation that doesn't require updating
> > > > > > QEMU every time there's a new bit of guest-visible state to be
> > > > > > migrated
> > > > > > would be nice (just like the "-cpu host" mode, that doesn't require
> > > > > > updating QEMU for every new CPU feature, is nice for some use
> > > > > > cases). I
> > > > > > just don't know how to make work with the current migration
> > > > > > protocol.
> > > > > >
> > > > >
> > > > > I don't understand. What is the problem with the proposal?
> > > > > What will not work with our protocol?
> > > > > Can you give an example please?
> > >
> > > Yes:
> > >
> > > Suppose kernel 3.7 introduces KVM_FOO_MSR and CPUID_KVM_FOO.
> > >
> > > Also, suppose QEMU 1.2 doesn't know anything about KVM_FOO, because it
> > > was release before this feature was introduced.
> > >
> > >
> > > Then you run "qemu-1.2 -M pc-1.2" on a 3.7 host kernel. qemu-1.2 can do
> > > two things here:
> > >
> > > 1) Not enable CPUID_KVM_FOO
> > >
> > > In this case, the guest should not use KVM_FOO_MSR and the MSR does not
> > > need to be migrated (the guest may try to use it, but the behavior when
> > > trying to use it is undefined). Sending the MSR value when migrating
> > > would be useless.
> > >
> > >
> > > 2) Enable CPUID_KVM_FOO.
> > >
> > > In this case, the guest may try to use the feature and write something
> > > into KVM_FOO_MSR. Sending the MSR value when migrating is absolutely
> > > necessary
> > >
> > > ---
> > >
> > > Now assume you run "qemu-1.2 -M pc-1.2" in the destination host, running
> > > the 3.6 kernel (without KVM_FOO).
> > >
> > > Then qemu-1.2 receives the KVM_FOO_MSR data in the migration stream. In
> > > this case, qemu-1.2 simply can't decide if it's safe to drop the data
> > > (and not tell KVM about it), or not.
> > > If we simply send every MSR reported by the kernel, both the origin and
> > > destination qemu-1.2 processes can't ever know if the MSR value is
> > > important or not, because it doesn't know if it's part of the machine
> > > state that has to be kept consistent.
> > > We could introduce a mode of operation where _every_ MSR reported by KVM
> > > is important and sent by the origin (and also where every MSR must be
> > > handled by the destination, otherwise migration would fail). But this
> > > new mode would break migration compatibility between two hosts running
> > > the same machine-type, only because they are running different kernel
> > > versions. But it may be useful for some use cases, so maybe it's
> > > appropriate for a future "pc-next" machine-type (and for "-cpu host"),
> > > but not for "pc-<version>".
> > >
> >
> > In this example, we should migrate CPUID (don't we?).
> > Destination should validate that CPUID supplied by source
> > matches what it can support (doesn't it?).
> >
> > If we do and it does not, it's an unrelated bug:
> > CPUID changing under guest's feet.
>
> CPUID changing under guest's feet is another problem, that we also have
> to solve.
> But we also have the problem of migration compatibility
> between different host kernels.
So here is the solution for both: on destination pass CPUID to kvm and
it should come back unchanged. If it changed you fail migration.
>
> Note that I am not saying that migrating all MSRs is wrong. It _is_
> good, as long as:
> - The destination never ignores any of the incoming MSR values.
What I am saying for MSRs added in last 2 years it is OK to ignore
because CPUID check will tell you if it is supported
and fail migration.
> - We don't do that by default on "pc-<version>", or we defeat the
> purpose of machine-types.
>
> I'll try to enumerate the problems I am trying to address (that are
> related, but are separate problems):
>
> - MSR not being migrated when it should:
> - Possible solution: migrate all MSRs even if qemu doesn't know what
> they are.
> - Constraint: migration destination must _never_ ignore any incoming
> MSR value, because it can't decide if it is important to the guest
> or not (with the current KVM interfaces).
> - Problem with this solution: if we do that by default on
> "pc-<version>", we break migration compatibility between hosts
> with different kernel versions.
Solution: add vcpu ioctl that tells you which MSRs to migrate
(on source), depending on CPUID.
> - Changing CPUID bits under guest's feet.
> - Proposed solution: migrating CPUID bits, refuse migration if
> destination doesn't support the same bits.
> - It solves the compatibility problem for migration to a newer
> kernel, but not to an older kernel. It helps to solve part of
> the problem, but not all.
How does not it save all of the problem? If destination kernel
can support cpuid, then we are fine - it is new enough.
> - Alternative solution: simply make the resulting CPUID bits not be a
> function of the host kernel capabilities, but only of the qemu
> configuration (except on "-cpu host" and "-M pc-next").
This perpetuates existing duplication of code between
kvm and qemu. We are better off with logic in 1 place.
> - Migration compatibility/predictability:
> - See my example above: feature introduced in a newer kernel,
> migration to an older kernel.
If it is enabled then migration fails.
> - The only way I see to guarantee this is to never enable unknown
> CPUID bits or MSRs. People who don't care about predictable
> migration compatibility can use "-M pc-next", then.
>
Guarantee what?
Just check dst can support all msrs and cpuid bits of src.
Way to check is to ask kvm :) Not to add logic in qemu.
> >
> > > > >
> > > > >
> > > > > > > Too late for 1.2?
> > > > > >
> > > > > > Absolutely (in my opinion).
> > > > > >
> > > > > > >
> > > > > > > > While we don't make the KVM feature-bit handling sane (with
> > > > > > > > defaults
> > > > > > > > that are not blindly derived from the host kernel
> > > > > > > > capabilities), maybe
> > > > > > > > the safest bet is to expect users to not migrate between hosts
> > > > > > > > running
> > > > > > > > kernels with different KVM capabilities? (I am not sure which
> > > > > > > > option is
> > > > > > > > better)
> > > > > > >
> > > > > > > Sorry not sure what you talk about here. What has KVM feature-bit
> > > > > > > handling to do with this patchset?
> > > > > >
> > > > > > Everything? The whole point of this patch is to filter out the
> > > > > > PV_EOI
> > > > > > KVM feature bit.
> > > > > >
> > > > > >
> > > > > > This part of the current code, specifically, is wrong:
> > > > > >
> > > > > > > > > plus_kvm_features = ~0; /* not supported bits will be
> > > > > > > > > filtered out later */
> > > > > >
> > > > > > The QEMU-side list of KVM features should be whitelist-based, not
> > > > > > blacklist-based (unless the user doesn't need migration, in that
> > > > > > case he
> > > > > > can use "-cpu host" and get every feature blindly enabled), because
> > > > > > QEMU
> > > > > > can't know if a new feature involves guest-visible state that has
> > > > > > to be
> > > > > > migrated.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > Signed-off-by: Michael S. Tsirkin <address@hidden>
> > > > > > > > > ---
> > > > > > > > > hw/Makefile.objs | 2 +-
> > > > > > > > > hw/cpu_flags.c | 32 ++++++++++++++++++++++++++++++++
> > > > > > > > > hw/cpu_flags.h | 9 +++++++++
> > > > > > > > > hw/pc_piix.c | 2 ++
> > > > > > > > > target-i386/cpu.c | 8 ++++++++
> > > > > > > > > 5 files changed, 52 insertions(+), 1 deletion(-)
> > > > > > > > > create mode 100644 hw/cpu_flags.c
> > > > > > > > > create mode 100644 hw/cpu_flags.h
> > > > > > > > >
> > > > > > > > > diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> > > > > > > > > index 850b87b..3f2532a 100644
> > > > > > > > > --- a/hw/Makefile.objs
> > > > > > > > > +++ b/hw/Makefile.objs
> > > > > > > > > @@ -1,5 +1,5 @@
> > > > > > > > > hw-obj-y = usb/ ide/
> > > > > > > > > -hw-obj-y += loader.o
> > > > > > > > > +hw-obj-y += loader.o cpu_flags.o
> > > > > > > > > hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
> > > > > > > > > hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
> > > > > > > > > hw-obj-y += fw_cfg.o
> > > > > > > > > diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c
> > > > > > > > > new file mode 100644
> > > > > > > > > index 0000000..7a633c0
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/hw/cpu_flags.c
> > > > > > > > > @@ -0,0 +1,32 @@
> > > > > > > > > +/*
> > > > > > > > > + * CPU compatibility flags.
> > > > > > > > > + *
> > > > > > > > > + * Copyright (c) 2012 Red Hat Inc.
> > > > > > > > > + * Author: Michael S. Tsirkin.
> > > > > > > > > + *
> > > > > > > > > + * This program is free software; you can redistribute it
> > > > > > > > > and/or modify
> > > > > > > > > + * it under the terms of the GNU General Public License as
> > > > > > > > > published by
> > > > > > > > > + * the Free Software Foundation; either version 2 of the
> > > > > > > > > License, or
> > > > > > > > > + * (at your option) any later version.
> > > > > > > > > + *
> > > > > > > > > + * This program is distributed in the hope that it will be
> > > > > > > > > useful,
> > > > > > > > > + * but WITHOUT ANY WARRANTY; without even the implied
> > > > > > > > > warranty of
> > > > > > > > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
> > > > > > > > > the
> > > > > > > > > + * GNU General Public License for more details.
> > > > > > > > > + *
> > > > > > > > > + * You should have received a copy of the GNU General Public
> > > > > > > > > License along
> > > > > > > > > + * with this program; if not, see
> > > > > > > > > <http://www.gnu.org/licenses/>.
> > > > > > > > > + */
> > > > > > > > > +#include "hw/cpu_flags.h"
> > > > > > > > > +
> > > > > > > > > +static bool kvm_pv_eoi_disabled_state;
> > > > > > > > > +
> > > > > > > > > +void disable_kvm_pv_eoi(void)
> > > > > > > > > +{
> > > > > > > > > + kvm_pv_eoi_disabled_state = true;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +bool kvm_pv_eoi_disabled(void)
> > > > > > > > > +{
> > > > > > > > > + return kvm_pv_eoi_disabled_state;
> > > > > > > > > +}
> > > > > > > > > diff --git a/hw/cpu_flags.h b/hw/cpu_flags.h
> > > > > > > > > new file mode 100644
> > > > > > > > > index 0000000..05777b6
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/hw/cpu_flags.h
> > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > +#ifndef HW_CPU_FLAGS_H
> > > > > > > > > +#define HW_CPU_FLAGS_H
> > > > > > > > > +
> > > > > > > > > +#include <stdbool.h>
> > > > > > > > > +
> > > > > > > > > +void disable_kvm_pv_eoi(void);
> > > > > > > > > +bool kvm_pv_eoi_disabled(void);
> > > > > > > > > +
> > > > > > > > > +#endif
> > > > > > > > > diff --git a/hw/pc_piix.c b/hw/pc_piix.c
> > > > > > > > > index 008d42f..bdbceda 100644
> > > > > > > > > --- a/hw/pc_piix.c
> > > > > > > > > +++ b/hw/pc_piix.c
> > > > > > > > > @@ -46,6 +46,7 @@
> > > > > > > > > #ifdef CONFIG_XEN
> > > > > > > > > # include <xen/hvm/hvm_info_table.h>
> > > > > > > > > #endif
> > > > > > > > > +#include "cpu_flags.h"
> > > > > > > > >
> > > > > > > > > #define MAX_IDE_BUS 2
> > > > > > > > >
> > > > > > > > > @@ -371,6 +372,7 @@ static QEMUMachine pc_machine_v1_2 = {
> > > > > > > > >
> > > > > > > > > static void pc_machine_v1_1_compat(void)
> > > > > > > > > {
> > > > > > > > > + disable_kvm_pv_eoi();
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > static void pc_init_pci_v1_1(ram_addr_t ram_size,
> > > > > > > > > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > > > > > > > > index 120a2e3..0d02fd1 100644
> > > > > > > > > --- a/target-i386/cpu.c
> > > > > > > > > +++ b/target-i386/cpu.c
> > > > > > > > > @@ -23,6 +23,7 @@
> > > > > > > > >
> > > > > > > > > #include "cpu.h"
> > > > > > > > > #include "kvm.h"
> > > > > > > > > +#include "asm/kvm_para.h"
> > > > > > > > >
> > > > > > > > > #include "qemu-option.h"
> > > > > > > > > #include "qemu-config.h"
> > > > > > > > > @@ -33,6 +34,7 @@
> > > > > > > > > #include "hyperv.h"
> > > > > > > > >
> > > > > > > > > #include "hw/hw.h"
> > > > > > > > > +#include "hw/cpu_flags.h"
> > > > > > > > >
> > > > > > > > > /* feature flags taken from "Intel Processor Identification
> > > > > > > > > and the CPUID
> > > > > > > > > * Instruction" and AMD's "CPUID Specification". In cases
> > > > > > > > > of disagreement
> > > > > > > > > @@ -889,6 +891,12 @@ static int
> > > > > > > > > cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char
> > > > > > > > > *cpu_model)
> > > > > > > > >
> > > > > > > > > plus_kvm_features = ~0; /* not supported bits will be
> > > > > > > > > filtered out later */
> > > > > > > > >
> > > > > > > > > + /* Disable PV EOI for old machine types.
> > > > > > > > > + * Feature flags can still override. */
> > > > > > > > > + if (kvm_pv_eoi_disabled()) {
> > > > > > > > > + plus_kvm_features &= ~(0x1 << KVM_FEATURE_PV_EOI);
> > > > > > > > > + }
> > > > > > > > > +
> > > > > > > > > add_flagname_to_bitmaps("hypervisor", &plus_features,
> > > > > > > > > &plus_ext_features, &plus_ext2_features,
> > > > > > > > > &plus_ext3_features,
> > > > > > > > > &plus_kvm_features, &plus_svm_features);
> > > > > > > > > --
> > > > > > > > > MST
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Eduardo
> > > > > >
> > > > > > --
> > > > > > Eduardo
> > >
> > > --
> > > Eduardo
>
> --
> Eduardo
- [Qemu-devel] [PATCHv4 1/4] linux-headers: update to 3.6-rc3, (continued)
- [Qemu-devel] [PATCHv4 1/4] linux-headers: update to 3.6-rc3, Michael S. Tsirkin, 2012/08/28
- [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Michael S. Tsirkin, 2012/08/28
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Eduardo Habkost, 2012/08/28
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Michael S. Tsirkin, 2012/08/28
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Eduardo Habkost, 2012/08/28
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Michael S. Tsirkin, 2012/08/28
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Michael S. Tsirkin, 2012/08/28
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Eduardo Habkost, 2012/08/28
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Michael S. Tsirkin, 2012/08/29
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Eduardo Habkost, 2012/08/29
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types,
Michael S. Tsirkin <=
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Eduardo Habkost, 2012/08/29
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Michael S. Tsirkin, 2012/08/29
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Eduardo Habkost, 2012/08/29
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Marcelo Tosatti, 2012/08/29
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Marcelo Tosatti, 2012/08/29
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Michael S. Tsirkin, 2012/08/29
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Michael S. Tsirkin, 2012/08/29
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Michael S. Tsirkin, 2012/08/29
- Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Juan Quintela, 2012/08/29
Re: [Qemu-devel] [PATCHv4 3/4] cpuid: disable pv eoi for 1.1 and older compat types, Anthony Liguori, 2012/08/29