qemu-devel

Re: [PATCH v2 0/2] target/i386/kvm: fix two svm pmu virtualization bugs


From: Dongli Zhang
Subject: Re: [PATCH v2 0/2] target/i386/kvm: fix two svm pmu virtualization bugs
Date: Mon, 19 Jun 2023 10:25:18 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0

Hi Like and zhenyu,

Thank you very much! That will be very helpful.

In order to help the review, I will rebase the patchset on top of the most
recent QEMU.

Thank you very much!

Dongli Zhang

On 6/19/23 01:52, Like Xu wrote:
> I think we've been stuck here too long. Sorry Dongli.
> 
> +zhenyu, could you get someone to follow up on this, or I will start working
> on that.
> 
> On 9/1/2023 9:19 am, Dongli Zhang wrote:
>> Ping?
>>
>> About [PATCH v2 2/2]: the bad thing is that the customer will not notice the
>> issue immediately, i.e., the "Broken BIOS detected" message in dmesg.
>>
>> As a result, the customer VM may panic randomly at any time in the future
>> (once the issue is encountered) if "/proc/sys/kernel/unknown_nmi_panic" is
>> enabled.
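To check whether a given guest is exposed, that sysctl can be read from inside the guest. A minimal Python sketch; the helper name unknown_nmi_panic_enabled() is hypothetical and not part of any existing tool:

```python
from pathlib import Path

# Sysctl named above; when non-zero, an NMI the guest kernel cannot attribute
# to any source ("NMI received for unknown reason") panics the guest.
UNKNOWN_NMI_PANIC = Path("/proc/sys/kernel/unknown_nmi_panic")


def unknown_nmi_panic_enabled(path: Path = UNKNOWN_NMI_PANIC) -> bool:
    """Return True if the sysctl at `path` is set to a non-zero value.

    A missing file (e.g. a non-x86 kernel or /proc not mounted) is treated
    as "not enabled".
    """
    try:
        return int(path.read_text().strip()) != 0
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    state = "enabled" if unknown_nmi_panic_enabled() else "disabled"
    print(f"unknown_nmi_panic is {state}")
```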
>>
>> Thank you very much!
>>
>> Dongli Zhang
>>
>> On 12/19/22 06:45, Dongli Zhang wrote:
>>> Can I get feedback for this patchset, especially the [PATCH v2 2/2]?
>>>
>>> About [PATCH v2 2/2]: the issue currently impacts the use of PMUs in AMD
>>> VMs, especially in the following case:
>>>
>>> 1. Enable panic on nmi.
>>> 2. Use perf to monitor the performance of the VM. Although I have not tested
>>> it, I think the nmi watchdog has the same effect.
>>> 3. A sudden system reset, or a kernel panic (kdump/kexec).
>>> 4. After reboot, there will be random unknown NMIs.
>>> 5. Unfortunately, the "panic on nmi" may panic the VM randomly at any time.
>>>
>>> Thank you very much!
>>>
>>> Dongli Zhang
>>>
>>> On 12/1/22 16:22, Dongli Zhang wrote:
>>>> This patchset fixes two svm pmu virtualization bugs (x86 only).
>>>>
>>>> version 1:
>>>> https://lore.kernel.org/all/20221119122901.2469-1-dongli.zhang@oracle.com/
>>>>
>>>> 1. The 1st bug is that "-cpu,-pmu" cannot disable svm pmu virtualization.
>>>>
>>>> Neither "-cpu EPYC" nor "-cpu host,-pmu" disables pmu virtualization. The
>>>> VM's Linux kernel still prints the following ...
>>>>
>>>> [    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
>>>>
>>>> ... although we expect something like the following.
>>>>
>>>> [    0.596381] Performance Events: PMU not available due to virtualization,
>>>> using software events only.
>>>> [    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
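A minimal sketch for checking from inside the guest which of the two cases applies, by matching the boot messages quoted above in dmesg output. The helper guest_pmu_state() is hypothetical, and other kernel versions or vendors (e.g. Intel) may word these messages differently:

```python
import re

# Boot messages quoted above, as printed by the guest's perf subsystem.
_PMU_ACTIVE = re.compile(r"Performance Events: .*AMD PMU driver")
_PMU_DISABLED = re.compile(r"PMU not available due to virtualization")


def guest_pmu_state(dmesg: str) -> str:
    """Classify guest dmesg text as 'disabled', 'active' or 'unknown'."""
    if _PMU_DISABLED.search(dmesg):
        return "disabled"
    if _PMU_ACTIVE.search(dmesg):
        return "active"
    return "unknown"


if __name__ == "__main__":
    import subprocess

    # Reading the kernel log may require root, depending on dmesg_restrict.
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    print(guest_pmu_state(log))
```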
>>>>
>>>> The 1st patch introduces a new x86-only accel/kvm property
>>>> "pmu-cap-disabled=true" to disable pmu virtualization via
>>>> KVM_PMU_CAP_DISABLE.
>>>>
>>>> I initially considered 'KVM_X86_SET_MSR_FILTER' before patchset v1. Both
>>>> KVM_X86_SET_MSR_FILTER and KVM_PMU_CAP_DISABLE are VM-level interfaces, so
>>>> I chose the latter because it is easier to use.
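For reference, the underlying KVM interface can be exercised from userspace roughly as below. This is only a Python sketch of the ioctl sequence, not the C code in the patch: the constant values are copied from <linux/kvm.h> as I remember them (please verify against your kernel headers), and create_vm_without_pmu() is a hypothetical helper. KVM requires the capability to be enabled before any vCPU is created.

```python
import fcntl
import os
import struct

# Constants as defined in <linux/kvm.h>; verify against your kernel headers.
KVMIO = 0xAE
KVM_CREATE_VM = (KVMIO << 8) | 0x01          # _IO(KVMIO, 0x01)
KVM_CHECK_EXTENSION = (KVMIO << 8) | 0x03    # _IO(KVMIO, 0x03)
KVM_CAP_PMU_CAPABILITY = 212
KVM_PMU_CAP_DISABLE = 1 << 0

# struct kvm_enable_cap { __u32 cap; __u32 flags; __u64 args[4]; __u8 pad[64]; }
KVM_ENABLE_CAP_FMT = "=II4Q64x"
# _IOW(KVMIO, 0xa3, struct kvm_enable_cap) using the x86 ioctl encoding.
KVM_ENABLE_CAP = ((1 << 30) | (struct.calcsize(KVM_ENABLE_CAP_FMT) << 16)
                  | (KVMIO << 8) | 0xA3)


def pack_disable_pmu_cap() -> bytes:
    """struct kvm_enable_cap with flags = 0 and args[0] = KVM_PMU_CAP_DISABLE."""
    return struct.pack(KVM_ENABLE_CAP_FMT, KVM_CAP_PMU_CAPABILITY, 0,
                       KVM_PMU_CAP_DISABLE, 0, 0, 0)


def create_vm_without_pmu() -> int:
    """Create a VM fd and disable pmu virtualization before any vCPU exists."""
    kvm_fd = os.open("/dev/kvm", os.O_RDWR | os.O_CLOEXEC)
    try:
        vm_fd = fcntl.ioctl(kvm_fd, KVM_CREATE_VM, 0)
    finally:
        os.close(kvm_fd)
    try:
        # Returns the mask of PMU capability bits this host KVM accepts.
        supported = fcntl.ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PMU_CAPABILITY)
        if not supported & KVM_PMU_CAP_DISABLE:
            raise OSError("KVM_PMU_CAP_DISABLE is not supported by this kernel")
        fcntl.ioctl(vm_fd, KVM_ENABLE_CAP, pack_disable_pmu_cap())
    except OSError:
        os.close(vm_fd)
        raise
    return vm_fd


if __name__ == "__main__":
    try:
        fd = create_vm_without_pmu()
    except OSError as exc:
        print(f"could not create a VM with pmu disabled: {exc}")
    else:
        print(f"created VM fd {fd} with KVM_PMU_CAP_DISABLE set")
        os.close(fd)
```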
>>>>
>>>>
>>>> 2. The 2nd bug is that unreclaimed perf events (after QEMU system_reset)
>>>> on the KVM side may inject random unwanted/unknown NMIs into the VM.
>>>>
>>>> The svm pmu registers are not reset during QEMU system_reset.
>>>>
>>>> (1). The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it
>>>> is running "perf top". The pmu registers are not disabled gracefully.
>>>>
>>>> (2). Although x86_cpu_reset() resets many registers to zero,
>>>> kvm_put_msrs() does not put the AMD pmu registers to the KVM side. As a
>>>> result, some pmu events are still enabled on the KVM side.
>>>>
>>>> (3). KVM's pmc_speculative_in_use() always returns true, so the events
>>>> are never reclaimed. The kvm_pmc->perf_event remains active.
>>>>
>>>> (4). After the reboot, the VM kernel reports the following error:
>>>>
>>>> [    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS
>>>> detected, complain to your hardware vendor.
>>>> [    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR
>>>> c0010200 is 530076)
>>>>
>>>> (5). In the worst case, the still-active kvm_pmc->perf_event can inject
>>>> unknown NMIs into the VM kernel at random.
>>>>
>>>> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
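The value in the (4) message can be decoded to see why the guest complains: Linux perf reports a "Broken BIOS" when it finds an event-select MSR whose enable bit is already set at boot. A small Python sketch of the decode; decode_eventsel() is a hypothetical helper, and only a subset of the AMD PerfEvtSel fields is handled:

```python
# Bit positions in an AMD core pmu event-select MSR (PerfEvtSel / PERF_CTLn).
EVENTSEL_FLAGS = {
    "USR": 16,   # count in user mode
    "OS": 17,    # count in kernel mode
    "EDGE": 18,  # edge detect
    "INT": 20,   # interrupt on overflow (delivered as an NMI under Linux perf)
    "EN": 22,    # counter enabled
}


def decode_eventsel(value: int) -> dict:
    """Decode the event number, unit mask and a few flag bits of a PerfEvtSel value."""
    # EventSelect[7:0] lives in bits 7:0, EventSelect[11:8] in bits 35:32.
    event = (value & 0xFF) | (((value >> 32) & 0xF) << 8)
    umask = (value >> 8) & 0xFF
    flags = {name: bool((value >> bit) & 1) for name, bit in EVENTSEL_FLAGS.items()}
    return {"event": event, "umask": umask, **flags}


if __name__ == "__main__":
    # Value from the "[Firmware Bug]" line: MSR c0010200 is 530076.
    print(decode_eventsel(0x530076))
```

For 0x530076 this gives event 0x76 (the event Linux uses for generic cycles on AMD, consistent with "perf top"), umask 0, and USR, OS, INT and EN all set: the counter left behind by the old kernel is still enabled and still requests an interrupt on overflow.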
>>>>
>>>> The 2nd patch fixes the issue by resetting the AMD pmu registers, as is
>>>> already done for the Intel registers.
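To make the scope concrete, the AMD core pmu MSRs involved are enumerated below as a Python sketch. amd_pmu_reset_msrs() is a hypothetical helper that only lists MSR addresses and the zero values a reset would write; the actual patch does this in C in target/i386/kvm/kvm.c, and PerfMonV2 MSRs are deliberately left out.

```python
# Legacy AMD ("K7") counters: 4 event selects followed by 4 counters.
MSR_K7_EVNTSEL0 = 0xC0010000
MSR_K7_PERFCTR0 = 0xC0010004
# Family 15h+ core counters: 6 interleaved CTL/CTR pairs. The MSR c0010200 in
# the "[Firmware Bug]" message is PERF_CTL0.
MSR_F15H_PERF_CTL0 = 0xC0010200
MSR_F15H_PERF_CTR0 = 0xC0010201


def amd_pmu_reset_msrs(perfctr_core: bool) -> dict[int, int]:
    """Return {msr: 0} for every AMD core pmu MSR a reset should clear.

    perfctr_core mirrors CPUID Fn8000_0001_ECX[23] (PerfCtrExtCore), which
    advertises the six F15h-style counters instead of the four legacy ones.
    """
    if perfctr_core:
        msrs = [MSR_F15H_PERF_CTL0 + 2 * i for i in range(6)]
        msrs += [MSR_F15H_PERF_CTR0 + 2 * i for i in range(6)]
    else:
        msrs = [MSR_K7_EVNTSEL0 + i for i in range(4)]
        msrs += [MSR_K7_PERFCTR0 + i for i in range(4)]
    return {msr: 0 for msr in sorted(msrs)}


if __name__ == "__main__":
    for msr, value in amd_pmu_reset_msrs(perfctr_core=True).items():
        print(f"MSR {msr:#x} <- {value:#x}")
```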
>>>>
>>>>
>>>> This patchset does not cover PerfMonV2; that will wait until the patchset
>>>> below is merged on the KVM side.
>>>>
>>>> [PATCH v3 0/8] KVM: x86: Add AMD Guest PerfMonV2 PMU support
>>>> https://lore.kernel.org/all/20221111102645.82001-1-likexu@tencent.com/
>>>>
>>>>
>>>> Dongli Zhang (2):
>>>>        target/i386/kvm: introduce 'pmu-cap-disabled' to set KVM_PMU_CAP_DISABLE
>>>>        target/i386/kvm: get and put AMD pmu registers
>>>>
>>>>   accel/kvm/kvm-all.c      |   1 +
>>>>   include/sysemu/kvm_int.h |   1 +
>>>>   qemu-options.hx          |   7 +++
>>>>   target/i386/cpu.h        |   5 ++
>>>>   target/i386/kvm/kvm.c    | 129 +++++++++++++++++++++++++++++++++++++++++-
>>>>   5 files changed, 141 insertions(+), 2 deletions(-)
>>>>
>>>> Thank you very much!
>>>>
>>>> Dongli Zhang
>>>>
>>>>
>>
>>


