Re: [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT

From:	Xiaoyao Li
Subject:	Re: [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT
Date:	Sun, 17 Oct 2021 18:37:08 +0800
User-agent:	Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0 Thunderbird/91.2.0

On 10/16/2021 4:22 AM, Eduardo Habkost wrote:

On Thu, Sep 09, 2021 at 10:41:48PM +0800, Xiaoyao Li wrote:

commit e37a5c7fa459 ("i386: Add Intel Processor Trace feature support")
added the support of Intel PT by making CPUID[14] of PT as fixed feature
set (from ICX) for any CPU model on any host.

This truly breaks the PT exposing on Intel SPR platform because SPR has
less supported bitmap CPUID(0x14,1):EBX[15:0] than ICX.

This patch enables passing through host's PT capabilities for the case
"-cpu host/max" while ensure named CPU model still has the fixed
PT feature set to not break the live migration.

It introduces @has_specific_intel_pt_feature_set field for name CPU
model. If a named CPU model has this field as false, it will use fixed
PT feature set of ICX. Besides same to previous behavior, if fixed PT
feature set of ICX cannot be satisfied/supported by host, it disables PT
instead of adjusting the feature set based on host's capabilities.

In the future, new named CPU model, e.g., Sapphire Rapids, can define
its own PT feature set by setting @has_specific_intel_pt_feature_set to
true and defines it's own FEAT_14_0_EBX, FEAT_14_0_ECX, FEAT_14_1_EAX
and FEAT_14_1_EBX.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
  target/i386/cpu.c | 106 ++++++++++++++++++++--------------------------
  target/i386/cpu.h |   1 +
  2 files changed, 47 insertions(+), 60 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 58e98210f219..00c4ad23110d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -543,34 +543,24 @@ static CPUCacheInfo legacy_l3_cache = {
  #define L2_ITLB_4K_ASSOC       4
  #define L2_ITLB_4K_ENTRIES   512

-/* CPUID Leaf 0x14 constants: */

-#define INTEL_PT_MAX_SUBLEAF     0x1
-/*
- * bit[00]: IA32_RTIT_CTL.CR3 filter can be set to 1 and IA32_RTIT_CR3_MATCH
- *          MSR can be accessed;
- * bit[01]: Support Configurable PSB and Cycle-Accurate Mode;
- * bit[02]: Support IP Filtering, TraceStop filtering, and preservation
- *          of Intel PT MSRs across warm reset;
- * bit[03]: Support MTC timing packet and suppression of COFI-based packets;
- */
-#define INTEL_PT_MINIMAL_EBX     0xf
-/*
- * bit[00]: Tracing can be enabled with IA32_RTIT_CTL.ToPA = 1 and
- *          IA32_RTIT_OUTPUT_BASE and IA32_RTIT_OUTPUT_MASK_PTRS MSRs can be
- *          accessed;
- * bit[01]: ToPA tables can hold any number of output entries, up to the
- *          maximum allowed by the MaskOrTableOffset field of
- *          IA32_RTIT_OUTPUT_MASK_PTRS;
- * bit[02]: Support Single-Range Output scheme;
- */
-#define INTEL_PT_MINIMAL_ECX     0x7
-/* generated packets which contain IP payloads have LIP values */
-#define INTEL_PT_IP_LIP          (1 << 31)
-#define INTEL_PT_ADDR_RANGES_NUM 0x2 /* Number of configurable address ranges 
*/
-#define INTEL_PT_ADDR_RANGES_NUM_MASK 0x7
-#define INTEL_PT_MTC_BITMAP      (0x0249 << 16) /* Support ART(0,3,6,9) */
-#define INTEL_PT_CYCLE_BITMAP    0x1fff         /* Support 0,2^(0~11) */
-#define INTEL_PT_PSB_BITMAP      (0x003f << 16) /* Support 
2K,4K,8K,16K,32K,64K */
+#define INTEL_PT_MAX_SUBLEAF                0x1
+#define INTEL_PT_DEFAULT_ADDR_RANGES_NUM    0x2
+#define INTEL_PT_ADDR_RANGES_NUM_MASK       0x7
+/* Support ART(0,3,6,9) */
+#define INTEL_PT_DEFAULT_MTC_BITMAP         0x0249
+/* Support 0,2^(0~11) */
+#define INTEL_PT_DEFAULT_CYCLE_BITMAP       0x1fff
+/* Support 2K,4K,8K,16K,32K,64K */
+#define INTEL_PT_DEFAULT_PSB_BITMAP         0x003f
+
+#define INTEL_PT_DEFAULT_0_EBX  (CPUID_14_0_EBX_CR3_FILTER | \
+            CPUID_14_0_EBX_PSB | CPUID_14_0_EBX_IP_FILTER | CPUID_14_0_EBX_MTC)
+#define INTEL_PT_DEFAULT_0_ECX  (CPUID_14_0_ECX_TOPA | \
+            CPUID_14_0_ECX_MULTI_ENTRIES | CPUID_14_0_ECX_SINGLE_RANGE)
+#define INTEL_PT_DEFAULT_1_EAX  (INTEL_PT_DEFAULT_MTC_BITMAP << 16 | \
+                                 INTEL_PT_DEFAULT_ADDR_RANGES_NUM)
+#define INTEL_PT_DEFAULT_1_EBX  (INTEL_PT_DEFAULT_PSB_BITMAP << 16 | \
+                                 INTEL_PT_DEFAULT_CYCLE_BITMAP)


I like these new macros because they make the code at
x86_cpu_filter_features() clearer.  Can we do this in a separate
patch, to be applied before "Introduce FeatureWordInfo for Intel
PT CPUID leaf 0xD"?

void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,

                                uint32_t vendor2, uint32_t vendor3)
@@ -1517,6 +1507,7 @@ typedef struct X86CPUDefinition {
      int family;
      int model;
      int stepping;
+    bool has_specific_intel_pt_feature_set;
      FeatureWordArray features;
      const char *model_id;
      const CPUCaches *const cache_info;
@@ -5061,6 +5052,14 @@ static void x86_cpu_load_model(X86CPU *cpu, X86CPUModel 
*model)
          env->features[w] = def->features[w];
      }

+ if (!def->has_specific_intel_pt_feature_set) {

+        env->use_default_intel_pt = true;
+        env->features[FEAT_14_0_EBX] = INTEL_PT_DEFAULT_0_EBX;
+        env->features[FEAT_14_0_ECX] = INTEL_PT_DEFAULT_0_ECX;
+        env->features[FEAT_14_1_EAX] = INTEL_PT_DEFAULT_1_EAX;
+        env->features[FEAT_14_1_EBX] = INTEL_PT_DEFAULT_1_EBX;
+    }


There's nothing special about intel-pt, and other features might
benefit from having default values set if omitted from the CPU
model definition.  I don't know yet what's the best solution
here, but some possibilities are:

A) The simpler and more obvious solution: just set
   features[FEAT_14_*] explicitly to INTEL_PT_DEFAULT_* on all CPU
   models in builtin_x86_defs.

B) Treat (env->features[FEAT_14_0_EBX] == 0 &&
           env->features[FEAT_14_0_ECX] == 0 &&
           env->features[FEAT_14_1_EAX] == 0 &&
           env->features[FEAT_14_1_EBX] == 0) as a special case
   that indicates that INTEL_PT_DEFAULT_* should be used instead
   of 0.

C) Rework X86CPUDefinition completely and replace
     FeatureWordArray features;
   with something like:
     struct {
       FeatureWord w;
       uint32_t value;
     } *features;
   or:
     struct {
       const char *property, *value;
     } *features;
   so we can have a concept of omitted/default flags in
   builtin_x86_defs.

(A) and (C) are generic but require a lot more work, so we could
keep your solution or just implement (B).


yew, I can implement B. Maybe for B), we can just use

        (env->features[FEAT_14_0_EBX] == 0)

for simplicity?

In either case, I suggest you add a comment explaining why the
boolean flag or special case exists (I believe the answer is:
"because all CPU models have the same default value for
INTEL_PT_* and we don't want to manually copy/paste it to all
entries in builtin_x86_defs").


yes, I can add the comment.

(It's just because we want to keep compatible with old QEMU, which has anot-so-good design of INTEL_PT feature set)

+
      /* legacy-cache defaults to 'off' if CPU model provides cache info */
      cpu->legacy_cache = !def->cache_info;

@@ -5465,14 +5464,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,if (count == 0) {

              *eax = INTEL_PT_MAX_SUBLEAF;
-            *ebx = INTEL_PT_MINIMAL_EBX;
-            *ecx = INTEL_PT_MINIMAL_ECX;
-            if (env->features[FEAT_14_0_ECX] & CPUID_14_0_ECX_LIP) {
-                *ecx |= CPUID_14_0_ECX_LIP;
-            }
+            *ebx = env->features[FEAT_14_0_EBX];
+            *ecx = env->features[FEAT_14_0_ECX];
          } else if (count == 1) {
-            *eax = INTEL_PT_MTC_BITMAP | INTEL_PT_ADDR_RANGES_NUM;
-            *ebx = INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP;
+            *eax = env->features[FEAT_14_1_EAX];
+            *ebx = env->features[FEAT_14_1_EBX];
          }
          break;
      }
@@ -6081,6 +6077,7 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool 
verbose)
      CPUX86State *env = &cpu->env;
      FeatureWord w;
      const char *prefix = NULL;
+    uint64_t host_feat;

if (verbose) {

          prefix = accel_uses_host_cpuid()
@@ -6089,8 +6086,7 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool 
verbose)
      }

for (w = 0; w < FEATURE_WORDS; w++) {

-        uint64_t host_feat =
-            x86_cpu_get_supported_feature_word(w, false);
+        host_feat = x86_cpu_get_supported_feature_word(w, false);


This doesn't look right.  The value of host_feat set here is
useless outside this for loop, and there's no reason to change
the scope of this variable.


I just don't want to define host_feat twice in different sub-block.

I don't think there is any problem I move the define of host_feat out ofthe loop. host_feat is just the variable to store the value when queryhost/kvm supported value for specific feature word.


In the later code, I need to query the value for FEAT_14_0_ECX again.

          uint64_t requested_features = env->features[w];
          uint64_t unavailable_features;
          if (w == FEAT_14_1_EAX) {
@@ -6107,32 +6103,22 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool 
verbose)
          mark_unavailable_features(cpu, w, unavailable_features, prefix);
      }

- if ((env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) &&

-        kvm_enabled()) {
-        KVMState *s = CPU(cpu)->kvm_state;
-        uint32_t eax_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EAX);
-        uint32_t ebx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EBX);
-        uint32_t ecx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_ECX);
-        uint32_t eax_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EAX);
-        uint32_t ebx_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EBX);
+    if (env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) {
+        host_feat = x86_cpu_get_supported_feature_word(FEAT_14_0_ECX, false);


This doesn't look right.  You seem to be using the same variable
(host_feat) for two completely different purposes.


on the former, it's

        host_feat = x86_cpu_get_supported_feature_word(w, false);

here, it's

        host_feat = x86_cpu_get_supported_feature_word(FEAT_14_0_ECX, false);

why different?

- if (!eax_0 ||

-           ((ebx_0 & INTEL_PT_MINIMAL_EBX) != INTEL_PT_MINIMAL_EBX) ||
-           ((ecx_0 & INTEL_PT_MINIMAL_ECX) != INTEL_PT_MINIMAL_ECX) ||
-           ((eax_1 & INTEL_PT_MTC_BITMAP) != INTEL_PT_MTC_BITMAP) ||
-           ((eax_1 & INTEL_PT_ADDR_RANGES_NUM_MASK) <
-                                           INTEL_PT_ADDR_RANGES_NUM) ||
-           ((ebx_1 & (INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) !=
-                (INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) ||
-           ((ecx_0 & CPUID_14_0_ECX_LIP) !=
-                (env->features[FEAT_14_0_ECX] & CPUID_14_0_ECX_LIP))) {
-            /*
-             * Processor Trace capabilities aren't configurable, so if the
-             * host can't emulate the capabilities we report on
-             * cpu_x86_cpuid(), intel-pt can't be enabled on the current host.
-             */
+        if (env->use_default_intel_pt &&
+            ((env->features[FEAT_14_0_EBX] != INTEL_PT_DEFAULT_0_EBX) ||
+             ((env->features[FEAT_14_0_ECX] & ~CPUID_14_0_ECX_LIP) !=
+              INTEL_PT_DEFAULT_0_ECX) ||
+             (env->features[FEAT_14_1_EAX] != INTEL_PT_DEFAULT_1_EAX) ||
+             (env->features[FEAT_14_1_EBX] != INTEL_PT_DEFAULT_1_EBX))) {
              mark_unavailable_features(cpu, FEAT_7_0_EBX, 
CPUID_7_0_EBX_INTEL_PT, prefix);


Why is use_default_intel_pt necessary?  Why can't this function
simply validate whatever is inside env->features[FEAT_14_*]?

Because this handling is only to keep it consistent with old QEMU. youcan see that it validate against INTEL_PT_DEFAULT_* , which is the PTset of ICX. If remove use_default_intel_pt check, it will also apply thecheck on other CPU models (e.g., in patch 4) that have differentINTEL_PT set.

          }
+
+        if ((env->features[FEAT_14_0_ECX] ^ host_feat) & CPUID_14_0_ECX_LIP) {


What if CPUID_7_0_EBX_INTEL_PT is not set?  What should be the
value of host_feat?


Above all handling is executed under the condition of

        if (env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) {

+            warn_report("Cannot configure different Intel PT IP payload format than 
hardware");
+            mark_unavailable_features(cpu, FEAT_7_0_EBX, 
CPUID_7_0_EBX_INTEL_PT, NULL);
+        }
      }
  }

diff --git a/target/i386/cpu.h b/target/i386/cpu.h

index f5478a169f9a..e6236c371c4f 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1639,6 +1639,7 @@ typedef struct CPUX86State {
      uint32_t cpuid_vendor2;
      uint32_t cpuid_vendor3;
      uint32_t cpuid_version;
+    bool use_default_intel_pt;
      FeatureWordArray features;
      /* Features that were explicitly enabled/disabled */
      FeatureWordArray user_features;
--
2.27.0

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT, Eduardo Habkost, 2021/10/15
- Re: [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT, Xiaoyao Li <=
- Re: [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT, Xiaoyao Li, 2021/10/17
  - Re: [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT, Xiaoyao Li, 2021/10/18
    - Re: [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT, Eduardo Habkost, 2021/10/20

Prev by Date: Re: [PATCH 6/8] q800: route SONIC on-board Ethernet IRQ via nubus IRQ 9 in classic mode
Next by Date: TCP/IP connections sometimes stop retransmitting packets (in nested virtualization case)
Previous by thread: Re: [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT
Next by thread: Re: [RFC PATCH 3/5] target/i386: Enable host pass through of Intel PT
Index(es):
- Date
- Thread