Re: [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information


From: Tao Xu
Subject: Re: [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information
Date: Mon, 12 Aug 2019 14:11:11 +0800
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

On 8/12/2019 1:13 PM, Daniel Black wrote:


Tao Xu, Liu Jingqi,

Thanks for doing these updates.

On Fri,  9 Aug 2019 14:57:29 +0800
Tao <address@hidden> wrote:

From: Liu Jingqi <address@hidden>

Add -numa hmat-lb option to provide System Locality Latency and
Bandwidth Information. These memory attributes help to build
System Locality Latency and Bandwidth Information Structure(s)
in ACPI Heterogeneous Memory Attribute Table (HMAT).

Signed-off-by: Liu Jingqi <address@hidden>
Signed-off-by: Tao Xu <address@hidden>
---
  hw/acpi/hmat.h        |   3 +
  hw/core/numa.c        | 185 ++++++++++++++++++++++++++++++++++++++++++
  include/sysemu/numa.h |   2 +
  qapi/machine.json     |  95 +++++++++++++++++++++-
  qemu-options.hx       |  44 +++++++++-
  5 files changed, 326 insertions(+), 3 deletions(-)

diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
index 6c32f12e78..b7c1e02cf0 100644
--- a/hw/acpi/hmat.h
+++ b/hw/acpi/hmat.h
@@ -42,6 +42,9 @@
 #define HMAT_IS_LATENCY(type) (type <= HMAT_LB_DATA_WRITE_LATENCY)
 
+#define PICO_PER_USEC 1000000
+#define PICO_PER_NSEC 1000
+
  struct HMAT_LB_Info {
      /*
       * Indicates total number of Proximity Domains
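
(Aside, for readers: a worked example of what these constants normalize. This is my illustration, not part of the patch; the 123us figure matches the test output quoted later in this mail.)

    #define PICO_PER_USEC 1000000
    #define PICO_PER_NSEC 1000

    /* "latency=5ns"   ->   5 * PICO_PER_NSEC = 5000 ps
     * "latency=123us" -> 123 * PICO_PER_USEC = 123000000 ps,
     * which the kernel log below reports as 123000 nsec. */
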
diff --git a/hw/core/numa.c b/hw/core/numa.c
index cfb6339810..9a494145f3 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -37,6 +37,7 @@
  #include "qemu/option.h"
  #include "qemu/config-file.h"
  #include "qemu/cutils.h"
+#include "hw/acpi/hmat.h"
 
 QemuOptsList qemu_numa_opts = {
      .name = "numa",
@@ -183,6 +184,184 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
     ms->numa_state->have_numa_distance = true;
 }
 
+void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
+                        Error **errp)
+{
..

Optional; you could support not connected (0xffff) for latency/bandwidth in
this parsing.
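
Something along these lines, perhaps (untested sketch; the keyword, helper
name, and define are made up, not from the patch):

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define HMAT_LB_NOT_CONNECTED 0xFFFF

    /* Map a "not-connected" latency/bandwidth value to the ACPI sentinel
     * before any numeric/unit parsing happens. */
    static bool hmat_lb_parse_not_connected(const char *value, uint16_t *entry)
    {
        if (value && !strcmp(value, "not-connected")) {
            *entry = HMAT_LB_NOT_CONNECTED;
            return true;   /* caller skips the ps/ns/us or M/G/T parsing */
        }
        return false;
    }
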

+        if (*endptr == '\0') {
+            base_lat = 1;
+        } else if (*(endptr + 1) == 's') {
+            switch (*endptr) {
+            case 'p':
+                base_lat = 1;
+                break;
+            case 'n':
+                base_lat = PICO_PER_NSEC;
+                break;
+            case 'u':

Glad you picked up my mismatch of "u/micro".

+        } else {
+            error_setg(errp, "Invalid latency unit %s,"
+                "vaild units are \"ps\" \"ns\" \"us\"", node->latency);

typo "valid"

+        } else if (hmat_lb->base_lat != base_lat) {
+            error_setg(errp, "Invalid latency unit %s,"
+                " please unify the units.", node->latency);

This error is misleading. Should be something like "all latencies must be
specified in the same units"
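
e.g. (wording suggestion only):

+        } else if (hmat_lb->base_lat != base_lat) {
+            error_setg(errp, "Invalid latency unit %s: all latencies must"
+                " be specified in the same units", node->latency);
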

+        switch (toupper(*endptr)) {
+        case '\0':
+        case 'M':
+            base_bw = 1;
+            break;
+        case 'G':
+            base_bw = UINT64_C(1) << 10;
+            break;

There was one more gap - tera:

         case 'T':
            base_bw = UINT64_C(1) << 20;
            break;

Oh, my mistake, I should use TB/s instead of PB/s.
Thank you for pointing that out.

+        case 'P':
+            base_bw = UINT64_C(1) << 20;

and:

             base_bw = UINT64_C(1) << 30;

+            break;
+        }
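
For clarity, both fixes folded into one sketch (my consolidation, not the
patch as posted):

    #include <ctype.h>
    #include <stdint.h>

    /* base_bw is a multiplier in MB/s, so 'T' is 2^20 and 'P' is 2^30 MB/s.
     * Returns 0 for an unknown suffix so the caller can raise the
     * "invalid bandwidth unit" error. */
    static uint64_t hmat_bandwidth_base(char suffix)
    {
        switch (toupper((unsigned char)suffix)) {
        case '\0':
        case 'M':
            return 1;                  /* MB/s */
        case 'G':
            return UINT64_C(1) << 10;  /* GB/s */
        case 'T':
            return UINT64_C(1) << 20;  /* TB/s */
        case 'P':
            return UINT64_C(1) << 30;  /* PB/s */
        default:
            return 0;
        }
    }
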


Currently Linux 5.3.0-rc3+ doesn't cope with a real, corrected "bandwidth=2P",
so it is maybe not worth it:
[    2.092060] HMAT: Locality: Flags:00 Type:Access Bandwidth Initiator Domains:1 Target Domains:2 Base:1073741824
[    2.092326]   Initiator-Target[0-0]:-2147483648 MB/s

On values, testing for overflow is required, e.g.:

  -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=4096T

[    2.047676] HMAT: Locality: Flags:00 Type:Access Bandwidth Initiator Domains:1 Target Domains:2 Base:1048576
[    2.048084]   Initiator-Target[0-0]:0 MB/s

Technically ACPI could support up to 4P with base/offset, but you'd need to be
a lot trickier with the base/entry values to arrive at this number (i.e. make
the base the highest common factor of all entries and then check whether
entry/base > 2^32-2).
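
A minimal guard for the common case might look like this (untested sketch,
names assumed; the 2-byte matrix entry width is per ACPI 6.3 Table 5-146):

    #include "qemu/osdep.h"
    #include "qapi/error.h"

    /* Once the base unit is factored out, the remaining value must fit in
     * a 2-byte HMAT matrix entry.  base is one of the non-zero multipliers
     * from the unit parsing above. */
    static bool hmat_lb_entry_fits(uint64_t raw, uint64_t base, Error **errp)
    {
        if (raw / base > UINT16_MAX) {
            error_setg(errp, "value %" PRIu64 " overflows a 2-byte HMAT entry"
                       " (base unit %" PRIu64 ")", raw, base);
            return false;
        }
        return true;
    }
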

Please also propagate this into the docs/commit message.


I agree. I also tested the overflow case. Thank you for your suggestion; I will add documentation for it.

+        } else if (hmat_lb->base_lat != base_lat) {

Bug: Incorrectly copied - base_lat should be base_bw (twice)


My mistake, I will correct it.

+            error_setg(errp, "Invalid bandwidth unit %s,"
+                " please unify the units.", node->bandwidth);

This error is misleading. Should be something like "all bandwidths must be
specified in the same units"
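
i.e. with the base_bw fix folded in, something like:

+        } else if (hmat_lb->base_bw != base_bw) {
+            error_setg(errp, "Invalid bandwidth unit %s: all bandwidths must"
+                " be specified in the same units", node->bandwidth);
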

diff --git a/qemu-options.hx b/qemu-options.hx
index c480781992..cda4607f3a 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx

+@example
+-m 2G \
+-object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 \
+-numa node,nodeid=0,memdev=ram-node0 \
+-object memory-backend-ram,size=1024M,policy=bind,host-nodes=1,id=ram-node1 \
+-numa node,nodeid=1,memdev=ram-node1 \
+-smp 2 \
+-numa cpu,node-id=0,socket-id=0 \
+-numa cpu,node-id=0,socket-id=1 \
+-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
+-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
+-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
+-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
+@end example

nit: remove slash on last line

Is this a valid example? I get

qemu-system-x86_64: -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=11us: Invalid target=1, it hasn't a valid initiator proximity domain.

(I tested with host-nodes=1 changed to 0, as the local machine is single-node.)


I forgot to update the example. It should add the following:

  -numa node,nodeid=0,memdev=ram-node0 \
  -numa node,nodeid=1,memdev=ram-node1,initiator=1 \

Technically this next one is on [PATCH v9 07/11]:
diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index abf99b1adc..431818dc82 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -67,11 +67,81 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags, int initiator,
     build_append_int_noprefix(table_data, 0, 8);
 }
 
+/*
+ * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+ * Structure: Table 5-142

nit: 5-146

Tested as follows:

qemu-system-x86_64 -kernel /home/dan/repos/linux/vmlinux -nographic -append console=ttyS0 \
    -m 2G -object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 \
    -numa node,nodeid=0,memdev=ram-node0 \
    -object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node1 \
    -numa node,nodeid=1,memdev=ram-node1 -smp 2 -numa cpu,node-id=0,socket-id=0 \
    -numa cpu,node-id=0,socket-id=1 \
    -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=123us \
    -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
    -numa hmat-cache,node-id=0,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8 \
    -numa hmat-cache,node-id=1,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8 \
| grep -A 5 HMAT


[    0.038912] ACPI: HMAT 0x000000007FFE16C5 000118 (v02 BOCHS  BXPCHMAT 00000001 BXPC 00000001)
[    0.040954] SRAT: PXM 0 -> APIC 0x00 -> Node 0
[    0.040999] SRAT: PXM 0 -> APIC 0x01 -> Node 0
[    0.041189] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[    0.041250] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0x3fffffff]
[    0.041276] ACPI: SRAT: Node 1 PXM 1 [mem 0x40000000-0x7fffffff]
--
[    1.984572] HMAT: Memory Flags:0001 Processor Domain:0 Memory Domain:0
[    1.984792] HMAT: Memory Flags:0000 Processor Domain:0 Memory Domain:1
[    1.985435] HMAT: Locality: Flags:00 Type:Access Latency Initiator Domains:1 Target Domains:2 Base:1000000
[    1.986424]   Initiator-Target[0-0]:123000 nsec
[    1.986664]   Initiator-Target[0-1]:0 nsec
[    1.986910] HMAT: Locality: Flags:00 Type:Access Bandwidth Initiator Domains:1 Target Domains:2 Base:1
[    1.987229]   Initiator-Target[0-0]:200 MB/s
[    1.987356]   Initiator-Target[0-1]:0 MB/s
[    1.987549] HMAT: Cache: Domain:0 Size:131072 Attrs:00081111 SMBIOS Handles:0
[    1.988393] HMAT: Cache: Domain:1 Size:131072 Attrs:00081111 SMBIOS Handles:0

Leaving the default latency/bw as 0 is OK, as the spec says '0: the
corresponding latency or bandwidth information is not provided.' Potentially
the kernel could display this better.

Yes, we set the default to 0 because the spec says so. The kernel code prints no warning and just shows 0, but in qemu-options.hx we note this:

if NUM is 0, means the corresponding latency or bandwidth information is not provided.


Also note https://marc.info/?l=linux-acpi&m=156506549410279&w=2, submitted
because hmat_build_table_structs only calls build_hmat_mpda with flags=0 or
HMAT_PROX_INIT_VALID (0x1), which looks right per ACPI 6.3. An Ack (or Nack,
if I'm wrong) there would be good, to have both the kernel and this patch
series working together.

Sounds good!

for entire series:

Reviewed-by: Daniel Black <address@hidden>




