From: Christian Horn
Subject: Re: [PATCH v6 0/3] Add support for the RAPL MSRs series
Date: Wed, 6 Nov 2024 12:14:14 +0900
* Igor Mammedov wrote:
> On Tue, 5 Nov 2024 08:11:14 +0100
> Christian Horn <chorn@fluxcoil.net> wrote:
>
> > - For reading the metrics in the guest, I was tempted to suggest PCP with
> > pmda-denki to cover RAPL, but it's right now just reading /sysfs, not
> > MSRs. pmda-lmsensors for further sensors offered on various systems,
> For the NVF use case, I was also eyeing pmda-denki.
>
> How hard would it be to add MSR-based sampling to denki?
> Can we borrow Anthony's MSR sampling from
> qemu-vmsr-helper, to reduce the amount of work needed?
Should be possible. For /sysfs we already detect the available domains
and, based on that, register metrics and instances with pmcd. For
rapl-msr, that could be done in a similar way, i.e. as denki.rapl-msr,
or by separating into denki.rapl.sysfs and denki.rapl.msr.
As for actually doing it: I'm not part of the engineering org but of
support, so for me it's a spare-time activity, whenever I get to it. PCP
engineering has people on the project; a Jira ticket would be a first
step. Direct pull requests to upstream are of course also a good start.
When developing that, one would iterate: modify src/pmdas/denki/denki.c,
compile it, get pmcd to use the modified pmda-denki, and look at the
debug output and metrics.
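To give an idea how small the MSR sampling itself is, here is a minimal
sketch - not code from qemu-vmsr-helper or pmda-denki, just the usual
/dev/cpu/N/msr route, so register number, units and privileges are
assumptions on my side:

/* Sketch: read MSR_PKG_ENERGY_STATUS via the msr kernel module.
 * Needs the msr module loaded and enough privilege (root or
 * CAP_SYS_RAWIO); 0x611 is the Intel package energy status MSR. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

#define MSR_PKG_ENERGY_STATUS 0x611

static int read_msr(int cpu, uint32_t reg, uint64_t *val)
{
    char path[64];
    int fd;

    snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* MSRs are read as 8 bytes at the offset of the register number */
    if (pread(fd, val, sizeof(*val), reg) != sizeof(*val)) {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}

int main(void)
{
    uint64_t raw;

    if (read_msr(0, MSR_PKG_ENERGY_STATUS, &raw) < 0) {
        perror("read_msr");
        return 1;
    }
    /* Lower 32 bits are the energy counter; a PMDA would sample this
     * periodically, scale it with the units from MSR_RAPL_POWER_UNIT,
     * and export the rate as a per-package metric. */
    printf("pkg energy status (raw): %llu\n",
           (unsigned long long)(raw & 0xffffffff));
    return 0;
}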
> Also, for guest per vCPU accounting, we would need per thread
> accounting (which I haven't noticed from a quick look at denki).
> So some effort would be needed to add it there.
I think we have these metrics in pmcd already from pmda-linux, i.e. we
can see them with this:
# pmrep -1gU -t 5 -J 3 proc.hog.cpu [..]
[ 1] - proc.hog.cpu["083377 /usr/lib64/firefox/firefox"]
[ 2] - proc.hog.cpu["084634 /usr/lib64/firefox/firefox"]
[ 3] - proc.hog.cpu["085225 md5sum"]
1 2 3
0.001 0.003 16.304
=> Top 3 consumers, process 3 is heaviest.
This uses derived metrics, computed from other metrics, defined here:
$ cat /etc/pcp/derived/proc.conf
[..]
proc.hog.cpu = 100 * (rate(proc.psinfo.utime) + rate(proc.psinfo.stime)) /
(kernel.all.uptime - proc.psinfo.start_time)
proc.hog.cpu(oneline) = average percentage CPU utilization of each process
[..]
I was brainstorming with Nathan about this in the past, but we did
not quickly get to something and lost track.
Following the PCP approach, a client would query the required metrics
from pmcd (i.e. "process md5sum is right now using the most CPU cycles")
and, together with "the overall VM or bare-metal system consumes 100W
right now", attribute consumption. We might get away with derived
metrics as per the above. If the computation is not doable with those,
we could also use our own client code (i.e. C or Python) which fetches
the metrics and computes the per-thread accounting.
A last resort would be to collect the required process metrics in
pmda-denki and do the computation there.
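For the derived-metrics route, the shape could be something like this
(a sketch only: the denki metric name is an assumption on my side, and
naively summing all RAPL domains over-counts on real hardware):

proc.hog.watts = sum(denki.rapl.rate) * proc.hog.cpu / sum(proc.hog.cpu)
proc.hog.watts(oneline) = per-process share of measured power, by CPU share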
We might want to split this out and discuss it on PCP upstream,
i.e. pcp@groups.io .
> I didn't know about pmda-lmsensors, I guess we should be able to use
> it out of the box with an 'acpi power meter' sensor, if QEMU were to provide such.
> I've also seen denki supporting a battery power sensor, we can abuse that
> and make QEMU provide that, but I'd rather add an 'acpi power meter' sensor
> to denki (which to some degree intersects with the battery power sensor
> functionality).
On this aarch64/Asahi MacBook here, recent kernels made
/sys/class/hwmon/hwmon1 available, and 'sensors' offers:
[chris@asahi sensors]$ sensors
[..]
Total System Power: 7.71 W
AC Input Power: 9.99 W
3.8 V Rail Power: 0.00 W
Heatpipe Power: 2.46 W
[..]
I'm still wondering how these fit into a picture like this one:
https://htmlpreview.github.io/?https://github.com/christianhorn/smallhelpers/blob/main/pmda-denki-handbook/denki.html#_hardware_requirements_new_version
So with these, overall system consumption is also available while on AC
power - of course, just on that hardware right now.
> PS:
> In this series Anthony uses custom protocol to get data from
> privileged MSR helper to QEMU. Would it be acceptable?
The only request would be that this is implemented as "an optional
on-top source", so that it does not prevent MSR access on bare-metal
hosts which do not have it. I guess that's a given. It would then be an
abstracted channel we provide into the guest.
> Or is there a preferred way for PCP to do inter-process comms?
Hm.. I thought this was used here to communicate between host and guest?
On the good side, if we get the per-thread attribution done, we can
illustrate attribution up into guests with what mermaid calls a sankey
diagram: https://mermaid.js.org/syntax/sankey.html :)
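Just to illustrate the shape such a diagram could take (made-up names
and numbers, fed in mermaid's sankey CSV syntax from the page above):

sankey-beta

host,guest-1,60
host,guest-2,30
guest-1,md5sum,45
guest-1,firefox,15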
cheers,
--
Christian Horn
AMC Technical Account Manager, Red Hat K.K.
pgp fprint ADA6 C79C AF2E 973E 3F70 73C5 9373 49E7 347B 904F