qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 03/12] tests/plugin: add test plugin for inline operations


From: Pierrick Bouvier
Subject: Re: [PATCH 03/12] tests/plugin: add test plugin for inline operations
Date: Tue, 16 Jan 2024 11:46:39 +0400
User-agent: Mozilla Thunderbird

On 1/15/24 13:04, Alex Bennée wrote:> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:>
On 1/13/24 21:16, Alex Bennée wrote:
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:

On 1/12/24 21:20, Alex Bennée wrote:
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:

On 1/11/24 19:57, Philippe Mathieu-Daudé wrote:
Hi Pierrick,
On 11/1/24 15:23, Pierrick Bouvier wrote:
For now, it simply performs instruction, bb and mem count, and ensure
that inline vs callback versions have the same result. Later, we'll
extend it when new inline operations are added.

Use existing plugins to test everything works is a bit cumbersome, as
different events are treated in different plugins. Thus, this new one.

<snip>
+#define MAX_CPUS 8
Where does this value come from?


The plugin tests/plugin/insn.c had this constant, so I picked it up
from here.

Should the pluggin API provide a helper to ask TCG how many
vCPUs are created?

In user mode, we can't know how many simultaneous threads (and thus
vcpu) will be triggered by advance. I'm not sure if additional cpus
can be added in system mode.

One problem though, is that when you register an inline op with a
dynamic array, when you resize it (when detecting a new vcpu), you
can't change it afterwards. So, you need a storage statically sized
somewhere.

Your question is good, and maybe we should define a MAX constant that
plugins should rely on, instead of a random amount.
For user-mode it can be infinite. The existing plugins do this by
ensuring vcpu_index % max_vcpu. Perhaps we just ensure that for the
scoreboard as well? Of course that does introduce a trap for those using
user-mode...


The problem with vcpu-index % max_vcpu is that it reintroduces race
condition, though it's probably less frequent than on a single
variable. IMHO, yes it solves memory error, but does not solve the
initial problem itself.

The simplest solution would be to have a size "big enough" for most
cases, and abort when it's reached.
Well that is simple enough for system emulation as max_vcpus is a
bounded
number.

Another solution, much more complicated, but correct, would be to move
memory management of plugin scoreboard to plugin runtime, and add a
level of indirection to access it.
That certainly gives us the most control and safety. We can then
ensure
we'll never to writing past the bounds of the buffer. The plugin would
have to use an access function to get the pointer to read at the time it
cared and of course inline checks should be pretty simple.

Every time a new vcpu is added, we
can grow dynamically. This way, the array can grow, and ultimately,
plugin can poke its content/size. I'm not sure this complexity is what
we want though.
It doesn't seem too bad. We have a start/end_exclusive is *-user
do_fork
where we could update pointers. If we are smart about growing the size
of the arrays we could avoid too much re-translation.


I was concerned about a potential race when the scoreboard updates
this pointer, and other cpus are executing tb (using it). But this
concern is not valid, since start_exclusive ensures all other cpus are
stopped.

vcpu_init_hook function in plugins/core.c seems a good location to add
this logic. We would check if an update is needed, then
start_exclusive(), update the scoreboard and exit exclusive section.

It might already be in the exclusive section. We should try and re-use
an existing exclusive region rather than adding a new one. It's
expensive so best avoided if we can.

Do you think it's worth to try to inline scoreboard pointer (and flush
all tb when updated), instead of simply adding an indirection to it?
With this, we could avoid any need to re-translate anything.

Depends on the cost/complexity of the indirection I guess.
Re-translation isn't super expensive if we say double the size of the
score board each time we need to.

Do we want a limit of one scoreboard per thread? Can we store structures
in there?


 From the current plugins use case, it seems that several scoreboards
are needed.
Allowing structure storage seems a bit more tricky to me, because
since memory may be reallocated, users won't be allowed to point
directly to it. I would be in favor to avoid this (comments are
welcome).

The init function can take a sizeof(entry) field and the inline op can
have an offset field (which for the simple 0 case can be folded away by
TCG).


Thanks for all your comments and guidance on this.

I implemented a new version, working with a scoreboard that gets resized automatically, which allows usage of structs as well. The result is pretty satisfying as there is almost no more need to manually keep track of how many cpus have been used, while offering thread-safety by default.

I'll re-spin the serie once I cleaned up commits, and ported existing plugins to this.
reply via email to

[Prev in Thread] Current Thread [Next in Thread]