qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 2/8] hvf: Move common code out


From: Alexander Graf
Subject: Re: [PATCH 2/8] hvf: Move common code out
Date: Thu, 3 Dec 2020 23:13:35 +0100
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.16; rv:84.0) Gecko/20100101 Thunderbird/84.0


On 03.12.20 19:42, Peter Collingbourne wrote:
On Thu, Dec 3, 2020 at 1:41 AM Roman Bolshakov <r.bolshakov@yadro.com> wrote:
On Mon, Nov 30, 2020 at 04:00:11PM -0800, Peter Collingbourne wrote:
On Mon, Nov 30, 2020 at 3:18 PM Alexander Graf <agraf@csgraf.de> wrote:

On 01.12.20 00:01, Peter Collingbourne wrote:
On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf <agraf@csgraf.de> wrote:
Hi Peter,

On 30.11.20 22:08, Peter Collingbourne wrote:
On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote:
On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> wrote:
Hi Frank,

Thanks for the update :). Your previous email nudged me into the right 
direction. I previously had implemented WFI through the internal timer 
framework which performed way worse.
Cool, glad it's helping. Also, Peter found out that the main thing keeping us from 
just using cntpct_el0 on the host directly and compare with cval is that if we 
sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get 
either the architecture or macos to read out the sleep time then we might be able 
to not have to use a poll interval either!
Along the way, I stumbled over a few issues though. For starters, the signal 
mask for SIG_IPI was not set correctly, so while pselect() would exit, the 
signal would never get delivered to the thread! For a fix, check out

     
https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/

Thanks, we'll take a look :)

Please also have a look at my latest stab at WFI emulation. It doesn't handle 
WFE (that's only relevant in overcommitted scenarios). But it does handle WFI 
and even does something similar to hlt polling, albeit not with an adaptive 
threshold.
Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
I'll reply to your patch here. You have:

+                    /* Set cpu->hvf->sleeping so that we get a
SIG_IPI signal. */
+                    cpu->hvf->sleeping = true;
+                    smp_mb();
+
+                    /* Bail out if we received an IRQ meanwhile */
+                    if (cpu->thread_kicked || (cpu->interrupt_request &
+                        (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
+                        cpu->hvf->sleeping = false;
+                        break;
+                    }
+
+                    /* nanosleep returns on signal, so we wake up on kick. */
+                    nanosleep(ts, NULL);

and then send the signal conditional on whether sleeping is true, but
I think this is racy. If the signal is sent after sleeping is set to
true but before entering nanosleep then I think it will be ignored and
we will miss the wakeup. That's why in my implementation I block IPI
on the CPU thread at startup and then use pselect to atomically
unblock and begin sleeping. The signal is sent unconditionally so
there's no need to worry about races between actually sleeping and the
"we think we're sleeping" state. It may lead to an extra wakeup but
that's better than missing it entirely.
Thanks a bunch for the comment! So the trick I was using here is to > > >> 
modify the timespec from the kick function before sending the IPI
signal. That way, we know that either we are inside the sleep (where the
signal wakes it up) or we are outside the sleep (where timespec={} will
make it return immediately).

The only race I can think of is if nanosleep does calculations based on
the timespec and we happen to send the signal right there and then.
Yes that's the race I was thinking of. Admittedly it's a small window
but it's theoretically possible and part of the reason why pselect was
created.

The problem with blocking IPIs is basically what Frank was describing
earlier: How do you unset the IPI signal pending status? If the signal
is never delivered, how can pselect differentiate "signal from last time
is still pending" from "new signal because I got an IPI"?
In this case we would take the additional wakeup which should be
harmless since we will take the WFx exit again and put us in the
correct state. But that's a lot better than busy looping.

I'm not sure I follow. I'm thinking of the following scenario:

    - trap into WFI handler
    - go to sleep with blocked SIG_IPI
    - SIG_IPI arrives, pselect() exits
    - signal is still pending because it's blocked
    - enter guest
    - trap into WFI handler
    - run pselect(), but it immediate exits because SIG_IPI is still pending

This was the loop I was seeing when running with SIG_IPI blocked. That's
part of the reason why I switched to a different model.
What I observe is that when returning from a pending signal pselect
consumes the signal (which is also consistent with my understanding of
what pselect does). That means that it doesn't matter if we take a
second WFx exit because once we reach the pselect in the second WFx
exit the signal will have been consumed by the pselect in the first
exit and we will just wait for the next one.

Aha! Thanks for the explanation. So, the first WFI in the series of
guest WFIs will likely wake up immediately? After a period without WFIs
there must be a pending SIG_IPI...

It shouldn't be a critical issue though because (as defined in D1.16.2)
"the architecture permits a PE to leave the low-power state for any
reason, it is permissible for a PE to treat WFI as a NOP, but this is
not recommended for lowest power operation."

BTW. I think a bit from the thread should go into the description of
patch 8, because it's not trivial and it would really be helpful to keep
in repo history. At least something like this (taken from an earlier
reply in the thread):

   In this implementation IPI is blocked on the CPU thread at startup and
   pselect() is used to atomically unblock the signal and begin sleeping.
   The signal is sent unconditionally so there's no need to worry about
   races between actually sleeping and the "we think we're sleeping"
   state. It may lead to an extra wakeup but that's better than missing
   it entirely.
Okay, I'll add something like that to the next version of the patch I send out.


If this is the only change, I've already added it for v4. If you want me to change it further, just let me know what to replace the patch description with.


Alex




reply via email to

[Prev in Thread] Current Thread [Next in Thread]