qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 2/8] hvf: Move common code out


From: Peter Collingbourne
Subject: Re: [PATCH 2/8] hvf: Move common code out
Date: Thu, 3 Dec 2020 10:42:20 -0800

On Thu, Dec 3, 2020 at 1:41 AM Roman Bolshakov <r.bolshakov@yadro.com> wrote:
>
> On Mon, Nov 30, 2020 at 04:00:11PM -0800, Peter Collingbourne wrote:
> > On Mon, Nov 30, 2020 at 3:18 PM Alexander Graf <agraf@csgraf.de> wrote:
> > >
> > >
> > > On 01.12.20 00:01, Peter Collingbourne wrote:
> > > > On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf <agraf@csgraf.de> wrote:
> > > >> Hi Peter,
> > > >>
> > > >> On 30.11.20 22:08, Peter Collingbourne wrote:
> > > >>> On Mon, Nov 30, 2020 at 12:56 PM Frank Yang <lfy@google.com> wrote:
> > > >>>>
> > > >>>> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf <agraf@csgraf.de> 
> > > >>>> wrote:
> > > >>>>> Hi Frank,
> > > >>>>>
> > > >>>>> Thanks for the update :). Your previous email nudged me into the 
> > > >>>>> right direction. I previously had implemented WFI through the 
> > > >>>>> internal timer framework which performed way worse.
> > > >>>> Cool, glad it's helping. Also, Peter found out that the main thing 
> > > >>>> keeping us from just using cntpct_el0 on the host directly and 
> > > >>>> compare with cval is that if we sleep, cval is going to be much < 
> > > >>>> cntpct_el0 by the sleep time. If we can get either the architecture 
> > > >>>> or macos to read out the sleep time then we might be able to not 
> > > >>>> have to use a poll interval either!
> > > >>>>> Along the way, I stumbled over a few issues though. For starters, 
> > > >>>>> the signal mask for SIG_IPI was not set correctly, so while 
> > > >>>>> pselect() would exit, the signal would never get delivered to the 
> > > >>>>> thread! For a fix, check out
> > > >>>>>
> > > >>>>>     
> > > >>>>> https://patchew.org/QEMU/20201130030723.78326-1-agraf@csgraf.de/20201130030723.78326-4-agraf@csgraf.de/
> > > >>>>>
> > > >>>> Thanks, we'll take a look :)
> > > >>>>
> > > >>>>> Please also have a look at my latest stab at WFI emulation. It 
> > > >>>>> doesn't handle WFE (that's only relevant in overcommitted 
> > > >>>>> scenarios). But it does handle WFI and even does something similar 
> > > >>>>> to hlt polling, albeit not with an adaptive threshold.
> > > >>> Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
> > > >>> I'll reply to your patch here. You have:
> > > >>>
> > > >>> +                    /* Set cpu->hvf->sleeping so that we get a
> > > >>> SIG_IPI signal. */
> > > >>> +                    cpu->hvf->sleeping = true;
> > > >>> +                    smp_mb();
> > > >>> +
> > > >>> +                    /* Bail out if we received an IRQ meanwhile */
> > > >>> +                    if (cpu->thread_kicked || 
> > > >>> (cpu->interrupt_request &
> > > >>> +                        (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
> > > >>> +                        cpu->hvf->sleeping = false;
> > > >>> +                        break;
> > > >>> +                    }
> > > >>> +
> > > >>> +                    /* nanosleep returns on signal, so we wake up on 
> > > >>> kick. */
> > > >>> +                    nanosleep(ts, NULL);
> > > >>>
> > > >>> and then send the signal conditional on whether sleeping is true, but
> > > >>> I think this is racy. If the signal is sent after sleeping is set to
> > > >>> true but before entering nanosleep then I think it will be ignored and
> > > >>> we will miss the wakeup. That's why in my implementation I block IPI
> > > >>> on the CPU thread at startup and then use pselect to atomically
> > > >>> unblock and begin sleeping. The signal is sent unconditionally so
> > > >>> there's no need to worry about races between actually sleeping and the
> > > >>> "we think we're sleeping" state. It may lead to an extra wakeup but
> > > >>> that's better than missing it entirely.
> > > >>
> > > >> Thanks a bunch for the comment! So the trick I was using here is to > 
> > > >> > >> modify the timespec from the kick function before sending the IPI
> > > >> signal. That way, we know that either we are inside the sleep (where 
> > > >> the
> > > >> signal wakes it up) or we are outside the sleep (where timespec={} will
> > > >> make it return immediately).
> > > >>
> > > >> The only race I can think of is if nanosleep does calculations based on
> > > >> the timespec and we happen to send the signal right there and then.
> > > > Yes that's the race I was thinking of. Admittedly it's a small window
> > > > but it's theoretically possible and part of the reason why pselect was
> > > > created.
> > > >
> > > >> The problem with blocking IPIs is basically what Frank was describing
> > > >> earlier: How do you unset the IPI signal pending status? If the signal
> > > >> is never delivered, how can pselect differentiate "signal from last 
> > > >> time
> > > >> is still pending" from "new signal because I got an IPI"?
> > > > In this case we would take the additional wakeup which should be
> > > > harmless since we will take the WFx exit again and put us in the
> > > > correct state. But that's a lot better than busy looping.
> > >
> > >
> > > I'm not sure I follow. I'm thinking of the following scenario:
> > >
> > >    - trap into WFI handler
> > >    - go to sleep with blocked SIG_IPI
> > >    - SIG_IPI arrives, pselect() exits
> > >    - signal is still pending because it's blocked
> > >    - enter guest
> > >    - trap into WFI handler
> > >    - run pselect(), but it immediate exits because SIG_IPI is still 
> > > pending
> > >
> > > This was the loop I was seeing when running with SIG_IPI blocked. That's
> > > part of the reason why I switched to a different model.
> >
> > What I observe is that when returning from a pending signal pselect
> > consumes the signal (which is also consistent with my understanding of
> > what pselect does). That means that it doesn't matter if we take a
> > second WFx exit because once we reach the pselect in the second WFx
> > exit the signal will have been consumed by the pselect in the first
> > exit and we will just wait for the next one.
> >
>
> Aha! Thanks for the explanation. So, the first WFI in the series of
> guest WFIs will likely wake up immediately? After a period without WFIs
> there must be a pending SIG_IPI...
>
> It shouldn't be a critical issue though because (as defined in D1.16.2)
> "the architecture permits a PE to leave the low-power state for any
> reason, it is permissible for a PE to treat WFI as a NOP, but this is
> not recommended for lowest power operation."
>
> BTW. I think a bit from the thread should go into the description of
> patch 8, because it's not trivial and it would really be helpful to keep
> in repo history. At least something like this (taken from an earlier
> reply in the thread):
>
>   In this implementation IPI is blocked on the CPU thread at startup and
>   pselect() is used to atomically unblock the signal and begin sleeping.
>   The signal is sent unconditionally so there's no need to worry about
>   races between actually sleeping and the "we think we're sleeping"
>   state. It may lead to an extra wakeup but that's better than missing
>   it entirely.

Okay, I'll add something like that to the next version of the patch I send out.

Peter

>
>
> Thanks,
> Roman
>
> > I don't know why things may have been going wrong in your
> > implementation but it may be related to the issue with
> > mach_absolute_time() which I posted about separately and was also
> > causing busy loops for us in some cases. Once that issue was fixed in
> > our implementation we started seeing sleep until VTIMER due work
> > properly.
> >
> > >
> > >
> > > > I reckon that you could improve things a little by unblocking the
> > > > signal and then reblocking it before unlocking iothread (e.g. with a
> > > > pselect with zero time interval), which would flush any pending
> > > > signals. Since any such signal would correspond to a signal from last
> > > > time (because we still have the iothread lock) we know that any future
> > > > signals should correspond to new IPIs.
> > >
> > >
> > > Yeah, I think you actually *have* to do exactly that, because otherwise
> > > pselect() will always return after 0ns because the signal is still 
> > > pending.
> > >
> > > And yes, I agree that that starts to sound a bit less racy now. But it
> > > means we can probably also just do
> > >
> > >    - WFI handler
> > >    - block SIG_IPI
> > >    - set hvf->sleeping = true
> > >    - check for pending interrupts
> > >    - pselect()
> > >    - unblock SIG_IPI
> > >
> > > which means we run with SIG_IPI unmasked by default. I don't think the
> > > number of signal mask changes is any different with that compared to
> > > running with SIG_IPI always masked, right?
> >
>
> P.S. Just found that Alex already raised my concern. Pending signals
> have to be consumed or there should be no pending signals to start
> sleeping on the very first WFI.
>
> > And unlock/lock iothread around the pselect? I suppose that could work
> > but as I mentioned it would just be an optimization.
> >
> > Maybe I can try to make my approach work on top of your series, or if
> > you already have a patch I can try to debug it. Let me know.
> >
> > Peter



reply via email to

[Prev in Thread] Current Thread [Next in Thread]