[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
From: |
Paolo Bonzini |
Subject: |
Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues |
Date: |
Wed, 9 Oct 2019 11:15:04 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 |
On 09/10/19 10:02, Jan Glauber wrote:
> On Mon, Oct 07, 2019 at 04:58:30PM +0200, Paolo Bonzini wrote:
>> On 07/10/19 16:44, dann frazier wrote:
>>> On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote:
>>>> On 02/10/19 11:23, Jan Glauber wrote:
>>>>> I've looked into this on ThunderX2. The arm64 code generated for the
>>>>> atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
>>>>> memory barriers. It is just plain ldaxr/stlxr.
>>>>>
>>>>> From my understanding this is not sufficient for SMP sync.
>>>>>
>>>>> If I read this comment correct:
>>>>>
>>>>> void aio_notify(AioContext *ctx)
>>>>> {
>>>>> /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
>>>>> * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
>>>>> */
>>>>> smp_mb();
>>>>> if (ctx->notify_me) {
>>>>>
>>>>> it points out that the smp_mb() should be paired. But as
>>>>> I said the used atomics don't generate any barriers at all.
>>>>
>>>> Based on the rest of the thread, this patch should also fix the bug:
>>>>
>>>> diff --git a/util/async.c b/util/async.c
>>>> index 47dcbfa..721ea53 100644
>>>> --- a/util/async.c
>>>> +++ b/util/async.c
>>>> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source)
>>>> aio_notify_accept(ctx);
>>>>
>>>> for (bh = ctx->first_bh; bh; bh = bh->next) {
>>>> - if (bh->scheduled) {
>>>> + if (atomic_mb_read(&bh->scheduled)) {
>>>> return true;
>>>> }
>>>> }
>>>>
>>>>
>>>> And also the memory barrier in aio_notify can actually be replaced
>>>> with a SEQ_CST load:
>>>>
>>>> diff --git a/util/async.c b/util/async.c
>>>> index 47dcbfa..721ea53 100644
>>>> --- a/util/async.c
>>>> +++ b/util/async.c
>>>> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
>>>>
>>>> void aio_notify(AioContext *ctx)
>>>> {
>>>> - /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
>>>> - * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
>>>> + /* Using atomic_mb_read ensures that e.g. bh->scheduled is written
>>>> before
>>>> + * ctx->notify_me is read. Pairs with atomic_or in aio_ctx_prepare or
>>>> + * atomic_add in aio_poll.
>>>> */
>>>> - smp_mb();
>>>> - if (ctx->notify_me) {
>>>> + if (atomic_mb_read(&ctx->notify_me)) {
>>>> event_notifier_set(&ctx->notifier);
>>>> atomic_mb_set(&ctx->notified, true);
>>>> }
>>>>
>>>>
>>>> Would you be able to test these (one by one possibly)?
>>>
>>> Paolo,
>>> I tried them both separately and together on a Hi1620 system, each
>>> time it hung in the first iteration. Here's a backtrace of a run with
>>> both patches applied:
>>
>> Ok, not a great start... I'll find myself an aarch64 machine and look
>> at it more closely. I'd like the patch to be something we can
>> understand and document, since this is probably the second most-used
>> memory barrier idiom in QEMU.
>>
>> Paolo
>
> I'm still not sure what the actual issue is here, but could it be some bad
> interaction between the notify_me and the list_lock? The are both 4 byte
> and side-by-side:
>
> address notify_me: 0xaaaadb528aa0 sizeof notify_me: 4
> address list_lock: 0xaaaadb528aa4 sizeof list_lock: 4
>
> AFAICS the generated code looks OK (all load/store exclusive done
> with 32 bit size):
>
> e6c: 885ffc01 ldaxr w1, [x0]
> e70: 11000821 add w1, w1, #0x2
> e74: 8802fc01 stlxr w2, w1, [x0]
>
> ...but if I bump notify_me size to uint64_t the issue goes away.
Ouch. :) Is this with or without my patch(es)?
Also, what if you just add a dummy uint32_t after notify_me?
Thanks,
Paolo
>
> BTW, the image file I convert in the testcase is ~20 GB.
>
> --Jan
>
> diff --git a/include/block/aio.h b/include/block/aio.h
> index a1d6b9e24939..e8a5ea3860bb 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -83,7 +83,7 @@ struct AioContext {
> * Instead, the aio_poll calls include both the prepare and the
> * dispatch phase, hence a simple counter is enough for them.
> */
> - uint32_t notify_me;
> + uint64_t notify_me;
>
> /* A lock to protect between QEMUBH and AioHandler adders and deleter,
> * and to ensure that no callbacks are removed while we're walking and
>
- Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues, (continued)
Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues, Paolo Bonzini, 2019/10/07
- Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues, Jan Glauber, 2019/10/07
- Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues, dann frazier, 2019/10/07
- Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues, Paolo Bonzini, 2019/10/07
- Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues, Jan Glauber, 2019/10/09
- Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues,
Paolo Bonzini <=
- Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues, Jan Glauber, 2019/10/11
- Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues, Paolo Bonzini, 2019/10/11
- Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues, Jan Glauber, 2019/10/11
- Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues, dann frazier, 2019/10/11
Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues, dann frazier, 2019/10/11