From: Emanuele Giuseppe Esposito
Subject: Re: [PATCH v8 17/20] job.c: enable job lock/unlock and remove Aiocontext locks
Date: Wed, 6 Jul 2022 23:29:33 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0


On 05/07/2022 15:07, Stefan Hajnoczi wrote:
> On Wed, Jun 29, 2022 at 10:15:35AM -0400, Emanuele Giuseppe Esposito wrote:
>> Change job_{lock/unlock} and the macros to use job_mutex.
>>
>> Now that they are no longer no-ops, remove the AioContext lock
>> to avoid deadlocks.
>>
>> Therefore:
>> - when possible, remove the AioContext lock/unlock pair completely
>> - if it is also used by some other function, reduce the locking
>> section as much as possible, leaving the job API outside.
>>
>> There is only one JobDriver callback, ->free(), that assumes the
>> AioContext lock is held (because it calls bdrv_unref()), so for
>> now keep that one under the AioContext lock.
> 
> This discussion shouldn't hold up the patch series, it's a separate
> issue:
> 
> Why does bdrv_unref() need the AioContext lock? The reference count
> itself is protected by the BQL (bdrv_ref() is GS too). I/O requests
> should be using fine-grained locks now, so I'm not sure if we still need
> to hold the AioContext lock to drain them?
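
For context, the locking change the commit message describes boils down
to the following pattern (my sketch, not the actual diff; cancel_example()
is a hypothetical caller and job_cancel() stands in for any job API call):

    #include "qemu/osdep.h"
    #include "qemu/job.h"
    #include "block/aio.h"

    /* Before: callers protected the job API with the job's
     * AioContext lock. */
    static void cancel_example_old(Job *job)
    {
        AioContext *ctx = job->aio_context;

        aio_context_acquire(ctx);
        job_cancel(job, false);
        aio_context_release(ctx);
    }

    /* After: job_lock()/job_unlock() take the real job_mutex, so the
     * job API must be called without the AioContext lock; where the
     * AioContext lock is still needed for non-job calls, the critical
     * section is shrunk so the job API stays outside it. */
    static void cancel_example_new(Job *job)
    {
        job_cancel(job, false); /* takes job_mutex internally */
    }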

If I remove the AioContext lock/unlock in job_unref_locked(), I see that
test 200 and test-bdrv-drain fail.

The reason is that job->free() calls block_job_free(), and from there we
eventually get to bdrv_detach_child() calling bdrv_try_set_aio_context()
from the main loop, on a node that is in another AioContext.

So it isn't really about bdrv_unref(), but rather about
bdrv_try_set_aio_context(). Until we find a solution to that, we can't
get rid of this AioContext lock.
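
Concretely, the lock in question is the aio_context_acquire/release pair
that job_unref_locked() keeps around ->free(). Roughly (a simplified
sketch of the code in this series, not the exact hunk):

    void job_unref_locked(Job *job)
    {
        if (--job->refcnt == 0) {
            if (job->driver->free) {
                AioContext *ctx = job->aio_context;

                /* Drop job_mutex before calling out to the driver. */
                job_unlock();
                /* block_job_free() ends up in bdrv_try_set_aio_context(),
                 * which still expects the AioContext lock to be held. */
                aio_context_acquire(ctx);
                job->driver->free(job);
                aio_context_release(ctx);
                job_lock();
            }
            /* ... remaining cleanup ... */
        }
    }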

That's the shared call stack from both tests:

#0  __pthread_kill_implementation (threadid=<optimized out>,
    signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff66a64a3 in __pthread_kill_internal (signo=6,
    threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007ffff6659d06 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/posix/raise.c:26
#3  0x00007ffff662c7d3 in __GI_abort () at abort.c:79
#4  0x0000555555d6411f in error_exit (err=<optimized out>,
    msg=msg@entry=0x555555f49b30 <__func__.20> "qemu_mutex_unlock_impl")
    at ../util/qemu-thread-posix.c:38
#5  0x0000555555d64965 in qemu_mutex_unlock_impl (mutex=0x5555568d0620,
    file=<optimized out>, line=<optimized out>)
    at ../util/qemu-thread-posix.c:118
#6  0x0000555555c39c06 in bdrv_set_aio_context_ignore (bs=0x5555575133e0,
    new_context=0x5555566a2ad0, ignore=0x7fffffffd110) at ../block.c:7399
#7  0x0000555555c3a100 in bdrv_child_try_set_aio_context (
    bs=bs@entry=0x5555575133e0, ctx=ctx@entry=0x5555566a2ad0,
    ignore_child=ignore_child@entry=0x0, errp=errp@entry=0x0)
    at ../block.c:7493
#8  0x0000555555c3a1f6 in bdrv_try_set_aio_context (errp=0x0,
    ctx=0x5555566a2ad0, bs=0x5555575133e0) at ../block.c:7503
#9  bdrv_detach_child (childp=0x7fffffffd168) at ../block.c:3130
#10 bdrv_root_unref_child (child=<optimized out>, child@entry=0x55555732dc00)
    at ../block.c:3228
#11 0x0000555555c4265f in block_job_remove_all_bdrv (job=0x5555575b6a30)
    at ../blockjob.c:195
#12 0x0000555555c426b5 in block_job_free (job=0x5555575b6a30)
    at ../blockjob.c:76
#13 0x0000555555c4472e in job_unref_locked (job=0x5555575b6a30)
    at ../job.c:464
#14 job_unref_locked (job=0x5555575b6a30) at ../job.c:450

> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> 



