[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: bdrv_drained_begin deadlock with io-threads
From: |
Kevin Wolf |
Subject: |
Re: bdrv_drained_begin deadlock with io-threads |
Date: |
Thu, 2 Apr 2020 19:10:07 +0200 |
User-agent: |
Mutt/1.12.1 (2019-06-15) |
Am 02.04.2020 um 18:47 hat Kevin Wolf geschrieben:
> Am 02.04.2020 um 17:40 hat Dietmar Maurer geschrieben:
> > > Can you reproduce the problem with my script, but pointing it to your
> > > Debian image and running stress-ng instead of dd?
> >
> > yes
> >
> > > If so, how long does
> > > it take to reproduce for you?
> >
> > I sometimes need up to 130 iterations ...
> >
> > Worse, I thought several times the bug is gone, but then it triggered again
> > (sigh).
> >
> > But most times below 30 iteration (if you run "stress-ng -d 5").
> >
> > Also note that it can happen at different places, but always inside a
> > drained section ...
>
> I got a stracktrace of a hanging coroutine:
>
> (gdb) qemu coroutine 0x7fd8c00132c0
> #0 0x00005630e16e9840 in qemu_coroutine_switch
> (from_=from_@entry=0x7fd8c00132c0, to_=to_@entry=0x7fd8cda865b8,
> action=action@entry=COROUTINE_YIELD) at util/coroutine-ucontext.c:218
> #1 0x00005630e16e8521 in qemu_coroutine_yield () at util/qemu-coroutine.c:193
> #2 0x00005630e16e8ba5 in qemu_co_queue_wait_impl
> (queue=queue@entry=0x5630e48ab580, lock=lock@entry=0x0) at
> util/qemu-coroutine-lock.c:56
> #3 0x00005630e161f1ae in blk_wait_while_drained
> (blk=blk@entry=0x5630e48ab260) at
> /home/kwolf/source/qemu/include/qemu/lockable.h:46
> #4 0x00005630e1620600 in blk_wait_while_drained (blk=0x5630e48ab260) at
> block/block-backend.c:1189
> #5 0x00005630e1620600 in blk_co_pwritev_part (blk=0x5630e48ab260,
> offset=2922381312, bytes=856064, qiov=qiov@entry=0x7fd8c002cd70,
> qiov_offset=qiov_offset@entry=0, flags=0)
> at block/block-backend.c:1189
> #6 0x00005630e16207ce in blk_co_pwritev (flags=<optimized out>,
> qiov=0x7fd8c002cd70, bytes=<optimized out>, offset=<optimized out>,
> blk=<optimized out>) at block/block-backend.c:1429
> #7 0x00005630e16207ce in blk_aio_write_entry (opaque=0x7fd8c002cdc0) at
> block/block-backend.c:1429
> #8 0x00005630e16e98bb in coroutine_trampoline (i0=<optimized out>,
> i1=<optimized out>) at util/coroutine-ucontext.c:115
>
> So I think this is the bug: Calling blk_wait_while_drained() from
> anywhere between blk_inc_in_flight() and blk_dec_in_flight() is wrong
> because it will deadlock the drain operation.
>
> blk_aio_read/write_entry() take care of this and drop their reference
> around blk_wait_while_drained(). But if we hit the race condition that
> drain hasn't yet started there, but it has when we get to
> blk_co_preadv() or blk_co_pwritev_part(), then we're in a buggy code
> path.
With the following patch, it seems to survive for now. I'll give it some
more testing tomorrow (also qemu-iotests to check that I didn't
accidentally break something else.)
Kevin
diff --git a/block/block-backend.c b/block/block-backend.c
index 8b8f2a80a0..634acd1541 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1143,7 +1143,9 @@ static int blk_check_byte_request(BlockBackend *blk,
int64_t offset,
static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
{
if (blk->quiesce_counter && !blk->disable_request_queuing) {
+ blk_dec_in_flight(blk);
qemu_co_queue_wait(&blk->queued_requests, NULL);
+ blk_inc_in_flight(blk);
}
}
@@ -1154,8 +1156,6 @@ int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t
offset,
int ret;
BlockDriverState *bs;
- blk_wait_while_drained(blk);
-
/* Call blk_bs() only after waiting, the graph may have changed */
bs = blk_bs(blk);
trace_blk_co_preadv(blk, bs, offset, bytes, flags);
@@ -1186,8 +1186,6 @@ int coroutine_fn blk_co_pwritev_part(BlockBackend *blk,
int64_t offset,
int ret;
BlockDriverState *bs;
- blk_wait_while_drained(blk);
-
/* Call blk_bs() only after waiting, the graph may have changed */
bs = blk_bs(blk);
trace_blk_co_pwritev(blk, bs, offset, bytes, flags);
@@ -1262,6 +1260,7 @@ static int blk_prw(BlockBackend *blk, int64_t offset,
uint8_t *buf,
.ret = NOT_DONE,
};
+ blk_inc_in_flight(blk);
if (qemu_in_coroutine()) {
/* Fast-path if already in coroutine context */
co_entry(&rwco);
@@ -1270,6 +1269,7 @@ static int blk_prw(BlockBackend *blk, int64_t offset,
uint8_t *buf,
bdrv_coroutine_enter(blk_bs(blk), co);
BDRV_POLL_WHILE(blk_bs(blk), rwco.ret == NOT_DONE);
}
+ blk_dec_in_flight(blk);
return rwco.ret;
}
@@ -1388,9 +1388,7 @@ static void blk_aio_read_entry(void *opaque)
QEMUIOVector *qiov = rwco->iobuf;
if (rwco->blk->quiesce_counter) {
- blk_dec_in_flight(rwco->blk);
blk_wait_while_drained(rwco->blk);
- blk_inc_in_flight(rwco->blk);
}
assert(qiov->size == acb->bytes);
@@ -1406,9 +1404,7 @@ static void blk_aio_write_entry(void *opaque)
QEMUIOVector *qiov = rwco->iobuf;
if (rwco->blk->quiesce_counter) {
- blk_dec_in_flight(rwco->blk);
blk_wait_while_drained(rwco->blk);
- blk_inc_in_flight(rwco->blk);
}
assert(!qiov || qiov->size == acb->bytes);
- Re: bdrv_drained_begin deadlock with io-threads, (continued)
- Re: bdrv_drained_begin deadlock with io-threads, Dietmar Maurer, 2020/04/01
- Re: bdrv_drained_begin deadlock with io-threads, Kevin Wolf, 2020/04/01
- Re: bdrv_drained_begin deadlock with io-threads, Dietmar Maurer, 2020/04/01
- Re: bdrv_drained_begin deadlock with io-threads, Kevin Wolf, 2020/04/01
- Re: bdrv_drained_begin deadlock with io-threads, Dietmar Maurer, 2020/04/02
- Re: bdrv_drained_begin deadlock with io-threads, Dietmar Maurer, 2020/04/02
- Re: bdrv_drained_begin deadlock with io-threads, Kevin Wolf, 2020/04/02
- Re: bdrv_drained_begin deadlock with io-threads, Kevin Wolf, 2020/04/02
- Re: bdrv_drained_begin deadlock with io-threads, Dietmar Maurer, 2020/04/02
- Re: bdrv_drained_begin deadlock with io-threads, Kevin Wolf, 2020/04/02
- Re: bdrv_drained_begin deadlock with io-threads,
Kevin Wolf <=
- Re: bdrv_drained_begin deadlock with io-threads, Thomas Lamprecht, 2020/04/03
- Re: bdrv_drained_begin deadlock with io-threads, Dietmar Maurer, 2020/04/03
- Re: bdrv_drained_begin deadlock with io-threads, Kevin Wolf, 2020/04/03
- Re: bdrv_drained_begin deadlock with io-threads, Dietmar Maurer, 2020/04/03
- Re: bdrv_drained_begin deadlock with io-threads, Kevin Wolf, 2020/04/06
- Re: bdrv_drained_begin deadlock with io-threads, Dietmar Maurer, 2020/04/02
- Re: bdrv_drained_begin deadlock with io-threads, Kevin Wolf, 2020/04/01
- Re: bdrv_drained_begin deadlock with io-threads, Dietmar Maurer, 2020/04/02