[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH for-4.1 0/2] block: bdrv_drained_end() changes f
From: |
Max Reitz |
Subject: |
Re: [Qemu-devel] [PATCH for-4.1 0/2] block: bdrv_drained_end() changes fallout |
Date: |
Mon, 22 Jul 2019 19:18:20 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2 |
On 22.07.19 15:30, Max Reitz wrote:
> Hi,
>
> I noted that test-bdrv-drain sometimgs hangs (very rarely, though), and
> tried to write a test that triggers the issue. I failed to do so (there
> is a good reason for that, see patch 1), but on my way I noticed that
> calling bdrv_set_aio_context_ignore() from any AioContext but the main
> one is a bad idea. Hence patch 2.
>
> Anyway, I found the problem, which is fixed by patch 1 -- I think it’s
> rather obvious. There is no dedicated test because I don’t think it’s
> possible to write one, as I explain there.
>
>
> Max Reitz (2):
> block: Dec. drained_end_counter before bdrv_wakeup
> block: Only the main loop can change AioContexts
>
> include/block/block.h | 8 +++-----
> block.c | 13 ++++++++-----
> block/io.c | 5 ++---
> 3 files changed, 13 insertions(+), 13 deletions(-)
Sorry, applied to my block branch.
“Sorry” obviously because I didn’t really give much time to review this
series. My justification is:
- rc2’s tomorrow. I know the current code is broken, so I’d rather take
the chance to have a fixed rc2 than to know it to be broken and force
an rc4 by sending this for rc3.
- Patch 1 looks really obvious and simple to me. It makes sense to
decrement the drained_end_counter exactly where .done is set to true.
- Patch 2 is not so obvious. But the only dangerous change it
introduces is an assertion that bdrv_set_aio_context_ignore() is
called from the main loop. Right now, if it is called from anywhere
but the main loop, we *will* run into assertions elsewhere:
- bdrv_drained_begin() does a BDRV_POLL_WHILE(bs, ...). This only
works from the main context or bs's context.
- bdrv_drained_end() does the same, but now bs's context has changed.
Ergo, right now (on master), you can run bdrv_set_aio_context_ignore()
only from the main loop anyway. The new assertion only makes that
fact more explicit.
Now this makes it look like before e037c09c78520, you could run
bdrv_set_aio_context_ignore() from the old context (like the comment
therein said). But that’s wrong. Even before e037c09c78520,
bdrv_drained_end() didn’t work for mixed-context trees unless you call
it from the main context: It schedules the drained_end callbacks in the
respective node's context, but then it polls them from the original
context. So if you modify e.g. test_set_aio_context() in
test-bdrv-drain to run bdrv_try_set_aio_context() from a BH in the old
context, you will see a failed assertion in bdrv_drain_invoke()'s
BDRV_POLL_WHILE. (I’ve attached a diff for use on e037c09c78520^.)
Therefore, I’m confident this series doesn’t make things worse, which is
why I’m taking it without reviews.
Max
test-bdrv-drain.diff
Description: Text Data
signature.asc
Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH for-4.1 0/2] block: bdrv_drained_end() changes fallout,
Max Reitz <=