qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH for-4.1 0/2] block: bdrv_drained_end() changes f


From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH for-4.1 0/2] block: bdrv_drained_end() changes fallout
Date: Mon, 22 Jul 2019 19:18:20 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2

On 22.07.19 15:30, Max Reitz wrote:
> Hi,
> 
> I noted that test-bdrv-drain sometimgs hangs (very rarely, though), and
> tried to write a test that triggers the issue.  I failed to do so (there
> is a good reason for that, see patch 1), but on my way I noticed that
> calling bdrv_set_aio_context_ignore() from any AioContext but the main
> one is a bad idea.  Hence patch 2.
> 
> Anyway, I found the problem, which is fixed by patch 1 -- I think it’s
> rather obvious.  There is no dedicated test because I don’t think it’s
> possible to write one, as I explain there.
> 
> 
> Max Reitz (2):
>   block: Dec. drained_end_counter before bdrv_wakeup
>   block: Only the main loop can change AioContexts
> 
>  include/block/block.h |  8 +++-----
>  block.c               | 13 ++++++++-----
>  block/io.c            |  5 ++---
>  3 files changed, 13 insertions(+), 13 deletions(-)

Sorry, applied to my block branch.


“Sorry” obviously because I didn’t really give much time to review this
series.  My justification is:

- rc2’s tomorrow.  I know the current code is broken, so I’d rather take
  the chance to have a fixed rc2 than to know it to be broken and force
  an rc4 by sending this for rc3.

- Patch 1 looks really obvious and simple to me.  It makes sense to
  decrement the drained_end_counter exactly where .done is set to true.

- Patch 2 is not so obvious.  But the only dangerous change it
  introduces is an assertion that bdrv_set_aio_context_ignore() is
  called from the main loop.  Right now, if it is called from anywhere
  but the main loop, we *will* run into assertions elsewhere:
  - bdrv_drained_begin() does a BDRV_POLL_WHILE(bs, ...).  This only
    works from the main context or bs's context.
  - bdrv_drained_end() does the same, but now bs's context has changed.
  Ergo, right now (on master), you can run bdrv_set_aio_context_ignore()
  only from the main loop anyway.  The new assertion only makes that
  fact more explicit.

Now this makes it look like before e037c09c78520, you could run
bdrv_set_aio_context_ignore() from the old context (like the comment
therein said).  But that’s wrong.  Even before e037c09c78520,
bdrv_drained_end() didn’t work for mixed-context trees unless you call
it from the main context: It schedules the drained_end callbacks in the
respective node's context, but then it polls them from the original
context.  So if you modify e.g. test_set_aio_context() in
test-bdrv-drain to run bdrv_try_set_aio_context() from a BH in the old
context, you will see a failed assertion in bdrv_drain_invoke()'s
BDRV_POLL_WHILE.  (I’ve attached a diff for use on e037c09c78520^.)

Therefore, I’m confident this series doesn’t make things worse, which is
why I’m taking it without reviews.

Max

Attachment: test-bdrv-drain.diff
Description: Text Data

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]