Re: [Qemu-devel] [PATCH v9 19/21] replay: add BH oneshot event for block

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v9 19/21] replay: add BH oneshot event for block

From:	Kevin Wolf
Subject:	Re: [Qemu-devel] [PATCH v9 19/21] replay: add BH oneshot event for block layer
Date:	Mon, 14 Jan 2019 12:35:20 +0100
User-agent:	Mutt/1.10.1 (2018-07-13)

Am 14.01.2019 um 12:10 hat Pavel Dovgalyuk geschrieben:
> > From: Kevin Wolf [mailto:address@hidden
> > Am 09.01.2019 um 13:13 hat Pavel Dovgalyuk geschrieben:
> > > Replay is capable of recording normal BH events, but sometimes
> > > there are single use callbacks scheduled with aio_bh_schedule_oneshot
> > > function. This patch enables recording and replaying such callbacks.
> > > Block layer uses these events for calling the completion function.
> > > Replaying these calls makes the execution deterministic.
> > >
> > > Signed-off-by: Pavel Dovgalyuk <address@hidden>
> > 
> > This still doesn't come even close to catching all BHs that need to be
> > caught. While you managed to show a few BHs that actually don't need to
> > be considered for recording when I asked for this in v7, most BHs in the
> > block layer can in some way lead to device callbacks and must therefore
> > be recorded.
> 
> Let's have a brief review. I can change all the places, but how
> should I make a test case to be sure, that all of them are working ok?

The list is changing all the time. This is why I am so concerned about
special-casing a few callers instead of having a generic solution. I
don't know how we could make sure that we call the right function
everywhere.

> aio_bh_schedule_oneshot is used in:
>  - blk_abort_aio_request
>  - bdrv_co_yield_to_drain
>  - iscsi_co_generic_cb
>  - nfs_co_generic_cb
>  - null_aio_common
>  - nvme_process_completion
>  - nvme_rw_cb
>  - rbd_finish_aiocb
>  - vxhs_iio_callback
>  - (and couple of others not in the block layer)

In addition to these, we have at least a few functions that just resume
block layer coroutines rather than directly scheduling a BH with a
callback somewhere in the block layer.

> We must change this call to replay_bh_schedule_oneshot_event when
> the result of the BH execution affects the replayed guest state
> (e.g., interrupt request is generated or memory is written)
> 
> If you think that all of these can do that, then I should change
> such function calls.

I haven't reviewed the code, but these names look like all of them can
eventually call back into the guest devices. They won't do that always,
but potentially.

> > How bad would it be to record some BHs even if recording them isn't
> > necessary? I'd definitely try to err on the safe side here. Having two
> > different sets of BH functions, you can't expect that people always use
> > the right one (especially if you don't even make the existing code base
> > consistently use the right one intially).
> 
> There are two possible options:
> 1. Execution hangs when recording. Kind of deadlock caused by the incorrect
>    management of the events. E.g., adding stopping the VM and trying to flush
>    the block layer queue.
> 2. Execution hangs when replaying. 
>    One of the events that affect the guest state is missed or generated
>    at the other moment (e.g., when BH is not linked to the execution step).
>    Then the guest behaves differently and the order of the events in the log
>    does not match the guest state (e.g., interrupt processing is not matched).

So basically when you have two events that are kind of nested? Operation
A triggers event A, but in order to complete the operation, you call
operation B with event B internally, which isn't available yet because
we're still handling event A?

Could this be solved by not having an order of events, but an order of
sets of events?

Kevin

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [Qemu-devel] [PATCH v9 13/21] replay: implement replay-seek command to proceed to the desired step, (continued)
- [Qemu-devel] [PATCH v9 14/21] replay: refine replay-time module, Pavel Dovgalyuk, 2019/01/09
- [Qemu-devel] [PATCH v9 15/21] replay: flush rr queue before loading the vmstate, Pavel Dovgalyuk, 2019/01/09
- [Qemu-devel] [PATCH v9 16/21] gdbstub: add reverse step support in replay mode, Pavel Dovgalyuk, 2019/01/09
- [Qemu-devel] [PATCH v9 17/21] gdbstub: add reverse continue support in replay mode, Pavel Dovgalyuk, 2019/01/09
- [Qemu-devel] [PATCH v9 18/21] replay: describe reverse debugging in docs/replay.txt, Pavel Dovgalyuk, 2019/01/09
- [Qemu-devel] [PATCH v9 19/21] replay: add BH oneshot event for block layer, Pavel Dovgalyuk, 2019/01/09
  - Re: [Qemu-devel] [PATCH v9 19/21] replay: add BH oneshot event for block layer, Kevin Wolf, 2019/01/11
    - Re: [Qemu-devel] [PATCH v9 19/21] replay: add BH oneshot event for block layer, Pavel Dovgalyuk, 2019/01/14
    - Re: [Qemu-devel] [PATCH v9 19/21] replay: add BH oneshot event for block layer, Kevin Wolf <=
    - Re: [Qemu-devel] [PATCH v9 19/21] replay: add BH oneshot event for block layer, Pavel Dovgalyuk, 2019/01/14
- [Qemu-devel] [PATCH v9 20/21] replay: init rtc after enabling the replay, Pavel Dovgalyuk, 2019/01/09
- [Qemu-devel] [PATCH v9 21/21] replay: document development rules, Pavel Dovgalyuk, 2019/01/09
- Re: [Qemu-devel] [PATCH v9 00/21] Fixing record/replay and adding reverse debugging, no-reply, 2019/01/13

Prev by Date: Re: [Qemu-devel] Questions about VFIO enabling MSIX vector
Next by Date: [Qemu-devel] [PATCH] io: ensure UNIX client doesn't unlink server socket
Previous by thread: Re: [Qemu-devel] [PATCH v9 19/21] replay: add BH oneshot event for block layer
Next by thread: Re: [Qemu-devel] [PATCH v9 19/21] replay: add BH oneshot event for block layer
Index(es):
- Date
- Thread