[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v9 19/21] replay: add BH oneshot event for block
From: |
Kevin Wolf |
Subject: |
Re: [Qemu-devel] [PATCH v9 19/21] replay: add BH oneshot event for block layer |
Date: |
Mon, 14 Jan 2019 12:35:20 +0100 |
User-agent: |
Mutt/1.10.1 (2018-07-13) |
Am 14.01.2019 um 12:10 hat Pavel Dovgalyuk geschrieben:
> > From: Kevin Wolf [mailto:address@hidden
> > Am 09.01.2019 um 13:13 hat Pavel Dovgalyuk geschrieben:
> > > Replay is capable of recording normal BH events, but sometimes
> > > there are single use callbacks scheduled with aio_bh_schedule_oneshot
> > > function. This patch enables recording and replaying such callbacks.
> > > Block layer uses these events for calling the completion function.
> > > Replaying these calls makes the execution deterministic.
> > >
> > > Signed-off-by: Pavel Dovgalyuk <address@hidden>
> >
> > This still doesn't come even close to catching all BHs that need to be
> > caught. While you managed to show a few BHs that actually don't need to
> > be considered for recording when I asked for this in v7, most BHs in the
> > block layer can in some way lead to device callbacks and must therefore
> > be recorded.
>
> Let's have a brief review. I can change all the places, but how
> should I make a test case to be sure, that all of them are working ok?
The list is changing all the time. This is why I am so concerned about
special-casing a few callers instead of having a generic solution. I
don't know how we could make sure that we call the right function
everywhere.
> aio_bh_schedule_oneshot is used in:
> - blk_abort_aio_request
> - bdrv_co_yield_to_drain
> - iscsi_co_generic_cb
> - nfs_co_generic_cb
> - null_aio_common
> - nvme_process_completion
> - nvme_rw_cb
> - rbd_finish_aiocb
> - vxhs_iio_callback
> - (and couple of others not in the block layer)
In addition to these, we have at least a few functions that just resume
block layer coroutines rather than directly scheduling a BH with a
callback somewhere in the block layer.
> We must change this call to replay_bh_schedule_oneshot_event when
> the result of the BH execution affects the replayed guest state
> (e.g., interrupt request is generated or memory is written)
>
> If you think that all of these can do that, then I should change
> such function calls.
I haven't reviewed the code, but these names look like all of them can
eventually call back into the guest devices. They won't do that always,
but potentially.
> > How bad would it be to record some BHs even if recording them isn't
> > necessary? I'd definitely try to err on the safe side here. Having two
> > different sets of BH functions, you can't expect that people always use
> > the right one (especially if you don't even make the existing code base
> > consistently use the right one intially).
>
> There are two possible options:
> 1. Execution hangs when recording. Kind of deadlock caused by the incorrect
> management of the events. E.g., adding stopping the VM and trying to flush
> the block layer queue.
> 2. Execution hangs when replaying.
> One of the events that affect the guest state is missed or generated
> at the other moment (e.g., when BH is not linked to the execution step).
> Then the guest behaves differently and the order of the events in the log
> does not match the guest state (e.g., interrupt processing is not matched).
So basically when you have two events that are kind of nested? Operation
A triggers event A, but in order to complete the operation, you call
operation B with event B internally, which isn't available yet because
we're still handling event A?
Could this be solved by not having an order of events, but an order of
sets of events?
Kevin
- Re: [Qemu-devel] [PATCH v9 13/21] replay: implement replay-seek command to proceed to the desired step, (continued)
- [Qemu-devel] [PATCH v9 14/21] replay: refine replay-time module, Pavel Dovgalyuk, 2019/01/09
- [Qemu-devel] [PATCH v9 15/21] replay: flush rr queue before loading the vmstate, Pavel Dovgalyuk, 2019/01/09
- [Qemu-devel] [PATCH v9 16/21] gdbstub: add reverse step support in replay mode, Pavel Dovgalyuk, 2019/01/09
- [Qemu-devel] [PATCH v9 17/21] gdbstub: add reverse continue support in replay mode, Pavel Dovgalyuk, 2019/01/09
- [Qemu-devel] [PATCH v9 18/21] replay: describe reverse debugging in docs/replay.txt, Pavel Dovgalyuk, 2019/01/09
- [Qemu-devel] [PATCH v9 19/21] replay: add BH oneshot event for block layer, Pavel Dovgalyuk, 2019/01/09
[Qemu-devel] [PATCH v9 20/21] replay: init rtc after enabling the replay, Pavel Dovgalyuk, 2019/01/09
[Qemu-devel] [PATCH v9 21/21] replay: document development rules, Pavel Dovgalyuk, 2019/01/09
Re: [Qemu-devel] [PATCH v9 00/21] Fixing record/replay and adding reverse debugging, no-reply, 2019/01/13