[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 09/12] jobs: ensure sleep in job_sleep_ns is fully performed
From: |
Stefan Hajnoczi |
Subject: |
Re: [PATCH 09/12] jobs: ensure sleep in job_sleep_ns is fully performed |
Date: |
Wed, 26 Jan 2022 11:21:31 +0000 |
On Tue, Jan 18, 2022 at 11:27:35AM -0500, Emanuele Giuseppe Esposito wrote:
> If a drain happens while a job is sleeping, the timeout
> gets cancelled and the job continues once the drain ends.
> This is especially bad for the sleep performed in commit and stream
> jobs, since that is dictated by ratelimit to maintain a certain speed.
>
> Basically the execution path is the followig:
s/followig/following/
> 1. job calls job_sleep_ns, and yield with a timer in @ns ns.
> 2. meanwhile, a drain is executed, and
> child_job_drained_{begin/end} could be executed as ->drained_begin()
> and ->drained_end() callbacks.
> Therefore child_job_drained_begin() enters the job, that continues
> execution in job_sleep_ns() and calls job_pause_point_locked().
> 3. job_pause_point_locked() detects that we are in the middle of a
> drain, and firstly deletes any existing timer and then yields again,
> waiting for ->drained_end().
> 4. Once draining is finished, child_job_drained_end() runs and resumes
> the job. At this point, the timer has been lost and we just resume
> without checking if enough time has passed.
>
> This fix implies that from now onwards, job_sleep_ns will force the job
> to sleep @ns, even if it is wake up (purposefully or not) in the middle
> of the sleep. Therefore qemu-iotests test might run a little bit slower,
> depending on the speed of the job. Setting a job speed to values like "1"
> is not allowed anymore (unless you want to wait forever).
>
> Because of this fix, test_stream_parallel() in tests/qemu-iotests/030
> takes too long, since speed of stream job is just 1024 and before
> it was skipping all the wait thanks to the drains. Increase the
> speed to 256 * 1024. Exactly the same happens for test 151.
>
> Instead we need to sleep less in test_cancel_ready() test-blockjob.c,
> so that the job will be able to exit the sleep and transition to ready
> before the main loop asserts.
I remember seeing Hanna and Kevin use carefully rate-limited jobs in
qemu-iotests. They might have thoughts on whether this patch is
acceptable or not.
signature.asc
Description: PGP signature
- [PATCH 00/12] Removal of Aiocontext lock through drains: protect bdrv_replace_child_noperm., Emanuele Giuseppe Esposito, 2022/01/18
- [PATCH 05/12] test-bdrv-drain.c: adapt test to the coming subtree drains, Emanuele Giuseppe Esposito, 2022/01/18
- [PATCH 01/12] introduce BDRV_POLL_WHILE_UNLOCKED, Emanuele Giuseppe Esposito, 2022/01/18
- [PATCH 09/12] jobs: ensure sleep in job_sleep_ns is fully performed, Emanuele Giuseppe Esposito, 2022/01/18
- Re: [PATCH 09/12] jobs: ensure sleep in job_sleep_ns is fully performed,
Stefan Hajnoczi <=
- [PATCH 10/12] block.c: add subtree_drains where needed, Emanuele Giuseppe Esposito, 2022/01/18
- [PATCH 03/12] block.c: bdrv_replace_child_noperm: first remove the child, and then call ->detach(), Emanuele Giuseppe Esposito, 2022/01/18
- [PATCH 06/12] test-bdrv-drain.c: remove test_detach_by_parent_cb(), Emanuele Giuseppe Esposito, 2022/01/18
- [PATCH 12/12] block.c: additional assert qemu in main tread, Emanuele Giuseppe Esposito, 2022/01/18
- [PATCH 08/12] reopen: add a transaction to drain_end nodes picked in bdrv_reopen_parse_file_or_backing, Emanuele Giuseppe Esposito, 2022/01/18
- [PATCH 04/12] block.c: bdrv_replace_child_noperm: first call ->attach(), and then add child, Emanuele Giuseppe Esposito, 2022/01/18
- [PATCH 02/12] block/io.c: make bdrv_do_drained_begin_quiesce static and introduce bdrv_drained_begin_no_poll, Emanuele Giuseppe Esposito, 2022/01/18