qemu-block

Re: [PATCH 2/2] thread-pool: use ThreadPool from the running thread


From: Dr. David Alan Gilbert
Subject: Re: [PATCH 2/2] thread-pool: use ThreadPool from the running thread
Date: Thu, 20 Oct 2022 17:22:17 +0100
User-agent: Mutt/2.2.7 (2022-08-07)

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Mon, Oct 03, 2022 at 10:52:33AM +0200, Emanuele Giuseppe Esposito wrote:
> > 
> > 
> > Am 30/09/2022 um 17:45 schrieb Kevin Wolf:
> > > Am 30.09.2022 um 14:17 hat Emanuele Giuseppe Esposito geschrieben:
> > >> Am 29/09/2022 um 17:30 schrieb Kevin Wolf:
> > >>> Am 09.06.2022 um 15:44 hat Emanuele Giuseppe Esposito geschrieben:
> > >>>> Remove usage of aio_context_acquire by always submitting work items
> > >>>> to the current thread's ThreadPool.
> > >>>>
> > >>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > >>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> > >>>
> > >>> The thread pool is used by things outside of the file-* block drivers,
> > >>> too. Even outside the block layer. Not all of these seem to submit work
> > >>> in the same thread.
> > >>>
> > >>>
> > >>> For example:
> > >>>
> > >>> postcopy_ram_listen_thread() -> qemu_loadvm_state_main() ->
> > >>> qemu_loadvm_section_start_full() -> vmstate_load() ->
> > >>> vmstate_load_state() -> spapr_nvdimm_flush_post_load(), which has:
> > >>>
> > >>> ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());
>                          ^^^^^^^^^^^^^^^^^^^
> 
> aio_get_thread_pool() isn't thread safe either:
> 
>   ThreadPool *aio_get_thread_pool(AioContext *ctx)
>   {
>       if (!ctx->thread_pool) {
>           ctx->thread_pool = thread_pool_new(ctx);
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Two threads could race in aio_get_thread_pool().
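
To make that race concrete: here is a standalone sketch (not QEMU code;
all names are made up) of one way such a lazy "create the pool on first
use" can be done race-free with a compare-and-swap. It is only meant to
illustrate the race - whether aio_get_thread_pool() should be made
thread-safe at all is the real question in this thread.

  /*
   * Standalone sketch (not QEMU code): race-free lazy creation of a
   * per-context pool.  Create speculatively, install with a CAS, and
   * the losing thread frees its copy and uses the winner's.
   */
  #include <stdatomic.h>
  #include <stdlib.h>

  typedef struct Pool { int unused; } Pool;

  typedef struct Ctx {
      _Atomic(Pool *) pool;       /* lazily created, shared by threads */
  } Ctx;

  static Pool *pool_new(void)     { return calloc(1, sizeof(Pool)); }
  static void  pool_free(Pool *p) { free(p); }

  static Pool *ctx_get_pool(Ctx *ctx)
  {
      Pool *p = atomic_load(&ctx->pool);

      if (!p) {
          Pool *fresh = pool_new();
          Pool *expected = NULL;

          if (atomic_compare_exchange_strong(&ctx->pool, &expected, fresh)) {
              p = fresh;          /* we installed our pool first */
          } else {
              pool_free(fresh);   /* another thread won; use its pool */
              p = expected;       /* CAS stored the current value here */
          }
      }
      return p;
  }

  int main(void)
  {
      Ctx ctx = { .pool = NULL };
      Pool *p = ctx_get_pool(&ctx);   /* later calls return the same pool */

      pool_free(p);
      return 0;
  }
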
> 
> I think post-copy is broken here: it's calling code that was only
> designed to be called from the main loop thread.
> 
> I have CCed Juan and David.

In theory the path that you describe there shouldn't happen - although
there is perhaps not enough protection on the load side to stop it
happening if presented with a bad stream.
This is documented in docs/devel/migration.rst under 'Destination
behaviour'; but to recap, during postcopy load we need to be able to
load incoming iterative (i.e. RAM) pages while the normal devices are
being loaded, because loading a device may access RAM that hasn't yet
been transferred.

To do that, the device state of all the non-iterative devices (which I
think includes your spapr_nvdimm) is serialised into a separate
migration stream and sent as a 'package'.

We read the package off the stream on the main thread, but don't process
it until we fire off the 'listen' thread - whose creation you spotted
above. The listen thread then takes over reading the migration stream to
process RAM pages; since that stream is in the same format, it calls
qemu_loadvm_state_main() - but it doesn't expect any devices in there
other than the RAM devices; it's just expecting RAM.

In parallel with that, the main thread carries on loading the contents
of the 'package' - and that contains your spapr_nvdimm device (and any
other 'normal' devices); but that's OK because that's the main thread.
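
To make the structure described above explicit, here is a toy sketch
(not QEMU code; everything is simplified and the names are made up) of
the two threads involved: the listen thread consuming RAM pages while
the main thread loads the packaged device state in parallel.

  /* Toy model of the postcopy load structure described above. */
  #include <pthread.h>
  #include <stdio.h>

  static void *listen_thread(void *opaque)
  {
      /* Stands in for postcopy_ram_listen_thread(): it only ever sees
       * iterative (RAM) sections, never "normal" device sections. */
      for (int page = 0; page < 4; page++) {
          printf("listen thread: loaded RAM page %d\n", page);
      }
      return NULL;
  }

  int main(void)
  {
      pthread_t listener;

      /* The main thread has already read the 'package' off the stream;
       * now it fires off the listen thread, which takes over the
       * migration stream for RAM pages... */
      pthread_create(&listener, NULL, listen_thread, NULL);

      /* ...while the main thread, in parallel, loads the non-iterative
       * devices (e.g. spapr-nvdimm) out of the package.  That load runs
       * on the main thread, so main-loop-only code stays safe there. */
      printf("main thread: loading device state from the package\n");

      pthread_join(listener, NULL);
      return 0;
  }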

Now if something were very broken and sent a header for the spapr-nvdimm
down the main migration stream (the one the listen thread reads) rather
than into the package then, yes, we'd trigger your case, but that
shouldn't happen.

Dave

> > >>> ...
> > >>> thread_pool_submit_aio(pool, flush_worker_cb, state,
> > >>>                        spapr_nvdimm_flush_completion_cb, state);
> > >>>
> > >>> So it seems to me that we may be submitting work for the main thread
> > >>> from a postcopy migration thread.
> > >>>
> > >>> I believe the other direct callers of thread_pool_submit_aio() all
> > >>> submit work for the main thread and also run in the main thread.
> > >>>
> > >>>
> > >>> For thread_pool_submit_co(), pr_manager_execute() calls it with the pool
> > >>> it gets passed as a parameter. This is still bdrv_get_aio_context(bs) in
> > >>> hdev_co_ioctl() and should probably be changed the same way as for the
> > >>> AIO call in file-posix, i.e. use qemu_get_current_aio_context().
> > >>>
> > >>>
> > >>> We could consider either asserting in thread_pool_submit_aio() that we
> > >>> are really in the expected thread, or, like I suggested for LinuxAio,
> > >>> dropping the pool parameter and always getting it from the current
> > >>> thread (obviously this is only possible if migration could in fact
> > >>> schedule the work on its current thread - if it schedules it on the
> > >>> main thread and then exits the migration thread, which destroys the
> > >>> thread pool, that wouldn't be good).
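
A standalone sketch of the first option - asserting that work is
submitted from the thread that owns the pool - with made-up names, not
QEMU's actual thread-pool API:

  #include <assert.h>
  #include <pthread.h>

  typedef struct Pool {
      pthread_t owner;    /* thread whose event loop services this pool */
  } Pool;

  static void pool_init(Pool *pool)
  {
      pool->owner = pthread_self();
  }

  static void pool_submit(Pool *pool /* , work item arguments... */)
  {
      /* Catch misuse like the postcopy path discussed above:
       * submitting from a thread other than the pool's owner. */
      assert(pthread_equal(pool->owner, pthread_self()));
      /* ...enqueue the work item and wake a worker thread... */
  }

  int main(void)
  {
      Pool pool;

      pool_init(&pool);
      pool_submit(&pool);   /* same thread: the assertion holds */
      return 0;
  }
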
> > >>
> > >> Dumb question: why not extend the already-existing pool->lock to also
> > >> cover the necessary fields like pool->head that are accessed by other
> > >> threads (the only case I could find with thread_pool_submit_aio is the
> > >> one you pointed out above)?
> > > 
> > > Other people are more familiar with this code, but I believe this could
> > > have performance implications. I seem to remember that this code is
> > > careful to avoid locking to synchronise between worker threads and the
> > > main thread.
> > > 
> > > But looking at the patch again, I actually have a dumb question, too:
> > > The locking you're removing is in thread_pool_completion_bh(). As this
> > > is a BH, it runs in the ThreadPool's context either way, no matter
> > > which thread called thread_pool_submit_aio().
> > > 
> > > I'm not sure what this aio_context_acquire/release pair is actually
> > > supposed to protect. Paolo's commit 1919631e6b5 introduced it. Was it
> > > just more careful than it needs to be?
> > > 
> > 
> > I think the goal is still to protect pool->head, but if so the
> > AioContext lock is taken in the wrong place, because as you said the BH
> > always runs in the thread pool's context. Otherwise it seems to make no
> > sense.
> > 
> > On the other hand, thread_pool_submit_aio could be called by other
> > threads on behalf of the main loop, which means pool->head could be
> > modified (an iothread calls thread_pool_submit_aio) while being read by
> > the main loop (a worker thread has scheduled thread_pool_completion_bh).
> > 
> > What are the performance implications? I mean, if the AioContext lock in
> > the BH is actually useful and the BH really has to wait to take it, then
> > a lock that is taken in many more places throughout the block layer
> > won't be better than extending pool->lock, I guess.
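
For what it's worth, here is a standalone sketch (made-up names, not
QEMU's actual ThreadPool) of the simplest form of "extend pool->lock to
also cover pool->head": every access to the list head, whether from a
submitter or from the completion side, happens under the same mutex.

  #include <pthread.h>

  typedef struct Item {
      struct Item *next;
  } Item;

  typedef struct Pool {
      pthread_mutex_t lock;   /* protects 'head' */
      Item *head;
  } Pool;

  static void pool_submit(Pool *pool, Item *item)
  {
      pthread_mutex_lock(&pool->lock);    /* any thread may submit */
      item->next = pool->head;
      pool->head = item;
      pthread_mutex_unlock(&pool->lock);
  }

  static Item *pool_take_all(Pool *pool)
  {
      Item *list;

      pthread_mutex_lock(&pool->lock);    /* completion side, same lock */
      list = pool->head;
      pool->head = NULL;
      pthread_mutex_unlock(&pool->lock);
      return list;
  }

  int main(void)
  {
      Pool pool = { .lock = PTHREAD_MUTEX_INITIALIZER, .head = NULL };
      Item item = { .next = NULL };

      pool_submit(&pool, &item);
      return pool_take_all(&pool) == &item ? 0 : 1;
  }
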
> 
> thread_pool_submit_aio() is missing documentation on how it is supposed
> to be called.
> 
> Taking pool->lock is conservative and fine in the short-term.
> 
> In the longer term we need to clarify how thread_pool_submit_aio() is
> supposed to be used and remove locking to protect pool->head if
> possible.
> 
> A bunch of the event loop APIs are thread-safe (aio_set_fd_handler(),
> qemu_bh_schedule(), etc.) so it's somewhat natural to make
> thread_pool_submit_aio() thread-safe too. However, it would be nice to
> avoid synchronization: existing callers mostly call it from the same
> event loop thread that runs the BH, and we can avoid locking in that
> case.
> 
> Stefan
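
Since the missing documentation keeps coming up, here is one possible
shape for it, written against the call quoted above; the parameter
names beyond that call are guesses, so treat the whole comment as a
suggestion rather than the project's actual API documentation:

  /*
   * thread_pool_submit_aio:
   * @pool:   the thread pool that will run @func
   * @func:   blocking function to run in a worker thread
   * @arg:    opaque argument passed to @func
   * @cb:     completion callback, invoked from a BH in @pool's context
   * @opaque: opaque argument passed to @cb
   *
   * Must be called from the thread that runs @pool's AioContext; the
   * completion BH runs in that same context.  (Whether the function
   * should instead be made fully thread-safe is exactly the open
   * question in this thread.)
   */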


-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



