From: Alex Bennée
Subject: Re: [Qemu-devel] [RFC PATCH 1/2] cpus-common: nuke finish_safe_work
Date: Mon, 24 Jun 2019 13:43:06 +0100
User-agent: mu4e 1.3.2; emacs 26.1

Roman Kagan <address@hidden> writes:
> On Mon, Jun 24, 2019 at 11:58:23AM +0100, Alex Bennée wrote:
>> Roman Kagan <address@hidden> writes:
>>
>> > It was introduced in commit b129972c8b41e15b0521895a46fd9c752b68a5e,
>> > with the following motivation:
>>
>> I can't find this commit in my tree.
>
> OOPS, that was supposed to be ab129972c8b41e15b0521895a46fd9c752b68a5e,
> sorry.
>
>>
>> >
>> > Because start_exclusive uses CPU_FOREACH, merge exclusive_lock with
>> > qemu_cpu_list_lock: together with a call to exclusive_idle (via
>> > cpu_exec_start/end) in cpu_list_add, this protects exclusive work
>> > against concurrent CPU addition and removal.
>> >
>> > However, it seems to be redundant, because the cpu-exclusive
>> > infrastructure provides sufficient protection against the newly added
>> > CPU starting execution while the cpu-exclusive work is running, and the
>> > aforementioned traversing of the cpu list is protected by
>> > qemu_cpu_list_lock.
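For anyone reading along, the interplay referred to here looks roughly
like this - a simplified sketch of start_exclusive() from cpus-common.c,
with the atomics and memory barriers of the real code elided:

    void start_exclusive(void)
    {
        CPUState *other_cpu;

        qemu_mutex_lock(&qemu_cpu_list_lock);
        exclusive_idle();              /* wait for any section in flight */

        pending_cpus = 1;
        CPU_FOREACH(other_cpu) {       /* the list traversal in question */
            if (other_cpu->running) {
                other_cpu->has_waiter = true;
                pending_cpus++;
                qemu_cpu_kick(other_cpu);
            }
        }
        while (pending_cpus > 1) {
            qemu_cond_wait(&exclusive_cond, &qemu_cpu_list_lock);
        }
        qemu_mutex_unlock(&qemu_cpu_list_lock);
    }

so the CPU_FOREACH walk does happen entirely under qemu_cpu_list_lock.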
>> >
>> > Besides, this appears to be the only place where the cpu-exclusive
>> > section is entered with the BQL taken, which has been found to trigger
>> > AB-BA deadlock as follows:
>> >
>> >    vCPU thread                             main thread
>> >    -----------                             -----------
>> > async_safe_run_on_cpu(self,
>> >                       async_synic_update)
>> > ...                                        [cpu hot-add]
>> > process_queued_cpu_work()
>> >   qemu_mutex_unlock_iothread()
>> >                                            [grab BQL]
>> >   start_exclusive()                        cpu_list_add()
>> >   async_synic_update()                       finish_safe_work()
>> >     qemu_mutex_lock_iothread()                 cpu_exec_start()
>> >
>> > So remove it. This paves the way to establishing a strict nesting rule
>> > of never entering the exclusive section with the BQL taken.
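Stated as a lock-ordering rule, if I read this correctly: taking the BQL
*inside* an exclusive section stays legal, it is the reverse order that
becomes forbidden. A hypothetical sketch:

    /* OK under the proposed rule: BQL nested inside the
     * exclusive section.
     */
    start_exclusive();
    qemu_mutex_lock_iothread();
    qemu_mutex_unlock_iothread();
    end_exclusive();

    /* Forbidden: entering the exclusive machinery with the BQL
     * held - the ordering that produced the AB-BA deadlock above.
     */
    qemu_mutex_lock_iothread();
    start_exclusive();              /* can deadlock */

which is exactly the ordering finish_safe_work exercised via
cpu_list_add().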
>> >
>> > Signed-off-by: Roman Kagan <address@hidden>
>> > ---
>> > cpus-common.c | 8 --------
>> > 1 file changed, 8 deletions(-)
>> >
>> > diff --git a/cpus-common.c b/cpus-common.c
>> > index 3ca58c64e8..023cfebfa3 100644
>> > --- a/cpus-common.c
>> > +++ b/cpus-common.c
>> > @@ -69,12 +69,6 @@ static int cpu_get_free_index(void)
>> >      return cpu_index;
>> >  }
>> >
>> > -static void finish_safe_work(CPUState *cpu)
>> > -{
>> > -    cpu_exec_start(cpu);
>> > -    cpu_exec_end(cpu);
>> > -}
>> > -
>>
>> This makes sense to me intellectually but I'm worried I've missed the
>> reason for it being introduced. Without finish_safe_work we have to wait
>> for the actual vCPU thread function to acquire and release the BQL and
>> enter its first cpu_exec_start().
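To spell out the rendezvous I mean, cpu_exec_start() looks roughly like
this (simplified; the real code in cpus-common.c uses atomics and a
memory barrier between the two flags):

    void cpu_exec_start(CPUState *cpu)
    {
        cpu->running = true;

        if (pending_cpus) {            /* exclusive work in flight? */
            qemu_mutex_lock(&qemu_cpu_list_lock);
            if (!cpu->has_waiter) {
                /* Not counted by start_exclusive(): back off and
                 * wait for the exclusive section to finish.
                 */
                cpu->running = false;
                exclusive_idle();
                cpu->running = true;
            }
            qemu_mutex_unlock(&qemu_cpu_list_lock);
        }
    }

so until the new vCPU thread gets here, nothing synchronises it with
exclusive work - which is the window I'm wondering about.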
>>
>> I guess I'd be happier if we had a hotplug test where we could stress
>> test the operation and be sure we've not just moved the deadlock
>> somewhere else.
>
> Me too. Unfortunately I haven't managed to come up with an idea how to
> do this test. One of the race participants, the safe work in a vCPU
> thread, happens in response to an MSR write by the guest. ATM there's
> no way to do it without an actual guest running. I'll have a look if I
> can make a vm test for it, using a linux guest and its /dev/cpu/*/msr.
Depending on how much machinery is required to trigger this we could
add a system mode test. However, there isn't much point if it requires
duplicating the entire guest hotplug stack. It may be easier to trigger
on ARM - the PSCI sequence isn't overly complicated to deal with but I
don't know what the impact of MSRs is.
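
For what it's worth, the guest-side poke could be tiny - something like
the following hypothetical and untested sketch, assuming a Linux guest
with the msr module loaded and using the Hyper-V SCONTROL MSR
(0x40000080) as the trigger:

    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>

    int main(void)
    {
        /* Writing 8 bytes at offset N issues WRMSR N on that vCPU */
        int fd = open("/dev/cpu/0/msr", O_WRONLY);
        uint64_t val = 1;                   /* SCONTROL: enable SynIC */

        if (fd < 0) {
            return 1;
        }
        pwrite(fd, &val, sizeof(val), 0x40000080);
        close(fd);
        return 0;
    }

The hard part, as you say, is arranging the concurrent hot-add so the
two sides actually race.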
--
Alex Bennée