qemu-devel
From: Eduardo Habkost
Subject: Re: [RFC PATCH v2] x86/cpu: initialize the CPU concurrently
Date: Thu, 24 Dec 2020 13:06:32 -0500

On Thu, Dec 24, 2020 at 09:41:10PM +0800, Zhenyu Ye wrote:
> Hi Eduardo,
> 
> Sorry for the delay.
> 
> On 2020/12/22 5:36, Eduardo Habkost wrote:
> > On Mon, Dec 21, 2020 at 07:36:18PM +0800, Zhenyu Ye wrote:
> >> Provide an optional mechanism to wait for all VCPU threads to be
> >> created outside of qemu_init_vcpu(), so that we can initialize the
> >> CPUs concurrently on the x86 architecture.
> >>
> >> This reduces the time needed to create virtual machines. For example,
> >> when haxm is used as the accelerator, cpus_accel->create_vcpu_thread()
> >> takes at least 200ms for each CPU, greatly prolonging the boot
> >> time.
> >>
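(For reference: the mechanism the commit message describes, i.e. create all
VCPU threads first and then wait once for all of them to report themselves
as created, can be sketched with a plain counter and condition variable.
The names below are illustrative stand-ins, not actual QEMU APIs.)

#include <pthread.h>
#include <stdio.h>

#define NR_VCPUS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_created_cond = PTHREAD_COND_INITIALIZER;
static int created_count;

static void *vcpu_thread_fn(void *arg)
{
    /* accelerator-specific per-VCPU setup would happen here */
    pthread_mutex_lock(&lock);
    created_count++;                        /* the cpu->created = true moment */
    pthread_cond_broadcast(&all_created_cond);
    pthread_mutex_unlock(&lock);
    /* the VCPU execution loop would start here */
    return NULL;
}

static void wait_all_vcpu_threads_created(void)
{
    pthread_mutex_lock(&lock);
    while (created_count < NR_VCPUS) {
        pthread_cond_wait(&all_created_cond, &lock);
    }
    pthread_mutex_unlock(&lock);
}

int main(void)
{
    pthread_t tid[NR_VCPUS];

    /* create every thread without waiting on each one individually */
    for (int i = 0; i < NR_VCPUS; i++) {
        pthread_create(&tid[i], NULL, vcpu_thread_fn, NULL);
    }

    /* a single wait point replaces the per-VCPU wait in qemu_init_vcpu() */
    wait_all_vcpu_threads_created();
    printf("all %d VCPU threads created\n", created_count);

    for (int i = 0; i < NR_VCPUS; i++) {
        pthread_join(tid[i], NULL);
    }
    return 0;
}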

I have just realized one thing: all VCPU thread functions
(including hax) keep holding qemu_global_mutex most of the time.
Are you sure your patch is really making VCPU initialization run
in parallel?  Do you have numbers showing this patch really
improves boot time?


> >> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
> >> Signed-off-by: eillon <yezhenyu2@huawei.com>
> > 
> > The patch is easier to follow now, but I have a question that may
> > be difficult to answer:
> > 
> > What exactly is the meaning of cpu->created=true, and what
> > exactly would break if we never wait for cpu->created==true at all?
> > 
> > I'm asking that because we might be introducing subtle races
> > here, if some of the remaining CPU initialization code in
> > x86_cpu_realizefn() [1] expects the VCPU thread to be already
> > initialized.
> > 
> > The cpu_reset() call below is one such example (but probably not
> > the only one).  cpu_reset() ends up calling
> > kvm_arch_reset_vcpu(), which seems to assume kvm_init_vcpu() was
> > already called.  With your patch, kvm_init_vcpu() might end up
> > being called after kvm_arch_reset_vcpu().
> > 
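(The ordering hazard described above boils down to two threads with an
implicit dependency.  Below is a stripped-down, hypothetical illustration,
not the actual QEMU call chain, and deliberately without synchronization:
depending on scheduling, the assertion may fire, which is the point.)

#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static bool vcpu_initialized;           /* set by the VCPU thread */

static void *vcpu_thread_fn(void *arg)
{
    /* stand-in for kvm_init_vcpu()-like per-VCPU setup */
    vcpu_initialized = true;
    return NULL;
}

static void reset_vcpu(void)
{
    /* stand-in for kvm_arch_reset_vcpu(), which assumes init already ran */
    assert(vcpu_initialized);
}

int main(void)
{
    pthread_t tid;

    pthread_create(&tid, NULL, vcpu_thread_fn, NULL);
    /*
     * Without waiting for cpu->created, reset_vcpu() may run before the
     * thread has finished its setup, so the assertion above can fail.
     */
    reset_vcpu();
    pthread_join(tid, NULL);
    return 0;
}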
> 
> There's a chance that this happens.
> Could we move these steps (the ones after qemu_init_vcpu()) out of
> x86_cpu_realizefn() into x86_cpus_init(), after
> qemu_wait_all_vcpu_threads_init()?  Something like:
> 
> void x86_cpus_init()
> {
>       foreach (cpu) {
>               x86_cpu_new();
>       }
> 
>       qemu_wait_all_vcpu_threads_init();
> 
>       foreach (cpu) {
>               x86_cpu_new_post();
>       }
> }

Maybe that would work, if the caveats are clearly documented.
I'm worried about bugs being introduced if people assume the VCPU
will always be fully initialized and ready to run after
qemu_init_vcpu() is called and qdev_realize() returns.

> 
> > Maybe a simpler alternative is to keep the existing thread
> > creation logic, but change hax_cpu_thread_fn() to do less work
> > before calling cpu_thread_signal_created()?
> > 
> > In my testing (without this patch), creation of 8 KVM VCPU
> > threads on a 4-core machine takes less than 3 ms.  Why is
> > qemu_init_vcpu() taking so long on haxm?  Which parts of haxm
> > initialization can be moved after cpu_thread_signal_created(), to
> > make this better?
> > 
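(A rough, self-contained shape of that suggestion: the VCPU thread signals
creation before, rather than after, its expensive accelerator setup, so the
per-VCPU wait in the main thread returns quickly.  All names here are
illustrative stand-ins, not the real hax_cpu_thread_fn().)

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t created_cond = PTHREAD_COND_INITIALIZER;
static int created;

static void expensive_accel_vcpu_create(void)
{
    usleep(200 * 1000);   /* stand-in for the ~200ms HAX_VM_IOCTL_VCPU_CREATE */
}

static void *cpu_thread_fn(void *arg)
{
    /* cheap per-thread setup only, then signal creation right away */
    pthread_mutex_lock(&lock);
    created = 1;
    pthread_cond_signal(&created_cond);
    pthread_mutex_unlock(&lock);

    /* expensive work happens after the main thread has been unblocked */
    expensive_accel_vcpu_create();

    /* the VCPU execution loop would start here */
    return NULL;
}

int main(void)
{
    pthread_t tid;

    pthread_create(&tid, NULL, cpu_thread_fn, NULL);

    /* qemu_init_vcpu()-style wait for the created flag */
    pthread_mutex_lock(&lock);
    while (!created) {
        pthread_cond_wait(&created_cond, &lock);
    }
    pthread_mutex_unlock(&lock);
    printf("main thread unblocked before the expensive setup finished\n");

    pthread_join(tid, NULL);
    return 0;
}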
> 
> The most time-consuming operation in haxm is ioctl(HAX_VM_IOCTL_VCPU_CREATE).
> Sadly, this cannot be split.
> 
> Even if we fix the problem in haxm, other accelerators may also have
> this problem.  So I think that if we can make x86_cpu_new() run
> concurrently, we should try to do it.

Changing the code to run all VCPU initialization actions for all
accelerators concurrently would require carefully reviewing the
VCPU thread code for all accelerators, looking for races.  Sounds
like a challenging task.  We could avoid that if we do something
that will parallelize only what we really need (and know to be
safe).

-- 
Eduardo



