
Re: [RFC] hw/nvme: Use irqfd to send interrupts


From: Stefan Hajnoczi
Subject: Re: [RFC] hw/nvme: Use irqfd to send interrupts
Date: Thu, 21 Jul 2022 09:29:22 -0400



On Wed, Jul 20, 2022, 22:36 Jinhao Fan <fanjinhao21s@ict.ac.cn> wrote:
Hi Stefan,

Thanks for the detailed explanation!

at 6:21 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:

> Hi Jinhao,
> Thanks for working on this!
>
> irqfd is not necessarily faster than KVM ioctl interrupt injection.
>
> There are at least two non-performance reasons for irqfd:
> 1. It avoids QEMU emulation code, which historically was not thread safe and needed the Big QEMU Lock. IOThreads don't hold the BQL and therefore cannot safely call the regular interrupt emulation code in QEMU. I think this is still true today although parts of the code may now be less reliant on the BQL.

This probably means we need to move to irqfd when IOThread support is added
to qemu-nvme.

Yes. You can audit the interrupt code but I'm pretty sure there is shared state that needs to be protected by the BQL. So the NVMe emulation code probably needs to use irqfd to avoid the interrupt emulation code.
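
The rough shape is: create an EventNotifier per CQ, allocate a KVM MSI route for the MSI-X vector, and attach the notifier as an irqfd. The completion path then signals the notifier instead of calling msix_notify(). Untested sketch from memory -- the exact signatures differ between QEMU versions, so don't take this as the RFC's actual code:

    /* needs "qemu/event_notifier.h" and "sysemu/kvm.h" */
    static int nvme_init_irqfd(NvmeCtrl *n, uint32_t vector, EventNotifier *notifier)
    {
        int virq, ret;

        ret = event_notifier_init(notifier, 0);
        if (ret < 0) {
            return ret;
        }

        /* allocate a KVM routing entry for this MSI-X vector */
        virq = kvm_irqchip_add_msi_route(kvm_state, vector, PCI_DEVICE(n));
        if (virq < 0) {
            return virq;
        }

        /* signaling the notifier now injects the MSI in the kernel */
        return kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, notifier, NULL, virq);
    }

After that the CQ completion code can call event_notifier_set() from the IOThread without taking the BQL.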


> 2. The eventfd interface decouples interrupt injection from the KVM ioctl interface. Vhost kernel and vhost-user device emulation code has no dependency on KVM thanks to irqfd. They work with any eventfd, including irqfd.

This is contrary to our original belief. Klaus once pointed out that irqfd
is KVM-specific. I agreed with him since I found the irqfd implementation is
in virt/kvm/eventfd.c. But irqfd indeed avoids the KVM ioctl call. Could you
elaborate on what “no dependency on KVM” means?

"They work with any eventfd, including irqfd"

If you look at the vhost kernel or vhost-user code, you'll see they just signal the eventfd. It doesn't have to be an irqfd.

An irqfd is a specific type of eventfd that the KVM kernel module implements to inject interrupts when the eventfd is signaled.
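
To make that concrete, the producer side of vhost/vhost-user boils down to a plain eventfd write, e.g. (illustrative only):

    #include <sys/eventfd.h>

    /*
     * The device side only ever does this. Whether the fd is a plain
     * eventfd that QEMU polls in userspace or an irqfd that the KVM
     * module turns into a guest interrupt is decided by whoever set
     * the fd up -- the producer doesn't know or care.
     */
    static void notify_guest(int fd)
    {
        eventfd_write(fd, 1);
    }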

By the way, this not only decouples vhost from the KVM kernel module, but also allows QEMU to emulate MSI-X masking by buffering the interrupt in userspace.
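
The userspace half of that is conceptually just this (simplified sketch of the buffering idea, not virtio's actual code):

    typedef struct {
        EventNotifier irqfd; /* registered with KVM for this vector */
        bool masked;
        bool pending;        /* an interrupt arrived while masked */
    } VectorState;

    static void vector_notify(VectorState *v)
    {
        if (v->masked) {
            v->pending = true;              /* buffer it in userspace */
        } else {
            event_notifier_set(&v->irqfd);  /* inject via KVM */
        }
    }

    static void vector_unmask(VectorState *v)
    {
        v->masked = false;
        if (v->pending) {
            v->pending = false;
            event_notifier_set(&v->irqfd);  /* deliver the buffered irq */
        }
    }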


> 2. How can I debug this kind of cross QEMU-KVM problem?
>
> perf(1) is good at observing both kernel and userspace activity together. What is it that you want to debug?
>

I’ll look into perf(1). I think what I was trying to do is a breakdown
analysis of which part caused the latency. For example, what is the root
cause of the performance improvements or regressions when irqfd is turned
on.

Nice, perf(1) is good for that. You can enable trace events and add kprobes/uprobes to record timestamps when specific functions are entered.
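
For example, something along these lines (binary path and function name are placeholders, and perf probe prints the exact event name to pass to perf record):

    # record an existing KVM trace event while the guest does I/O
    perf record -e kvm:kvm_msi_set_irq -p $(pidof qemu-system-x86_64) -- sleep 10

    # add a uprobe on a QEMU function and record that too
    perf probe -x /path/to/qemu-system-x86_64 nvme_post_cqes
    perf record -e probe_qemu:nvme_post_cqes -p $(pidof qemu-system-x86_64) -- sleep 10

    # dump per-event timestamps
    perf script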

> What happens when the MSI-X vector is masked?
>
> I remember the VIRTIO code having masking support. I'm on my phone and can't check now, but I think it registers a temporary eventfd and buffers irqs while the vector is masked.

Yes, this RFC ignored interrupt masking support.

>
> This makes me wonder if the VIRTIO and NVMe IOThread irqfd code can be unified. Maybe IOThread support can be built into the core device emulation code (e.g. irq APIs) so that it's not necessary to duplicate it.
>

Agreed. Recently, when working on ioeventfd, IOThread and polling support, my
typical workflow has been to look at how virtio does it and adapt that code
for nvme. I think unifying their IOThread code would be beneficial, since
VIRTIO has incorporated many optimizations over the years that nvme cannot
directly benefit from. But I fear that subtle differences between the two
protocols may make unification challenging.

Again, thanks for your help :)
