Re: [PATCH] tap: introduce IFF_NAPI


From: Jon Kohler
Subject: Re: [PATCH] tap: introduce IFF_NAPI
Date: Wed, 31 May 2023 13:46:01 +0000


> On May 31, 2023, at 1:27 AM, Jason Wang <jasowang@redhat.com> wrote:
> 
> On Wed, May 31, 2023 at 11:55 AM Jason Wang <jasowang@redhat.com> wrote:
>> 
>> On Wed, May 31, 2023 at 11:47 AM Jon Kohler <jon@nutanix.com> wrote:
>>> 
>>> 
>>> 
>>>> On May 30, 2023, at 11:35 PM, Jason Wang <jasowang@redhat.com> wrote:
>>>> 
>>>> On Wed, May 31, 2023 at 11:32 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>> 
>>>>> On Wed, May 31, 2023 at 11:17 AM Jon Kohler <jon@nutanix.com> wrote:
>>>>>> 
>>>>>> If the kernel supports IFF_NAPI, let's use it. This is especially
>>>>>> useful on kernels containing fb3f903769e8 ("tun: support NAPI for
>>>>>> packets received from batched XDP buffs"), as IFF_NAPI allows the
>>>>>> vhost_tx_batch path to use NAPI on XDP buffs.
>>>>>> 
>>>>>> Benchmarked w/ iperf -c (remote srv) -P (thread count) -l (stream size)
>>>>>> from a guest running kernel 5.10.105 to a remote bare metal host running
>>>>>> the patched code on kernel 5.10.139. Guests were configured with 1x
>>>>>> virtio-net device with 4x queues, resulting in 4x vhost-worker threads.
>>>>>> Hosts are identical Intel ICX 4314 @ 2.4 GHz with Mellanox CX5 25GbE
>>>>>> NIC -> Arista 25GbE switch. vhost-worker threads were largely maxed out
>>>>>> on CPU in "Before" and at around ~50-60% utilization in "After".
>>>>>> 
>>>>>> Single Stream: iperf -P 1
>>>>>> iperf -l size | Before          | After          | Increase
>>>>>> 64B           | 593 Mbits/sec   | 712 Mbits/sec  | ~20%
>>>>>> 128B          | 1.00 Gbits/sec  | 1.18 Gbits/sec | ~18%
>>>>>> 4KB           | 17.6 Gbits/sec  | 22.7 Gbits/sec | ~29%
>>>>>> 
>>>>>> Multiple Stream: iperf -P 12
>>>>>> iperf -l size | Before          | After          | Increase
>>>>>> 64B           | 6.35 Gbits/sec  | 7.78 Gbits/sec | ~23%
>>>>>> 128B          | 10.8 Gbits/sec  | 14.2 Gbits/sec | ~31%
>>>>>> 4KB           | 23.6 Gbits/sec  | 23.6 Gbits/sec | (line speed)
>>>>>> 
>>>>>> Signed-off-by: Jon Kohler <jon@nutanix.com>
>>>>> 
>>>>> Great, but I would suggest having an option.
>>>>> 
>>>>> So we can:
>>>>> 
>>>>> 1) ease debugging and comparison
>>>>> 2) enable this by default only for 8.1, and disable it for pre-8.1
>>> 
>>> Fair enough; one favor to ask, though:
>>> would you be able to point me to an existing option like what you're
>>> proposing, so I can make sure I'm on the same page?
>> 
>> For example, the vhost option for tap. Maybe we can have a napi option.

OK thanks, I’ll see what I can do there.
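
Roughly, I'm picturing something like this (untested sketch, naming and
qapi plumbing TBD -- assuming a new optional boolean "napi" on the tap
netdev that gets threaded down into tap_open() as an extra parameter):

    /* net/tap-linux.c: request IFF_NAPI only when the user asked for it
     * and the kernel advertises support via TUNGETFEATURES. */
    if (napi_requested && (features & IFF_NAPI)) {
        ifr.ifr_flags |= IFF_NAPI;
    }

driven from the command line along the lines of -netdev tap,id=net0,napi=on.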

>> 
>>> 
>>>> 
>>>> One more thought: if the performance boost only shows up after
>>>> fb3f903769e8, we probably need to disable it by default and let the
>>>> mgmt layer enable it.
>>>> 
>>> 
>>> I focused my testing on kernels with that commit, but I could take it out
>>> and we should still get a benefit. Would you like me to profile that to
>>> validate?
>>> 
>> 
>> One problem is that NAPI for TAP was originally introduced for kernel
>> hardening. Looking at the history, it introduced a lot of bugs.
>> 
>> Consider:
>> 
>> 1) it has been merged for several years
>> 2) tap has been widely used for a long time as well
>> 
>> I think it would still be safe to keep the option off (at least for
>> pre-8.1 machines).
>> 
>>> Asking because NAPI support in tun.c has been there for a while; at first
>>> glance I'd guess there would be non-zero gains with few downsides.
>>> Looking at git blame, it seems to be about ~5-6 years of support.
>> 
>> Yes.
>> 
>>> 
>>> Also for posterity, that commit has been in the kernel since 5.18, so a
>>> little over a year.
>> 
>> Then I think we can make it enabled by default for 8.1 and see.

OK, I’ll see what I can come up with.
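
For the default, one idea would be to key off kernel support when the user
doesn't set the option explicitly (hypothetical sketch; has_napi/napi follow
the usual generated-QAPI pattern for optional booleans, and any
machine-version gating is still TBD):

    /* Honor an explicit napi=on/off from the user; otherwise default to
     * on whenever the kernel advertises IFF_NAPI support. */
    bool use_napi = tap->has_napi ? tap->napi
                                  : (features & IFF_NAPI) != 0;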

> 
> Btw, it would be better if we can have some PPS benchmark as well.
> 
> Thanks

Is there a specific set of benchmarks that you want to see? Certain packet
sizes? TCP/UDP? A certain tool (netperf, iperf, etc.)? The existing benchmarks
in the commit msg cover both single and multi stream and multiple payload
sizes, all of which are a corollary to PPS, no?
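
For a rough back-of-the-envelope read from the existing numbers: at -l 64,
712 Mbits/sec works out to roughly 712e6 / (64 * 8) ~= 1.39M 64-byte
payloads/sec (vs ~1.16M before), though actual wire-level PPS will differ
once TCP/IP headers, segmentation, and coalescing are accounted for.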

Happy to do more profiling, just want to make sure I get you exactly what you
want.

> 
>> 
>> Thanks
>> 
>>> 
>>>> Thanks
>>>> 
>>>>> 
>>>>> Thanks
>>>>> 
>>>>>> ---
>>>>>> net/tap-linux.c | 4 ++++
>>>>>> net/tap-linux.h | 1 +
>>>>>> 2 files changed, 5 insertions(+)
>>>>>> 
>>>>>> diff --git a/net/tap-linux.c b/net/tap-linux.c
>>>>>> index f54f308d359..fd94df166e0 100644
>>>>>> --- a/net/tap-linux.c
>>>>>> +++ b/net/tap-linux.c
>>>>>> @@ -62,6 +62,10 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
>>>>>>        ifr.ifr_flags |= IFF_ONE_QUEUE;
>>>>>>    }
>>>>>> 
>>>>>> +    if (features & IFF_NAPI) {
>>>>>> +        ifr.ifr_flags |= IFF_NAPI;
>>>>>> +    }
>>>>>> +
>>>>>>    if (*vnet_hdr) {
>>>>>>        if (features & IFF_VNET_HDR) {
>>>>>>            *vnet_hdr = 1;
>>>>>> diff --git a/net/tap-linux.h b/net/tap-linux.h
>>>>>> index bbbb62c2a75..f4d8e55270b 100644
>>>>>> --- a/net/tap-linux.h
>>>>>> +++ b/net/tap-linux.h
>>>>>> @@ -37,6 +37,7 @@
>>>>>> 
>>>>>> /* TUNSETIFF ifr flags */
>>>>>> #define IFF_TAP          0x0002
>>>>>> +#define IFF_NAPI         0x0010
>>>>>> #define IFF_NO_PI        0x1000
>>>>>> #define IFF_ONE_QUEUE    0x2000
>>>>>> #define IFF_VNET_HDR     0x4000
>>>>>> --
>>>>>> 2.30.1 (Apple Git-130)
>>> 
> 

