[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tc
From: |
Jason Wang |
Subject: |
Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic |
Date: |
Fri, 18 Mar 2016 10:03:04 +0800 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 |
On 03/18/2016 12:45 AM, Wei Xu wrote:
> On 2016年03月17日 16:42, Jason Wang wrote:
>
>>
>> On 03/15/2016 05:17 PM, address@hidden wrote:
>>> From: Wei Xu <address@hidden>
>>>
>>> All the data packets in a tcp connection will be cached to a big buffer
>>> in every receive interval, and will be sent out via a timer, the
>>> 'virtio_net_rsc_timeout' controls the interval, the value will
>>> influent the
>>> performance and response of tcp connection extremely, 50000(50us) is a
>>> experience value to gain a performance improvement, since the whql test
>>> sends packets every 100us, so '300000(300us)' can pass the test case,
>>> this is also the default value, it's gonna to be tunable.
>>>
>>> The timer will only be triggered if the packets pool is not empty,
>>> and it'll drain off all the cached packets
>>>
>>> 'NetRscChain' is used to save the segments of different protocols in a
>>> VirtIONet device.
>>>
>>> The main handler of TCP includes TCP window update, duplicated ACK
>>> check
>>> and the real data coalescing if the new segment passed sanity check
>>> and is identified as an 'wanted' one.
>>>
>>> An 'wanted' segment means:
>>> 1. Segment is within current window and the sequence is the expected
>>> one.
>>> 2. ACK of the segment is in the valid window.
>>> 3. If the ACK in the segment is a duplicated one, then it must less
>>> than 2,
>>> this is to notify upper layer TCP starting retransmission due to
>>> the spec.
>>>
>>> Sanity check includes:
>>> 1. Incorrect version in IP header
>>> 2. IP options & IP fragment
>>> 3. Not a TCP packets
>>> 4. Sanity size check to prevent buffer overflow attack.
>>>
>>> There maybe more cases should be considered such as ip
>>> identification other
>>> flags, while it broke the test because windows set it to the same
>>> even it's
>>> not a fragment.
>>>
>>> Normally it includes 2 typical ways to handle a TCP control flag,
>>> 'bypass'
>>> and 'finalize', 'bypass' means should be sent out directly, and
>>> 'finalize'
>>> means the packets should also be bypassed, and this should be done
>>> after searching for the same connection packets in the pool and sending
>>> all of them out, this is to avoid out of data.
>>>
>>> All the 'SYN' packets will be bypassed since this always begin a new'
>>> connection, other flags such 'FIN/RST' will trigger a finalization,
>>> because
>>> this normally happens upon a connection is going to be closed, an
>>> 'URG' packet
>>> also finalize current coalescing unit while there maybe protocol
>>> difference to
>>> different OS.
>> But URG packet should be sent as quickly as possible regardless of
>> ordering, no?
>
> Yes, you right, URG will terminate the current 'SCU', i'll amend the
> commit log.
>
>>
>>> Statistics can be used to monitor the basic coalescing status, the
>>> 'out of order'
>>> and 'out of window' means how many retransmitting packets, thus
>>> describe the
>>> performance intuitively.
>>>
>>> Signed-off-by: Wei Xu <address@hidden>
>>> ---
>>> hw/net/virtio-net.c | 486
>>> ++++++++++++++++++++++++++++++++++++++++-
>>> include/hw/virtio/virtio-net.h | 1 +
>>> include/hw/virtio/virtio.h | 72 ++++++
>>> 3 files changed, 558 insertions(+), 1 deletion(-)
[...]
>>> + } else {
>>> + /* Coalesce window update */
>>> + o_tcp->th_win = n_tcp->th_win;
>>> + chain->stat.win_update++;
>>> + return RSC_COALESCE;
>>> + }
>>> + } else {
>>> + /* pure ack, update ack */
>>> + o_tcp->th_ack = n_tcp->th_ack;
>>> + chain->stat.pure_ack++;
>>> + return RSC_COALESCE;
>> Looks like there're something I missed. The spec said:
>>
>> "In other words, any pure ACK that is not a duplicate ACK or a window
>> update triggers an exception and must not be coalesced. All such pure
>> ACKs must be indicated as individual segments."
>>
>> Does it mean we *should not* coalesce windows update and pure ack?
>> (Since it can wakeup transmission)?
>
> It's also a little bit inexplicit and flexible due to the spec, please
> see the flowchart I on the same page.
>
> Comments about the flowchart:
> ------------------------------------------------------------------------
> The first of the following two flowcharts describes the rules for
> coalescing segments and updating the TCP headers.
> This flowchart refers to mechanisms for distinguishing valid duplicate
> ACKs and window updates. The second flowchart describes these mechanisms.
> ------------------------------------------------------------------------
> As show in the flowchart, only status 'C' will break current scu and
> get finalized, both 'A' and 'B' can be coalesced afaik.
>
Interesting, looks like you're right.
>>
>>> + }
>>> +}
>>> +
>>> +static int32_t virtio_net_rsc_coalesce_data(NetRscChain *chain,
>>> NetRscSeg *seg,
>>> + const uint8_t *buf, NetRscUnit
>>> *n_unit)
>>> +{
>>> + void *data;
>>> + uint16_t o_ip_len;
>>> + uint32_t nseq, oseq;
>>> + NetRscUnit *o_unit;
>>> +
>>> + o_unit = &seg->unit;
>>> + o_ip_len = htons(*o_unit->ip_plen);
>>> + nseq = htonl(n_unit->tcp->th_seq);
>>> + oseq = htonl(o_unit->tcp->th_seq);
>>> +
>>> + if (n_unit->tcp_hdrlen > TCP_HDR_SZ) {
>>> + /* Log this only for debugging observation */
>>> + chain->stat.tcp_option++;
>>> + }
>>> +
>>> + /* Ignore packet with more/larger tcp options */
>>> + if (n_unit->tcp_hdrlen > o_unit->tcp_hdrlen) {
>> What if n_unit->tcp_hdrlen > o_uint->tcp_hdr_len ?
> do you mean '<'? that also means some option changed, maybe i should
> include it.
Yes.
>>
>>> + chain->stat.tcp_larger_option++;
>>> + return RSC_FINAL;
>>> + }
>>> +
>>> + /* out of order or retransmitted. */
>>> + if ((nseq - oseq) > MAX_TCP_PAYLOAD) {
>>> + chain->stat.data_out_of_win++;
>>> + return RSC_FINAL;
>>> + }
>>> +
>>> + data = ((uint8_t *)n_unit->tcp) + n_unit->tcp_hdrlen;
>>> + if (nseq == oseq) {
>>> + if ((0 == o_unit->payload) && n_unit->payload) {
>>> + /* From no payload to payload, normal case, not a dup
>>> ack or etc */
>>> + chain->stat.data_after_pure_ack++;
>>> + goto coalesce;
>>> + } else {
>>> + return virtio_net_rsc_handle_ack(chain, seg, buf,
>>> + n_unit->tcp,
>>> o_unit->tcp);
>>> + }
>>> + } else if ((nseq - oseq) != o_unit->payload) {
>>> + /* Not a consistent packet, out of order */
>>> + chain->stat.data_out_of_order++;
>>> + return RSC_FINAL;
>>> + } else {
>>> +coalesce:
>>> + if ((o_ip_len + n_unit->payload) > chain->max_payload) {
>>> + chain->stat.over_size++;
>>> + return RSC_FINAL;
>>> + }
>>> +
>>> + /* Here comes the right data, the payload lengh in v4/v6 is
>>> different,
>>> + so use the field value to update and record the new data
>>> len */
>>> + o_unit->payload += n_unit->payload; /* update new data len */
>>> +
>>> + /* update field in ip header */
>>> + *o_unit->ip_plen = htons(o_ip_len + n_unit->payload);
>>> +
>>> + /* Bring 'PUSH' big, the whql test guide says 'PUSH' can be
>>> coalesced
>>> + for windows guest, while this may change the behavior
>>> for linux
>>> + guest. */
>> This needs more thought, 'can' probably means don't. (Linux GRO won't
>> merge PUSH packet).
> Yes, since it's mainly for win guest, how about take this as this
> first and do more test and see how to handle it?
Right, this is not an issue probably but just an optimization for latency.
[...]
>>
>>> + return NULL;
>>> + }
>>> +
>>> + QTAILQ_FOREACH(chain, &n->rsc_chains, next) {
>>> + if (chain->proto == proto) {
>>> + return chain;
>>> + }
>>> + }
>>> +
>>> + chain = g_malloc(sizeof(*chain));
>>> + chain->hdr_size = n->guest_hdr_len;
>> Why introduce a specified field for instead of just use
>> n->guest_hdr_len?
> this is to reduce invoking times to find VirtIONet by 'nc', there are
> a few places will use it.
Okay, then I think you'd better keep a pointer to VirtIONet structure
instead. (Consider you may want to refer more data from it, we don't
want to duplicate fields in two places).
>>
>>> + chain->proto = proto;
>>> + chain->max_payload = MAX_IP4_PAYLOAD;
>>> + chain->drain_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>>> + virtio_net_rsc_purge, chain);
>>> + memset(&chain->stat, 0, sizeof(chain->stat));
>>> +
>>> + QTAILQ_INIT(&chain->buffers);
>>> + QTAILQ_INSERT_TAIL(&n->rsc_chains, chain, next);
>>> +
>>> + return chain;
>>> +}
>>> +
>>> +static ssize_t virtio_net_rsc_receive(NetClientState *nc,
>>> + const uint8_t *buf, size_t size)
>>> +{
>>> + uint16_t proto;
>>> + NetRscChain *chain;
>>> + struct eth_header *eth;
>>> + VirtIONet *n;
>>> +
>>> + n = qemu_get_nic_opaque(nc);
>>> + if (size < (n->guest_hdr_len + ETH_HDR_SZ)) {
>>> + return virtio_net_do_receive(nc, buf, size);
>>> + }
>>> +
>>> + eth = (struct eth_header *)(buf + n->guest_hdr_len);
>>> + proto = htons(eth->h_proto);
>>> +
>>> + chain = virtio_net_rsc_lookup_chain(n, nc, proto);
>>> + if (!chain) {
>>> + return virtio_net_do_receive(nc, buf, size);
>>> + } else {
>>> + chain->stat.received++;
>>> + return virtio_net_rsc_receive4(chain, nc, buf, size);
>>> + }
>>> +}
>>> +
>>> +static ssize_t virtio_net_receive(NetClientState *nc,
>>> + const uint8_t *buf, size_t size)
>>> +{
>>> + if (virtio_net_rsc_bypass) {
>>> + return virtio_net_do_receive(nc, buf, size);
>> You need a feature bit for this and compat it for older machine types.
>> And also need some work on virtio spec I think.
> yes, not sure which way is good to support this, hmp/qmp/ethtool, this
> is gonna to support win guest,
> so need a well-compatible interface, any comments?
I think this should be implemented through feature bits/negotiation
instead of something like ethtool.
[...]
- [Qemu-devel] [ Patch 0/2] Support Receive-Segment-Offload(RSC) for WHQL test of Window guest, wexu, 2016/03/15
- [Qemu-devel] [ Patch 2/2] virtio-net rsc: support coalescing ipv6 tcp traffic, wexu, 2016/03/15
- [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic, wexu, 2016/03/15
- Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic, Michael S. Tsirkin, 2016/03/15
- Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic, Jason Wang, 2016/03/17
- Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic, Wei Xu, 2016/03/17
- Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic,
Jason Wang <=
- Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic, Wei Xu, 2016/03/18
- Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic, Jason Wang, 2016/03/18
- Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic, Wei Xu, 2016/03/18
- Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic, Jason Wang, 2016/03/18
- Re: [Qemu-devel] [ Patch 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic, Wei Xu, 2016/03/18
Re: [Qemu-devel] [ Patch 0/2] Support Receive-Segment-Offload(RSC) for WHQL test of Window guest, Michael S. Tsirkin, 2016/03/15
Re: [Qemu-devel] [ Patch 0/2] Support Receive-Segment-Offload(RSC) for WHQL test of Window guest, Jason Wang, 2016/03/17