[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v9] virtio-net: Fix network stall at the host side waiting fo
From: |
Michael S. Tsirkin |
Subject: |
Re: [PATCH v9] virtio-net: Fix network stall at the host side waiting for kick |
Date: |
Sat, 20 Jul 2024 14:57:48 -0400 |
On Mon, Jul 08, 2024 at 12:48:39PM +0800, Wencheng Yang wrote:
> From: thomas <east.moutain.yang@gmail.com>
>
> Patch 06b12970174 ("virtio-net: fix network stall under load")
> added double-check to test whether the available buffer size
> can satisfy the request or not, in case the guest has added
> some buffers to the avail ring simultaneously after the first
> check. It will be lucky if the available buffer size becomes
> okay after the double-check, then the host can send the packet
> to the guest. If the buffer size still can't satisfy the request,
> even if the guest has added some buffers, viritio-net would
> stall at the host side forever.
>
> The patch enables notification and checks whether the guest has
> added some buffers since last check of available buffers when
> the available buffers are insufficient. If no buffer is added,
> return false, else recheck the available buffers in the loop.
> If the available buffers are sufficient, disable notification
> and return true.
>
> Changes:
> 1. Change the return type of virtqueue_get_avail_bytes() from void
> to int, it returns an opaque that represents the shadow_avail_idx
> of the virtqueue on success, else -1 on error.
> 2. Add a new API: virtio_queue_enable_notification_and_check(),
> it takes an opaque as input arg which is returned from
> virtqueue_get_avail_bytes(). It enables notification firstly,
> then checks whether the guest has added some buffers since
> last check of available buffers or not by virtio_queue_poll(),
> return ture if yes.
>
> The patch also reverts patch "06b12970174".
>
> The case below can reproduce the stall.
>
> Guest 0
> +--------+
> | iperf |
> ---------------> | server |
> Host | +--------+
> +--------+ | ...
> | iperf |----
> | client |---- Guest n
> +--------+ | +--------+
> | | iperf |
> ---------------> | server |
> +--------+
>
> Boot many guests from qemu with virtio network:
> qemu ... -netdev tap,id=net_x \
> -device virtio-net-pci-non-transitional,\
> iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x
>
> Each guest acts as iperf server with commands below:
> iperf3 -s -D -i 10 -p 8001
> iperf3 -s -D -i 10 -p 8002
>
> The host as iperf client:
> iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000
> iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000
>
> After some time, the host loses connection to the guest,
> the guest can send packet to the host, but can't receive
> packet from the host.
>
> It's more likely to happen if SWIOTLB is enabled in the guest,
> allocating and freeing bounce buffer takes some CPU ticks,
> copying from/to bounce buffer takes more CPU ticks, compared
> with that there is no bounce buffer in the guest.
> Once the rate of producing packets from the host approximates
> the rate of receiveing packets in the guest, the guest would
> loop in NAPI.
>
> receive packets ---
> | |
> v |
> free buf virtnet_poll
> | |
> v |
> add buf to avail ring ---
> |
> | need kick the host?
> | NAPI continues
> v
> receive packets ---
> | |
> v |
> free buf virtnet_poll
> | |
> v |
> add buf to avail ring ---
> |
> v
> ... ...
>
> On the other hand, the host fetches free buf from avail
> ring, if the buf in the avail ring is not enough, the
> host notifies the guest the event by writing the avail
> idx read from avail ring to the event idx of used ring,
> then the host goes to sleep, waiting for the kick signal
> from the guest.
>
> Once the guest finds the host is waiting for kick singal
> (in virtqueue_kick_prepare_split()), it kicks the host.
>
> The host may stall forever at the sequences below:
>
> Host Guest
> ------------ -----------
> fetch buf, send packet receive packet ---
> ... ... |
> fetch buf, send packet add buf |
> ... add buf virtnet_poll
> buf not enough avail idx-> add buf |
> read avail idx add buf |
> add buf ---
> receive packet ---
> write event idx ... |
> wait for kick add buf virtnet_poll
> ... |
> ---
> no more packet, exit NAPI
>
> In the first loop of NAPI above, indicated in the range of
> virtnet_poll above, the host is sending packets while the
> guest is receiving packets and adding buffers.
> step 1: The buf is not enough, for example, a big packet
> needs 5 buf, but the available buf count is 3.
> The host read current avail idx.
> step 2: The guest adds some buf, then checks whether the
> host is waiting for kick signal, not at this time.
> The used ring is not empty, the guest continues
> the second loop of NAPI.
> step 3: The host writes the avail idx read from avail
> ring to used ring as event idx via
> virtio_queue_set_notification(q->rx_vq, 1).
> step 4: At the end of the second loop of NAPI, recheck
> whether kick is needed, as the event idx in the
> used ring written by the host is beyound the
> range of kick condition, the guest will not
> send kick signal to the host.
>
> Fixes: 06b12970174 ("virtio-net: fix network stall under load")
> Signed-off-by: Wencheng Yang <east.moutain.yang@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Jason, your tree I presume?
> ---
> Changelog:
> v9:
> - Add document
> - Pass opaque returned from virtqueue_get_avail_bytes() to
> virtio_queue_set_notification_and_check() as arg
>
> v8:
> - Change virtqueue_get_avail_bytes() return type from void
> to int, it returns shadow_avail_idx on success.
> - virtio_queue_set_notification_and_check() accepts two args,
> the second arg is the shadow idx retruned from
> virtqueue_get_avail_bytes()
> - Add function virtio_queue_poll(), it accepts shadow idx
> returned from virtqueue_get_avail_bytes() as the second
> arg, and tells whether guest had add some buffers since
> last check of available buffers.
>
> v7:
> - Add function virtio_queue_set_notification_and_check()
> - Restore the function sequence introduce in v6
>
> v6:
> - Take packed packed queue into cosideration
> - Adjust function sequence to fix compilation issue
>
> v5:
> - Modify return type of virtio_queue_set_notification() to
> bool to indicate whether the guest has added some buffers
> after last check of avail idx
> - Loop in virtio_net_has_buffers() if the available buffers
> are not sufficient and the guest has added some buffers.
> - Revert patch "06b12970174"
> - Update the subject
>
> v4:
> - Correct spelling mistake in the subject
> - Describe the issue that virtio-net is blocked at host side
>
> v3:
> - Add virtio-net tag in the subject
> - Refine commit log
>
> v2:
> - Add SOB tag at the end of the commit message
> - Place Fixes tag at the end of the commit message
>
> v1:
> - Initial patch
>
> hw/net/virtio-net.c | 28 ++++++++++-------
> hw/virtio/virtio.c | 64 +++++++++++++++++++++++++++++++++++---
> include/hw/virtio/virtio.h | 22 +++++++++++--
> 3 files changed, 95 insertions(+), 19 deletions(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 9c7e85caea..b1a10e0548 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1641,24 +1641,28 @@ static bool virtio_net_can_receive(NetClientState *nc)
>
> static int virtio_net_has_buffers(VirtIONetQueue *q, int bufsize)
> {
> + int opaque;
> + unsigned int in_bytes;
> VirtIONet *n = q->n;
> - if (virtio_queue_empty(q->rx_vq) ||
> - (n->mergeable_rx_bufs &&
> - !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
> - virtio_queue_set_notification(q->rx_vq, 1);
> -
> - /* To avoid a race condition where the guest has made some buffers
> - * available after the above check but before notification was
> - * enabled, check for available buffers again.
> - */
> - if (virtio_queue_empty(q->rx_vq) ||
> - (n->mergeable_rx_bufs &&
> - !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
> +
> + while (virtio_queue_empty(q->rx_vq) || n->mergeable_rx_bufs) {
> + opaque = virtqueue_get_avail_bytes(q->rx_vq, &in_bytes, NULL,
> + bufsize, 0);
> + /* buffer is enough, disable notifiaction */
> + if (bufsize <= in_bytes) {
> + break;
> + }
> +
> + if (virtio_queue_enable_notification_and_check(q->rx_vq, opaque)) {
> + /* guest has added some buffers, try again */
> + continue;
> + } else {
> return 0;
> }
> }
>
> virtio_queue_set_notification(q->rx_vq, 0);
> +
> return 1;
> }
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 893a072c9d..8314de45dd 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -745,6 +745,60 @@ int virtio_queue_empty(VirtQueue *vq)
> }
> }
>
> +static bool virtio_queue_split_poll(VirtQueue *vq, unsigned shadow_idx)
> +{
> + if (unlikely(!vq->vring.avail)) {
> + return false;
> + }
> +
> + return (uint16_t)shadow_idx != vring_avail_idx(vq);
> +}
> +
> +static bool virtio_queue_packed_poll(VirtQueue *vq, unsigned shadow_idx)
> +{
> + VRingPackedDesc desc;
> + VRingMemoryRegionCaches *caches;
> +
> + if (unlikely(!vq->vring.desc)) {
> + return false;
> + }
> +
> + caches = vring_get_region_caches(vq);
> + if (!caches) {
> + return false;
> + }
> +
> + vring_packed_desc_read(vq->vdev, &desc, &caches->desc,
> + shadow_idx, true);
> +
> + return is_desc_avail(desc.flags, vq->shadow_avail_wrap_counter);
> +}
> +
> +static bool virtio_queue_poll(VirtQueue *vq, unsigned shadow_idx)
> +{
> + if (virtio_device_disabled(vq->vdev)) {
> + return false;
> + }
> +
> + if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> + return virtio_queue_packed_poll(vq, shadow_idx);
> + } else {
> + return virtio_queue_split_poll(vq, shadow_idx);
> + }
> +}
> +
> +bool virtio_queue_enable_notification_and_check(VirtQueue *vq,
> + int opaque)
> +{
> + virtio_queue_set_notification(vq, 1);
> +
> + if (opaque >= 0) {
> + return virtio_queue_poll(vq, (unsigned)opaque);
> + } else {
> + return false;
> + }
> +}
> +
> static void virtqueue_unmap_sg(VirtQueue *vq, const VirtQueueElement *elem,
> unsigned int len)
> {
> @@ -1332,9 +1386,9 @@ err:
> goto done;
> }
>
> -void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
> - unsigned int *out_bytes,
> - unsigned max_in_bytes, unsigned max_out_bytes)
> +int virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
> + unsigned int *out_bytes, unsigned max_in_bytes,
> + unsigned max_out_bytes)
> {
> uint16_t desc_size;
> VRingMemoryRegionCaches *caches;
> @@ -1367,7 +1421,7 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned
> int *in_bytes,
> caches);
> }
>
> - return;
> + return (int)vq->shadow_avail_idx;
> err:
> if (in_bytes) {
> *in_bytes = 0;
> @@ -1375,6 +1429,8 @@ err:
> if (out_bytes) {
> *out_bytes = 0;
> }
> +
> + return -1;
> }
>
> int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 7d5ffdc145..ecbacdc2df 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -271,9 +271,14 @@ void qemu_put_virtqueue_element(VirtIODevice *vdev,
> QEMUFile *f,
> VirtQueueElement *elem);
> int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
> unsigned int out_bytes);
> -void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
> - unsigned int *out_bytes,
> - unsigned max_in_bytes, unsigned
> max_out_bytes);
> +/**
> + * return <0 on error or an opaque >=0 to pass to
> + * virtio_queue_enable_notification_and_check
> + * on asuccess
> + */
> +int virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
> + unsigned int *out_bytes, unsigned max_in_bytes,
> + unsigned max_out_bytes);
>
> void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq);
> void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
> @@ -307,6 +312,17 @@ int virtio_queue_ready(VirtQueue *vq);
>
> int virtio_queue_empty(VirtQueue *vq);
>
> +/**
> + * enable notification and check whether guest has added some
> + * buffers since last call to virtqueue_get_avail_bytes.
> + *
> + * @opaque: value returned from virtqueue_get_avail_bytes
> + */
> +bool virtio_queue_enable_notification_and_check(VirtQueue *vq,
> + int opaque);
> +
> +void virtio_queue_set_shadow_avail_idx(VirtQueue *vq, uint16_t idx);
> +
> /* Host binding interface. */
>
> uint32_t virtio_config_readb(VirtIODevice *vdev, uint32_t addr);
> --
> 2.39.0