Re: [PATCH v2] hw/nvme: fix handling of over-committed queues


From: Klaus Jensen
Subject: Re: [PATCH v2] hw/nvme: fix handling of over-committed queues
Date: Mon, 4 Nov 2024 19:15:37 +0100

On Oct 29 13:15, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> If a host chooses to use the SQHD "hint" in the CQE to know if there is
> room in the submission queue for additional commands, it may result in a
> situation where there are not enough internal resources (struct
> NvmeRequest) available to process a command. For lack of a better
> term, the host may "over-commit" the device (i.e., it may have more
> inflight commands than the queue size).
> 
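[For readers less familiar with the mechanism: below is a rough, hypothetical
sketch of how a host-side driver can consume SQHD to track free submission
queue slots. The struct and helper names are invented for illustration and are
not taken from the OSv, Linux, or SPDK drivers.]

    /* Hypothetical host-side bookkeeping, for illustration only. */
    #include <stdint.h>

    struct host_sq {
        uint16_t head;  /* last SQHD value seen in a CQE for this SQ */
        uint16_t tail;  /* next slot the host will write */
        uint16_t size;  /* number of entries in the queue */
    };

    /* On each reaped CQE, the host adopts the device's SQHD as its view
     * of the submission queue head. */
    static inline void host_sq_update_head(struct host_sq *sq, uint16_t sqhd)
    {
        sq->head = sqhd;
    }

    /* Free slots by circular-queue arithmetic (one slot kept unused so
     * that head == tail means "empty").  If SQHD reports the queue empty,
     * this returns size - 1 regardless of how many CQEs are still
     * outstanding -- which is exactly the over-commit window described
     * above. */
    static inline uint16_t host_sq_free_slots(const struct host_sq *sq)
    {
        return (uint16_t)((sq->head + sq->size - sq->tail - 1) % sq->size);
    }
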
> For example, assume a queue with N entries. The host submits N commands
> and all are picked up for processing, advancing the head and emptying
> the queue. Regardless of which of these N commands completes first, the
> SQHD field of that CQE will indicate to the host that the queue is
> empty, which allows the host to issue N commands again. However, if the
> device has not posted CQEs for all the previous commands yet, the device
> will have fewer than N resources available to process the commands, so
> queue processing is suspended.
> 
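[Again purely illustrative: a tiny counter-level model of the walk-through
above, assuming a made-up queue size of 4. The "resources" counter merely
stands in for the per-queue pool of struct NvmeRequest; none of this is
hw/nvme code.]

    #include <stdio.h>

    int main(void)
    {
        const int N = 4;       /* queue size, arbitrary for the example */
        int resources = N;     /* device-side pool of struct NvmeRequest */

        /* Host fills the SQ; the device consumes all N entries, so the
         * next CQE it posts carries an SQHD that says the SQ is empty. */
        resources -= N;                  /* 0 left */

        /* The device has posted only one CQE so far, returning one
         * resource to the pool. */
        resources += 1;                  /* 1 left */

        /* Trusting SQHD, the host submits N new commands.  The device can
         * only pick up as many as it has resources for; the rest stay in
         * the SQ and, before this fix, nothing re-schedules SQ processing
         * for them unless the CQ happened to be full. */
        int picked_up = resources;       /* 1 */
        int stuck_in_sq = N - picked_up; /* 3 */

        printf("picked up %d, left in SQ %d\n", picked_up, stuck_in_sq);
        return 0;
    }
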
> And here lies an 11-year-old latent bug. In the absence of any additional
> tail updates on the submission queue, we never schedule the processing
> bottom-half again unless we observe a head update on an associated full
> completion queue. This has been sufficient to handle N-to-1 SQ/CQ setups
> (in the absence of over-commit, of course). Incidentally, that "kick all
> associated SQs" mechanism can now be killed, since we now schedule queue
> processing whenever we return a processing resource to a non-empty
> submission queue, which happens to cover both edge cases. However, we
> must retain kicking the CQ if it was previously full.
> 
> So, apparently, no previous driver tested with hw/nvme has ever used
> SQHD (e.g., neither the Linux NVMe driver nor SPDK uses it). But then OSv
> shows up with a driver that actually does. I salute you.
> 
> Fixes: f3c507adcd7b ("NVMe: Initial commit for new storage interface")
> Cc: qemu-stable@nongnu.org
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2388
> Reported-by: Waldemar Kozaczuk <jwkozaczuk@gmail.com>
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
> Changes in v2:
> - Retain cq kick on previously full queue
> - Link to v1: https://lore.kernel.org/r/20241025-issue-2388-v1-1-16707e0d3342@samsung.com
> ---
>  hw/nvme/ctrl.c | 25 ++++++++++++++-----------
>  1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index f4e89203c1a6e3b051fd7185cbf01ec9bae9684a..1185455a94c4af43a39708b1b97dba9624fc7ad3 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -1520,9 +1520,16 @@ static void nvme_post_cqes(void *opaque)
>              stl_le_p(&n->bar.csts, NVME_CSTS_FAILED);
>              break;
>          }
> +
>          QTAILQ_REMOVE(&cq->req_list, req, entry);
> +
>          nvme_inc_cq_tail(cq);
>          nvme_sg_unmap(&req->sg);
> +
> +        if (QTAILQ_EMPTY(&sq->req_list) && !nvme_sq_empty(sq)) {
> +            qemu_bh_schedule(sq->bh);
> +        }
> +
>          QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
>      }
>      if (cq->tail != cq->head) {
> @@ -7950,7 +7957,6 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
>          /* Completion queue doorbell write */
>  
>          uint16_t new_head = val & 0xffff;
> -        int start_sqs;
>          NvmeCQueue *cq;
>  
>          qid = (addr - (0x1000 + (1 << 2))) >> 3;
> @@ -8001,19 +8007,16 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
>  
>          trace_pci_nvme_mmio_doorbell_cq(cq->cqid, new_head);
>  
> -        start_sqs = nvme_cq_full(cq) ? 1 : 0;
> -        cq->head = new_head;
> -        if (!qid && n->dbbuf_enabled) {
> -            stl_le_pci_dma(pci, cq->db_addr, cq->head, MEMTXATTRS_UNSPECIFIED);
> -        }
> -        if (start_sqs) {
> -            NvmeSQueue *sq;
> -            QTAILQ_FOREACH(sq, &cq->sq_list, entry) {
> -                qemu_bh_schedule(sq->bh);
> -            }
> +        /* schedule deferred cqe posting if queue was previously full */
> +        if (nvme_cq_full(cq)) {
>              qemu_bh_schedule(cq->bh);
>          }
>  
> +        cq->head = new_head;
> +        if (!qid && n->dbbuf_enabled) {
> +            stl_le_pci_dma(pci, cq->db_addr, cq->head, MEMTXATTRS_UNSPECIFIED);
> +        }
> +
>          if (cq->tail == cq->head) {
>              if (cq->irq_enabled) {
>                  n->cq_pending--;
> 
> ---
> base-commit: fdf250e5a37830615e324017cb3a503e84b3712c
> change-id: 20241025-issue-2388-bd047487f74c
> 
> Best regards,
> -- 
> Klaus Jensen <k.jensen@samsung.com>
> 

Ping. Tested, but would appreciate a review ;)
