Re: PCI memory sync question (kvm,dpdk,e1000,packet stalled)
From: ASM
Subject: Re: PCI memory sync question (kvm,dpdk,e1000,packet stalled)
Date: Wed, 27 Nov 2019 15:39:03 +0300
Stefan, thanks for answering.
When a packet is received, e1000 writes it directly to guest memory
without any RCU protection.
The memory address to write to is set by the DPDK driver:
the driver writes the base address of the ring to RDBA (RDBAH, RDBAL).
It turns out that the MMIO RCU (the one set up in e1000_mmio_setup) does not
protect, and cannot protect, the ring descriptors.
The area that needs protection may be anywhere in guest RAM, and it
only becomes known when the driver writes to the RDBA registers.
(see datasheet 82574 GbE Controller "7.1.8 Receive Descriptor Queue Structure")
How can this memory be protected? As I understand it, e1000 should
track writes to RDBA and enable memory protection for that region.
But what is the right way to do that?
QEMU source code:
hw/net/e1000.c:954 (master branch)
954 base = rx_desc_base(s) + sizeof(desc) * s->mac_reg[RDH];
where rx_desc_base() returns the address held in the RDBAH/RDBAL registers. This address has no RCU protection.
...
955 pci_dma_read(d, base, &desc, sizeof(desc));
...
957 desc.status |= (vlan_status | E1000_RXD_STAT_DD);
...
990 pci_dma_write(d, base, &desc, sizeof(desc));
->
exec.c:
3111 static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
3112 MemTxAttrs attrs,
3113 const uint8_t *buf,
3114 hwaddr len, hwaddr addr1,
3115 hwaddr l, MemoryRegion *mr)
3116 {
...
3123 if (!memory_access_is_direct(mr, true)) {
(this condition is false here, so the RAM case below is taken)
3131 } else {
3132 /* RAM case */
3133 ptr = qemu_ram_ptr_length(mr->ram_block, addr1, &l, false);
3134 memcpy(ptr, buf, l);
^^^ this is where I am seeing the weird behavior with KVM (the one you suggested might be due to MMIO write coalescing)
3135 invalidate_and_set_dirty(mr, addr1, l);
3136 }
3137
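To make the addressing in the listing above concrete, here is a minimal sketch of how the descriptor address follows from RDBAH/RDBAL and RDH, and where the status byte sits inside a legacy receive descriptor. The struct is redeclared here for illustration only, and the helper names (ring_base, desc_addr, status_addr) are hypothetical; they are not QEMU functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy receive descriptor (16 bytes), redeclared here for illustration.
 * The field order follows the 82574 datasheet, section 7.1. */
struct rx_desc {
    uint64_t buffer_addr; /* guest-physical address of the packet buffer */
    uint16_t length;      /* number of bytes written by the device */
    uint16_t csum;        /* packet checksum */
    uint8_t  status;      /* DD, EOP, ... (byte offset 0xc) */
    uint8_t  errors;
    uint16_t special;
};

/* Hypothetical helper: combine RDBAH:RDBAL into a 64-bit ring base.
 * The low 4 bits of RDBAL are masked off (16-byte aligned ring). */
static uint64_t ring_base(uint32_t rdbah, uint32_t rdbal)
{
    return ((uint64_t)rdbah << 32) | (rdbal & ~0xfu);
}

/* Address of the descriptor that the head pointer (RDH) selects. */
static uint64_t desc_addr(uint32_t rdbah, uint32_t rdbal, uint32_t rdh)
{
    return ring_base(rdbah, rdbal) + (uint64_t)sizeof(struct rx_desc) * rdh;
}

/* Address of the status byte that both device and driver write. */
static uint64_t status_addr(uint32_t rdbah, uint32_t rdbal, uint32_t rdh)
{
    return desc_addr(rdbah, rdbal, rdh) + offsetof(struct rx_desc, status);
}
```

The point of the sketch is that the target of pci_dma_write() is ordinary guest RAM chosen by the driver, and that byte 0xc of each 16-byte descriptor is the one location that the device model and the guest driver both store to.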
DPDK (e1000) source code (version dpdk-stable-17.11.9):
drivers/net/e1000/em_rxtx.c:
699 uint16_t
700 eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
701 uint16_t nb_pkts)
...
718 rxq = rx_queue;
...
722 rx_id = rxq->rx_tail;
723 rx_ring = rxq->rx_ring;
...
734 rxdp = &rx_ring[rx_id];
735 status = rxdp->status;
736 if (! (status & E1000_RXD_STAT_DD))
737 break;
...
807 rxdp->buffer_addr = dma_addr;
808 rxdp->status = 0;
^^^ this is where I am seeing the weird behavior with KVM (the one you suggested might be due to MMIO write coalescing)
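The handshake in the two listings can be reduced to a small sketch: the device side sets DD in the descriptor status, and the driver side polls for DD, re-arms the descriptor, and stores 0. This is a single-threaded toy model with hypothetical names (toy_desc, toy_device_complete, toy_driver_poll), not QEMU or DPDK code. The bit values are the ones I know from the e1000 register headers, and the value 7 from my earlier mail would then be DD|EOP|IXSM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Receive descriptor status bits (values as in the e1000 headers). */
#define RXD_STAT_DD   0x01 /* descriptor done */
#define RXD_STAT_EOP  0x02 /* end of packet */
#define RXD_STAT_IXSM 0x04 /* ignore checksum indication */

/* Toy descriptor: only the fields that matter for the handshake. */
struct toy_desc {
    uint64_t buffer_addr;
    volatile uint8_t status;
};

/* Device side (hypothetical): mark the descriptor as filled.
 * The resulting status value is 0x07. */
static void toy_device_complete(struct toy_desc *d)
{
    d->status |= RXD_STAT_DD | RXD_STAT_EOP | RXD_STAT_IXSM;
}

/* Driver side, modelled on the polling loop quoted above: return false
 * if nothing is ready, otherwise hand the descriptor back to the device. */
static bool toy_driver_poll(struct toy_desc *d, uint64_t new_buf)
{
    if (!(d->status & RXD_STAT_DD))
        return false;          /* no packet yet */
    d->buffer_addr = new_buf;  /* give the device a fresh buffer */
    d->status = 0;             /* clear DD: descriptor is free again */
    return true;
}
```

In this model the status is always 0 after a successful poll. The stall I observe corresponds to the status reading 7 again after the driver has stored 0, without a new packet having arrived.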
P.S.
> Also, is DPDK accessing the e1000 device from more than 1 vCPU?
All tests were run on a single virtual CPU.
I created a GitHub project to reproduce this error quickly:
https://github.com/BASM/qemu_dpdk_e1000_test
---
Best regards,
Leonid Myravjev
On Thu, 21 Nov 2019 at 17:05, Stefan Hajnoczi <address@hidden> wrote:
>
> On Wed, Nov 20, 2019 at 08:36:32PM +0300, ASM wrote:
> > I am trying to solve a problem with packets stalling (e1000, tap, kvm).
> > My investigation led to the following:
> > 1. In flatview_write_continue() I see that e1000 writes the number
> > "7" to the STAT register.
> > 2. The driver in the target OS reads the STAT register, sees the
> > number "7", and writes the number "0" to it.
> > 3. In flatview_write_continue() (with my edits):
> > memcpy(ptr, buf, l);
> > new1=ptr[0xc];
> > usleep(100);
> > new2=ptr[0xc];
> > invalidate_and_set_dirty(mr, addr1, l);
> > new3=ptr[0xc];
> > printf("Old: %i, new1, %i, new2: %i, new3: %i\n", old,new1,new2,new3);
> >
> > I see that the memory is "7" at the first read (new1), but "0" after usleep().
> > Do I understand correctly that this should not happen? Or does the RCU
> > lock allow multiple writers?
> >
> > The problem is that qemu (e1000) writes the number 7, after which the
> > target (dpdk driver) reads 7 and, on that basis, writes the number
> > 0, but as a result (extremely rarely) the STATUS value still remains
> > 7. Packet processing therefore stalls. This behavior is
> > observed only with kvm (it is not observed with tcg).
> >
> > Please help with advice or ideas.
>
> Hi Leonid,
> Could you be seeing weird behavior with KVM due to MMIO write
> coalescing?
>
> static void e1000_mmio_setup(E1000State *d)
> {
>     int i;
>     const uint32_t excluded_regs[] = {
>         E1000_MDIC, E1000_ICR, E1000_ICS, E1000_IMS,
>         E1000_IMC, E1000_TCTL, E1000_TDT, PNPMMIO_SIZE
>     };
>
>     memory_region_init_io(&d->mmio, OBJECT(d), &e1000_mmio_ops, d,
>                           "e1000-mmio", PNPMMIO_SIZE);
>     memory_region_add_coalescing(&d->mmio, 0, excluded_regs[0]);
>     for (i = 0; excluded_regs[i] != PNPMMIO_SIZE; i++)
>         memory_region_add_coalescing(&d->mmio, excluded_regs[i] + 4,
>                                      excluded_regs[i+1] - excluded_regs[i]
>                                      - 4);
>     memory_region_init_io(&d->io, OBJECT(d), &e1000_io_ops, d, "e1000-io",
>                           IOPORT_SIZE);
> }
>
> MMIO write coalescing means that QEMU doesn't see the register writes
> immediately. Instead kvm.ko records them into a ring buffer and QEMU
> processes the ring when the next ioctl(KVM_RUN) exit occurs.
>
> See Linux Documentation/virt/kvm/api.txt "4.116
> KVM_(UN)REGISTER_COALESCED_MMIO" for more details.
>
> I don't really understand your printf debugging explanation. It would
> help to see the DPDK code and the exact printf() output.
>
> Also, is DPDK accessing the e1000 device from more than 1 vCPU?
>
> Stefan
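
For readers following the thread, the deferred-write behavior described in the quoted explanation can be pictured with a toy model: register writes are recorded in a ring and only applied to the device state when the ring is drained. This is a sketch under stated assumptions. The types and functions below (toy_ring, toy_ring_record, toy_ring_drain) are hypothetical and do not reproduce the kernel's actual data structures or the QEMU code path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SIZE 8

/* One recorded register write (toy layout, not the kernel's). */
struct toy_write {
    uint64_t addr;  /* byte offset of the 32-bit register */
    uint32_t value;
};

struct toy_ring {
    struct toy_write entries[TOY_RING_SIZE];
    size_t first; /* next entry to drain */
    size_t last;  /* next free slot */
};

/* Recording side: queue a write instead of delivering it right away.
 * Returns 0 when the ring is full (the caller must drain first). */
static int toy_ring_record(struct toy_ring *r, uint64_t addr, uint32_t value)
{
    size_t next = (r->last + 1) % TOY_RING_SIZE;

    if (next == r->first)
        return 0;
    r->entries[r->last] = (struct toy_write){ addr, value };
    r->last = next;
    return 1;
}

/* Draining side: apply all queued writes, in order, to the register
 * file. Returns the number of writes applied. */
static size_t toy_ring_drain(struct toy_ring *r, uint32_t *regs, size_t nregs)
{
    size_t applied = 0;

    while (r->first != r->last) {
        struct toy_write w = r->entries[r->first];

        if (w.addr / 4 < nregs)
            regs[w.addr / 4] = w.value;
        r->first = (r->first + 1) % TOY_RING_SIZE;
        applied++;
    }
    return applied;
}
```

Between a call to toy_ring_record() and the next toy_ring_drain(), the register file still holds the old value. That window is the delay the quoted explanation refers to, and it only concerns MMIO register writes, not stores to descriptors in guest RAM.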