[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: PCI memory sync question (kvm,dpdk,e1000,packet stalled)
From: |
Stefan Hajnoczi |
Subject: |
Re: PCI memory sync question (kvm,dpdk,e1000,packet stalled) |
Date: |
Thu, 19 Dec 2019 14:58:17 +0000 |
User-agent: |
Mutt/1.12.1 (2019-06-15) |
On Wed, Nov 27, 2019 at 03:39:03PM +0300, ASM wrote:
> When the packet is received, e1000 writes it to memory directrly
> without any RCU.
> The address of memory for writing is set by the driver from dpdk driver.
> Driver writes to RDBA (RDBAH,RDBAL) base address of ring.
>
> It turns out that MMIO RCU (mentioned from e1000_mmio_setup) does not
> protect, and can't protect the ring descriptors.
> The area for protection may be any area of operational memory. And it
> becomes famous when writing to registers RDBA by driver.
> (see datasheet 82574 GbE Controller "7.1.8 Receive Descriptor Queue
> Structure")
>
> How can this memory be protected? As I understand it, the e1000 should
> track the record in RDBA and enable memory protection in this region.
> But how to do it right?
I misunderstood the issue and you can probably ignore my comments about
coalesced MMIO. You quoted descriptor DMA code below so coalesced MMIO
shouldn't be relevant since desc->status isn't an MMIO register.
>
> Source code qemu:
> hw/net/e1000.c:954 (version master)
>
> 954 base = rx_desc_base(s) + sizeof(desc) * s->mac_reg[RDH];
> where rx_desc_base -- address RDBAH regs. It address no have RCU protect.
> ...
> 955 pci_dma_read(d, base, &desc, sizeof(desc));
> ...
> 957 desc.status |= (vlan_status | E1000_RXD_STAT_DD);
> ...
> 990 pci_dma_write(d, base, &desc, sizeof(desc));
> ->
> exec.c:
> 3111 static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
> 3112 MemTxAttrs attrs,
> 3113 const uint8_t *buf,
> 3114 hwaddr len, hwaddr addr1,
> 3115 hwaddr l, MemoryRegion *mr)
> 3116 {
> ...
> 3123 if (!memory_access_is_direct(mr, true)) {
> (false)
> 3131 } else {
> 3132 /* RAM case */
> 3133 ptr = qemu_ram_ptr_length(mr->ram_block, addr1, &l, false);
> 3134 memcpy(ptr, buf, l);
>
> where I be seeing weird behavior with KVM due to MMIO write coalescing
>
> 3135 invalidate_and_set_dirty(mr, addr1, l);
> 3136 }
> 3137
>
> Source code dpdk(e1000): (version dpdk-stable-17.11.9)
> drivers/net/e1000/em_rxtx.c:
>
> 699 uint16_t
> 700 eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
> 701 uint16_t nb_pkts)
> ...
> 718 rxq = rx_queue
> ...
> 722 rx_id = rxq->rx_tail;
> 723 rx_ring = rxq->rx_ring
> ...
> 734 rxdp = &rx_ring[rx_id];
> 735 status = rxdp->status;
> 736 if (! (status & E1000_RXD_STAT_DD))
> 737 break;
> ...
> 807 rxdp->buffer_addr = dma_addr;
> 808 rxdp->status = 0;
> where I be seeing weird behavior with KVM due to MMIO write
> coalescing
It could be a bug in QEMU's e1000 emulation - maybe it's not doing
things in the correct order and causes a race condition with the DPDK
polling driver - or it could be a bug in the DPDK e1000 driver regarding
the order in which the descriptor ring and RX Head/Tail MMIO registers
are updated.
Did you find the root cause?
> P.S.
> > Also, is DPDK accessing the e1000 device from more than 1 vCPU?
> All tests on single virtual CPU.
>
> I created github project for quick reproduction of this error:
> https://github.com/BASM/qemu_dpdk_e1000_test
>
> ---
> Best regards,
> Leonid Myravjev
>
> On Thu, 21 Nov 2019 at 17:05, Stefan Hajnoczi <address@hidden> wrote:
> >
> > On Wed, Nov 20, 2019 at 08:36:32PM +0300, ASM wrote:
> > > I trying solve the problem, with packets stopping (e1000,tap,kvm).
> > > My studies led to the following:
> > > 1. From flatview_write_continue() I see, what e1000 writes the number
> > > "7" to the STAT register.
> > > 2. The driver from target OS reads STAT register with number "7" and
> > > writes to the register the number "0".
> > > 3. From flatview_write_continue() (I make edits):
> > > memcpy(ptr, buf, l);
> > > new1=ptr[0xc];
> > > usleep(100);
> > > new2=ptr[0xc];
> > > invalidate_and_set_dirty(mr, addr1, l);
> > > new3=ptr[0xc];
> > > printf("Old: %i, new1, %i, new2: %i, new3: %i\n", old,new1,new2,new3);
> > >
> > > I see what memory in first printf is "7", but after usleep() is "0".
> > > Do I understand correctly that this should not be? Or RCU lock
> > > suggests the ability to the multiple writers?
> > >
> > > The problem is that qemu(e1000) writes the number 7, after which
> > > target(dpdk driver) reads 7, on the basis of this it writes the number
> > > 0, but as a result (extremely rarely), the value STATUS still remains
> > > 7. Therefore, packet processing is interrupted. This behavior is
> > > observed only on kvm (it is not observed on tcg).
> > >
> > > Please help with advice or ideas.
> >
> > Hi Leonid,
> > Could you be seeing weird behavior with KVM due to MMIO write
> > coalescing?
> >
> > static void e1000_mmio_setup(E1000State *d)
> > {
> > int i;
> > const uint32_t excluded_regs[] = {
> > E1000_MDIC, E1000_ICR, E1000_ICS, E1000_IMS,
> > E1000_IMC, E1000_TCTL, E1000_TDT, PNPMMIO_SIZE
> > };
> >
> > memory_region_init_io(&d->mmio, OBJECT(d), &e1000_mmio_ops, d,
> > "e1000-mmio", PNPMMIO_SIZE);
> > memory_region_add_coalescing(&d->mmio, 0, excluded_regs[0]);
> > for (i = 0; excluded_regs[i] != PNPMMIO_SIZE; i++)
> > memory_region_add_coalescing(&d->mmio, excluded_regs[i] + 4,
> > excluded_regs[i+1] -
> > excluded_regs[i] - 4);
> > memory_region_init_io(&d->io, OBJECT(d), &e1000_io_ops, d,
> > "e1000-io", IOPORT_SIZE);
> > }
> >
> > MMIO write coalescing means that QEMU doesn't see the register writes
> > immediately. Instead kvm.ko records them into a ring buffer and QEMU
> > processes the ring when the next ioctl(KVM_RUN) exit occurs.
> >
> > See Linux Documentation/virt/kvm/api.txt "4.116
> > KVM_(UN)REGISTER_COALESCED_MMIO" for more details.
> >
> > I don't really understand your printf debugging explanation. It would
> > help to see the DPDK code and the exact printf() output.
> >
> > Also, is DPDK accessing the e1000 device from more than 1 vCPU?
> >
> > Stefan
signature.asc
Description: PGP signature
- Re: PCI memory sync question (kvm,dpdk,e1000,packet stalled),
Stefan Hajnoczi <=