Re: PCI memory sync question (kvm,dpdk,e1000,packet stalled)

qemu-devel

From:	Stefan Hajnoczi
Subject:	Re: PCI memory sync question (kvm,dpdk,e1000,packet stalled)
Date:	Thu, 19 Dec 2019 14:58:17 +0000
User-agent:	Mutt/1.12.1 (2019-06-15)

On Wed, Nov 27, 2019 at 03:39:03PM +0300, ASM wrote:
> When a packet is received, e1000 writes it to memory directly,
> without any RCU.
> The memory address for the write is set by the DPDK driver:
> the driver writes the ring base address to RDBA (RDBAH, RDBAL).
> 
> It turns out that the MMIO RCU (mentioned in e1000_mmio_setup) does not,
> and cannot, protect the ring descriptors.
> The area to protect may be anywhere in main memory, and it only
> becomes known when the driver writes the RDBA registers
> (see the 82574 GbE Controller datasheet, "7.1.8 Receive Descriptor Queue Structure").
> 
> How can this memory be protected? As I understand it, e1000 should
> track writes to RDBA and enable memory protection for that region.
> But how do I do that correctly?

I misunderstood the issue and you can probably ignore my comments about
coalesced MMIO.  You quoted descriptor DMA code below so coalesced MMIO
shouldn't be relevant since desc->status isn't an MMIO register.
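
A side note on where that status byte lives: it is a field of an ordinary
receive descriptor in guest RAM at RDBA + sizeof(desc) * RDH. Below is a
minimal sketch of that layout, assuming the legacy (non-extended) 16-byte
descriptor format; the struct, macros and helpers are hypothetical
illustrations, not QEMU's or DPDK's own definitions, and the status bit
values are assumed. Under this layout the status byte sits at offset 0xc,
which would match the ptr[0xc] probe in the debugging code quoted further
down, and a status of 7 would decode as DD | EOP | IXSM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the legacy e1000 receive descriptor (16 bytes).
 * Field names follow common usage; check them against the datasheet. */
struct rx_desc {
    uint64_t buffer_addr; /* guest-physical address of the packet buffer */
    uint16_t length;      /* bytes the device wrote into the buffer */
    uint16_t csum;        /* packet checksum */
    uint8_t  status;      /* DD, EOP, ... flags; byte offset 0xc */
    uint8_t  errors;      /* receive error flags */
    uint16_t special;     /* VLAN tag */
};

static_assert(sizeof(struct rx_desc) == 16, "descriptor is 16 bytes");
static_assert(offsetof(struct rx_desc, status) == 0xc, "status at 0xc");

/* Status bits (assumed values); 0x07 decodes as DD | EOP | IXSM. */
#define RXD_STAT_DD   0x01 /* descriptor done: device has filled it */
#define RXD_STAT_EOP  0x02 /* end of packet */
#define RXD_STAT_IXSM 0x04 /* ignore checksum indication */

/* Ring base from RDBAH:RDBAL, roughly what rx_desc_base() computes. */
static inline uint64_t ring_base(uint32_t rdbah, uint32_t rdbal)
{
    return ((uint64_t)rdbah << 32) | rdbal;
}

/* Address of descriptor number rdh: base + sizeof(desc) * RDH. */
static inline uint64_t desc_addr(uint32_t rdbah, uint32_t rdbal, uint32_t rdh)
{
    return ring_base(rdbah, rdbal) + sizeof(struct rx_desc) * (uint64_t)rdh;
}
```

For example, with RDBAH = 0x1, RDBAL = 0x2000 and RDH = 3, the device
would touch the descriptor at 0x100002030.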

> 
> Source code qemu:
> hw/net/e1000.c:954 (version master)
> 
>  954         base = rx_desc_base(s) + sizeof(desc) * s->mac_reg[RDH];
> where rx_desc_base() returns the ring base address held in the RDBAH/RDBAL registers. That address has no RCU protection.
> ...
> 955         pci_dma_read(d, base, &desc, sizeof(desc));
> ...
> 957         desc.status |= (vlan_status | E1000_RXD_STAT_DD);
> ...
> 990         pci_dma_write(d, base, &desc, sizeof(desc));
> ->
> exec.c:
> 3111 static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
> 3112                                            MemTxAttrs attrs,
> 3113                                            const uint8_t *buf,
> 3114                                            hwaddr len, hwaddr addr1,
> 3115                                            hwaddr l, MemoryRegion *mr)
> 3116 {
> ...
> 3123         if (!memory_access_is_direct(mr, true)) {
> (false here, so the RAM branch below is taken)
> 3131         } else {
> 3132             /* RAM case */
> 3133             ptr = qemu_ram_ptr_length(mr->ram_block, addr1, &l, false);
> 3134             memcpy(ptr, buf, l);
> 
> (this is where you suggested I could be seeing weird behavior with KVM due to MMIO write coalescing)
> 
> 3135             invalidate_and_set_dirty(mr, addr1, l);
> 3136         }
> 3137
> 
> Source code dpdk(e1000): (version dpdk-stable-17.11.9)
> drivers/net/e1000/em_rxtx.c:
> 
> 699 uint16_t
> 700 eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
> 701                 uint16_t nb_pkts)
> ...
> 718         rxq = rx_queue;
> ...
> 722         rx_id = rxq->rx_tail;
> 723         rx_ring = rxq->rx_ring;
> ...
> 734                 rxdp = &rx_ring[rx_id];
> 735                 status = rxdp->status;
> 736                 if (! (status & E1000_RXD_STAT_DD))
> 737                         break;
> ...
> 807                 rxdp->buffer_addr = dma_addr;
> 808                 rxdp->status = 0;
> (this is where you suggested I could be seeing weird behavior with KVM due to MMIO write
> coalescing)
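
The driver side of the handshake in the quoted DPDK loop boils down to:
check DD, consume the packet, re-arm the descriptor with a buffer, clear
status, advance the tail. A minimal single-threaded model of just that,
assuming a reduced two-field descriptor and a fixed ring of 8 entries; the
names are hypothetical and this is not the actual eth_em_recv_pkts().

```c
#include <assert.h>
#include <stdint.h>

#define RX_RING_SIZE 8    /* hypothetical ring size */
#define RXD_STAT_DD  0x01 /* descriptor done */

/* Reduced descriptor model: only the fields the loop touches. */
struct rx_desc_model {
    uint64_t buffer_addr;
    uint8_t  status;
};

struct rx_queue_model {
    /* Shared with the device, so every access goes through volatile. */
    volatile struct rx_desc_model ring[RX_RING_SIZE];
    uint16_t rx_tail; /* next descriptor the driver will look at */
};

/* Consume up to max completed descriptors, re-arming each with a new
 * buffer address. Returns the number consumed. A real driver would then
 * issue a write barrier and update RDT so the device may reuse them. */
static inline unsigned poll_rx(struct rx_queue_model *q, unsigned max,
                               uint64_t next_buf)
{
    unsigned n = 0;

    while (n < max) {
        volatile struct rx_desc_model *rxdp = &q->ring[q->rx_tail];

        if (!(rxdp->status & RXD_STAT_DD))
            break;                        /* device has not filled it yet */

        /* ... pass the received packet up the stack here ... */
        rxdp->buffer_addr = next_buf + n; /* hypothetical fresh buffer */
        rxdp->status = 0;                 /* hand the descriptor back */
        q->rx_tail = (uint16_t)((q->rx_tail + 1) % RX_RING_SIZE);
        n++;
    }
    return n;
}
```

The loop stops at the first descriptor without DD, so a status byte that
the driver believes it cleared but that reads back as 7 later would be
misinterpreted as a completed packet once the ring wraps around.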

It could be a bug in QEMU's e1000 emulation - maybe it's not doing
things in the correct order and causes a race condition with the DPDK
polling driver - or it could be a bug in the DPDK e1000 driver regarding
the order in which the descriptor ring and RX Head/Tail MMIO registers
are updated.
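
To make the ordering hypothesis concrete: the quoted e1000 code does a
read-modify-write of the whole descriptor (pci_dma_read(), then
desc.status |= ..., then pci_dma_write() of sizeof(desc) bytes). Here is a
minimal single-threaded sketch, with hypothetical names and a reduced
descriptor, of one interleaving that would produce the reported symptom
(status still 7 after the guest wrote 0). It is an illustration of the
hazard, not a confirmed root cause.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reduced descriptor model (hypothetical). */
struct desc_model {
    uint64_t buffer_addr;
    uint8_t  status;
};

/* Device-side copy of a descriptor, held between read and write-back. */
struct dev_txn {
    struct desc_model copy;
};

/* Step 1, like pci_dma_read(): snapshot the whole descriptor. */
static inline void dev_read_desc(struct dev_txn *t, const struct desc_model *ram)
{
    memcpy(&t->copy, ram, sizeof t->copy);
}

/* Step 2, like "desc.status |= ..." followed by pci_dma_write():
 * update the snapshot and copy all of it back over guest RAM. */
static inline void dev_write_back(struct dev_txn *t, struct desc_model *ram,
                                  uint8_t set_bits)
{
    t->copy.status |= set_bits;
    memcpy(ram, &t->copy, sizeof t->copy);
}
```

If the guest stores 0 to status between the two steps, the write-back
restores the stale 7. Whether this can happen in QEMU's e1000 depends on
whether the device ever processes a descriptor the driver still owns,
which is exactly the ordering question; the sketch does not show that it
does.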

Did you find the root cause?

> P.S.
> > Also, is DPDK accessing the e1000 device from more than 1 vCPU?
>  All tests were run on a single virtual CPU.
> 
> I created a GitHub project to quickly reproduce this error:
> https://github.com/BASM/qemu_dpdk_e1000_test
> 
> ---
> Best regards,
> Leonid Myravjev
> 
> On Thu, 21 Nov 2019 at 17:05, Stefan Hajnoczi <address@hidden> wrote:
> >
> > On Wed, Nov 20, 2019 at 08:36:32PM +0300, ASM wrote:
> > > I am trying to solve a problem where packets stop flowing (e1000, tap, kvm).
> > > My investigation led to the following:
> > > 1. In flatview_write_continue() I see that e1000 writes the value
> > > "7" to the STAT register.
> > > 2. The driver in the guest OS reads "7" from the STAT register and
> > > writes "0" to it.
> > > 3. In flatview_write_continue() (with my debugging edits):
> > >             memcpy(ptr, buf, l);
> > >             new1=ptr[0xc];
> > >             usleep(100);
> > >             new2=ptr[0xc];
> > >             invalidate_and_set_dirty(mr, addr1, l);
> > >             new3=ptr[0xc];
> > > printf("Old: %i, new1, %i, new2: %i, new3: %i\n", old,new1,new2,new3);
> > >
> > > In the printf output the memory is "7" right after memcpy(), but
> > > "0" after usleep().
> > > Am I right that this should not happen? Or does the RCU lock
> > > allow multiple writers?
> > >
> > > The problem is that QEMU (e1000) writes the value 7, after which the
> > > guest (DPDK driver) reads 7 and, based on that, writes 0, but (very
> > > rarely) the STATUS value still ends up as 7. As a result, packet
> > > processing stalls. This behavior is observed only with KVM (not with TCG).
> > >
> > > Please help with advice or ideas.
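
On the RCU question: as I understand QEMU's design, RCU there protects the
memory map (the FlatView and MemoryRegion lookups that
flatview_write_continue() relies on), not the bytes of guest RAM, and under
KVM the guest's vCPU threads keep running while a QEMU thread performs the
DMA. So a second writer is always possible, and a 7 turning into 0 across
usleep() is what one would expect if the guest cleared it in the meantime.
A minimal sketch with two threads, assuming one thread models the device
and one models the vCPU; all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the status byte in guest RAM. */
static _Atomic uint8_t guest_status;

/* Plays the guest vCPU: poll for DD, then hand the descriptor back. */
static void *vcpu_thread(void *arg)
{
    (void)arg;
    while (atomic_load(&guest_status) != 7)
        ;                               /* busy-poll, like a PMD */
    atomic_store(&guest_status, 0);
    return NULL;
}

/* Plays the device: write 7, then look again after the guest has run.
 * Returns the value observed at the end (0xff if the thread failed). */
static inline uint8_t device_write_then_reread(void)
{
    pthread_t vcpu;

    atomic_store(&guest_status, 0);
    if (pthread_create(&vcpu, NULL, vcpu_thread, NULL) != 0)
        return 0xff;
    atomic_store(&guest_status, 7);     /* like the memcpy() into guest RAM */
    pthread_join(vcpu, NULL);           /* deterministic stand-in for usleep(100) */
    return atomic_load(&guest_status);
}
```

Build with -pthread. Using a sleep instead of pthread_join(), as in the
debugging patch, makes the observed 0 timing-dependent rather than
guaranteed.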
> >
> > Hi Leonid,
> > Could you be seeing weird behavior with KVM due to MMIO write
> > coalescing?
> >
> >   static void e1000_mmio_setup(E1000State *d)
> >   {
> >       int i;
> >       const uint32_t excluded_regs[] = {
> >           E1000_MDIC, E1000_ICR, E1000_ICS, E1000_IMS,
> >           E1000_IMC, E1000_TCTL, E1000_TDT, PNPMMIO_SIZE
> >       };
> >
> >       memory_region_init_io(&d->mmio, OBJECT(d), &e1000_mmio_ops, d,
> >                             "e1000-mmio", PNPMMIO_SIZE);
> >       memory_region_add_coalescing(&d->mmio, 0, excluded_regs[0]);
> >       for (i = 0; excluded_regs[i] != PNPMMIO_SIZE; i++)
> >           memory_region_add_coalescing(&d->mmio, excluded_regs[i] + 4,
> >                                        excluded_regs[i+1] - excluded_regs[i] - 4);
> >       memory_region_init_io(&d->io, OBJECT(d), &e1000_io_ops, d,
> >                             "e1000-io", IOPORT_SIZE);
> >   }
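
To see which registers end up coalesced, here is a minimal sketch that
mirrors the loop above and tests whether a register offset falls inside a
coalesced range. The register offsets are assumptions written from memory
of the e1000 headers (check them against QEMU's e1000 register header),
and is_coalesced() is a hypothetical helper, not a QEMU function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed register offsets; verify against the real headers. */
#define E1000_MDIC   0x00020
#define E1000_ICR    0x000C0
#define E1000_ICS    0x000C8
#define E1000_IMS    0x000D0
#define E1000_IMC    0x000D8
#define E1000_TCTL   0x00400
#define E1000_RDH    0x02810
#define E1000_RDT    0x02818
#define E1000_TDT    0x03818
#define PNPMMIO_SIZE 0x20000

static const uint32_t excluded_regs[] = {
    E1000_MDIC, E1000_ICR, E1000_ICS, E1000_IMS,
    E1000_IMC, E1000_TCTL, E1000_TDT, PNPMMIO_SIZE
};

/* Same ranges as e1000_mmio_setup(): [0, MDIC), then the gap after each
 * excluded register up to the next one. True if off is coalesced. */
static inline bool is_coalesced(uint32_t off)
{
    if (off < excluded_regs[0])
        return true;
    for (int i = 0; excluded_regs[i] != PNPMMIO_SIZE; i++) {
        uint32_t start = excluded_regs[i] + 4;
        uint32_t len = excluded_regs[i + 1] - excluded_regs[i] - 4;
        if (off >= start && off < start + len)
            return true;
    }
    return false; /* one of the excluded registers: traps immediately */
}
```

With these assumed offsets, TDT is excluded and traps immediately, while
RDH and RDT fall into the coalesced range between TCTL and TDT.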
> >
> > MMIO write coalescing means that QEMU doesn't see the register writes
> > immediately.  Instead kvm.ko records them into a ring buffer and QEMU
> > processes the ring when the next ioctl(KVM_RUN) exit occurs.
> >
> > See Linux Documentation/virt/kvm/api.txt "4.116
> > KVM_(UN)REGISTER_COALESCED_MMIO" for more details.
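
The deferred visibility described above can be sketched with a toy model:
writes to coalesced registers are appended to a ring and only reach the
device model when the VMM drains the ring on its next exit. This is a
simplified illustration with hypothetical names and sizes, not KVM's
actual ring ABI.

```c
#include <assert.h>
#include <stdint.h>

#define RING_ENTRIES 16 /* hypothetical ring size */
#define NREGS        8  /* hypothetical number of device registers */

/* One buffered MMIO write. */
struct mmio_entry {
    uint32_t reg;
    uint32_t val;
};

/* Producer ("kernel") appends at last, consumer ("VMM") drains from first. */
struct coalesced_ring {
    struct mmio_entry e[RING_ENTRIES];
    unsigned first, last;
};

struct device {
    uint32_t regs[NREGS];
};

/* Guest write to a coalesced register: recorded, not yet seen by the
 * device. Returns -1 when the ring is full; in this model the caller
 * would then fall back to a normal, synchronous exit. */
static inline int guest_mmio_write(struct coalesced_ring *r, uint32_t reg,
                                   uint32_t val)
{
    unsigned next = (r->last + 1) % RING_ENTRIES;

    if (next == r->first)
        return -1;
    r->e[r->last] = (struct mmio_entry){ reg, val };
    r->last = next;
    return 0;
}

/* Next exit to userspace: replay the buffered writes in order. */
static inline void vmm_flush(struct coalesced_ring *r, struct device *d)
{
    while (r->first != r->last) {
        const struct mmio_entry *m = &r->e[r->first];

        assert(m->reg < NREGS);
        d->regs[m->reg] = m->val;
        r->first = (r->first + 1) % RING_ENTRIES;
    }
}
```

Between guest_mmio_write() and vmm_flush() the device model still sees the
old register value, which is the window in which the device could act on
stale state.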
> >
> > I don't really understand your printf debugging explanation.  It would
> > help to see the DPDK code and the exact printf() output.
> >
> > Also, is DPDK accessing the e1000 device from more than 1 vCPU?
> >
> > Stefan
