
Re: [Qemu-devel] [PATCH 6/6] RFH: We lost "connect" events


From: Daniel P . Berrangé
Subject: Re: [Qemu-devel] [PATCH 6/6] RFH: We lost "connect" events
Date: Mon, 19 Aug 2019 11:40:01 +0100
User-agent: Mutt/1.12.0 (2019-05-25)

On Mon, Aug 19, 2019 at 12:33:45PM +0200, Juan Quintela wrote:
> Daniel P. Berrangé <address@hidden> wrote:
> > On Wed, Aug 14, 2019 at 04:02:18AM +0200, Juan Quintela wrote:
> >> When we have lots of channels, sometimes multifd migration fails
> >> with the following error:
> >>   after some time, sending side decides to send another packet through
> >>   that channel, and it is now when we get the above error.
> >> 
> >> Any good ideas?
> >
> > In inet_listen_saddr() we call
> >
> >     if (!listen(slisten, 1)) {
> >
> > note the second parameter sets the socket backlog, which is the max
> > number of pending socket connections we allow. My guess is that the
> > target QEMU is not accepting incoming connections quickly enough and
> > thus you hit the limit & the kernel starts dropping the incoming
> > connections.
> >
> > As a quick test, just hack this code to pass a value of 100 and see
> > if it makes your test reliable. If it does, then we'll need to figure
> > out a nice way to handle backlog instead of hardcoding it at 1.
> 
> I will test.
> 
> But notice that the qemu_connect() on source side says that things went
> right.  It is the destination what is *not* calling the callback.  Or
> at least that is what I think it is happening.

IIRC, connect() can succeed on the source host even if the target host
has not called accept(), because the kernel completes the connection at
the protocol level regardless. IOW, don't assume that the destination QEMU
has seen the connection. If you turn on tracing, there is a trace point
"qio_channel_socket_accept_complete" that is emitted after we have done
an accept(). So if you make 100 connects on the source, you can check
whether 100 of those trace points are emitted on the target.
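
To illustrate the connect() point outside of QEMU, here is a minimal
standalone sketch. It is not QEMU code: connect_without_accept() is a
made-up helper name, and the loopback address and ephemeral port are
assumptions chosen only to keep it self-contained. It listens with the
given backlog, never calls accept(), and reports whether a single
connect() still succeeded, which is what I would expect on Linux.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Listen on an ephemeral loopback port with the given backlog, connect
 * to it once, and never call accept() on the listening socket.
 * Returns 0 if connect() succeeded anyway, -1 on any failure.
 */
static int connect_without_accept(int backlog)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd, cfd, ret = -1;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) {
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  /* let the kernel pick a free port */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, backlog) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        close(lfd);
        return -1;
    }

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd >= 0) {
        /* Nobody has called accept() on lfd at this point. */
        if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
            ret = 0;
        }
        close(cfd);
    }

    close(lfd);
    return ret;
}
```

Calling connect_without_accept(1) mirrors the listen(slisten, 1) case
above for a single connection. It only shows that a successful connect()
says nothing about accept() on the other side; it does not try to
reproduce what happens once many connections pile up against the
backlog.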

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


