On Thu, Jan 24, 2019 at 10:11:55AM +0000, Daniel P. Berrangé wrote:
On Wed, Jan 23, 2019 at 07:53:46PM +0000, Dr. David Alan Gilbert wrote:
Do you mean the:
/* send a qmp command to guarantee that 'connected' is setting to true. */
qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
why was that ever sufficient to know the socket was ready?
This doesn't make any sense to me.
There's the netdev socket, which has been passed in as a pre-opened socket
FD, so that's guaranteed connected.
There's the chardev server socket, to which we've just done a unix_connect()
call to establish a connection. If unix_connect() has succeeded, then at least
the socket is connected & ready for I/O from the test's side. This is a
reliable stream socket, so even if the test sends data on the socket right away
and QEMU isn't ready, it won't be lost. It'll be buffered and received by QEMU
as soon as QEMU starts to monitor for incoming data on the socket.
So I don't get what trying to wait for a "connected" state actually achieves.
It feels like a mistaken attempt to paper over some other unknown flaw that
just worked by some lucky side-effect.
Immediately after writing that, I see what's happened.
The filter_redirector_receive_iov() method is triggered when QEMU reads
from the -netdev socket (which we passed in as an FD and immediately
write to).
This method will discard all data, however, if the chr_out -chardev is
not in a connected state. So we do indeed have a race condition in this
test suite.
In fact I'd say this filter-mirror object is racy by design even when
run in normal usage, if your chardev is a server mode with "nowait" set,
or is a client mode with "reconnect" set. It will simply discard data.
We can fix the test suite by using FD passing for the -chardev
too, so we're guaranteed to be connected immediately. It might be
possible to remove "nowait" flag, but I'm not sure if that will cause
problems with the qtest handshake as it might block QEMU at startup
preventing qtest handshake from being performed.
If we care about the race in real QEMU execution, then we must either
document that "nowait" or "reconnect" should never be used with
filter-mirror, or perhaps can make use of "qemu_chr_wait_connected"
to synchronize startup fo the filter-mirror object with the chardev
initialization. That could fix the test suite too