qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 49/51] io/channel-watch: Fix socket watch on Windows


From: Marc-André Lureau
Subject: Re: [PATCH 49/51] io/channel-watch: Fix socket watch on Windows
Date: Mon, 5 Sep 2022 10:04:27 +0400

Hi

On Sun, Sep 4, 2022 at 10:24 AM Bin Meng <bmeng.cn@gmail.com> wrote:
On Thu, Sep 1, 2022 at 8:58 PM Marc-André Lureau
<marcandre.lureau@gmail.com> wrote:
>
> Hi
>
> On Wed, Aug 24, 2022 at 2:49 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>>
>> From: Bin Meng <bin.meng@windriver.com>
>>
>> Random failure was observed when running qtests on Windows due to
>> "Broken pipe" detected by qmp_fd_receive(). What happened is that
>> the qtest executable sends testing data over a socket to the QEMU
>> under test but no response is received. The errno of the recv()
>> call from the qtest executable indicates ETIMEOUT, due to the qmp
>> chardev's tcp_chr_read() is never called to receive testing data
>> hence no response is sent to the other side.
>>
>> tcp_chr_read() is registered as the callback of the socket watch
>> GSource. The reason of the callback not being called by glib, is
>> that the source check fails to indicate the source is ready. There
>> are two socket watch sources created to monitor the same socket
>> event object from the char-socket backend in update_ioc_handlers().
>>
>> During the source check phase, qio_channel_socket_source_check()
>> calls WSAEnumNetworkEvents() to discovers occurrences of network
>> events for the indicated socket, clear internal network event records,
>> and reset the event object. Testing shows that if we don't reset the
>> event object by not passing the event handle to WSAEnumNetworkEvents()
>> the symptom goes away and qtest runs very stably.
>>
>> It looks we don't need to call WSAEnumNetworkEvents() at all, as we
>> don't parse the result of WSANETWORKEVENTS returned from this API.
>> We use select() to poll the socket status. Fix this instability by
>> dropping the WSAEnumNetworkEvents() call.
>>
>> Signed-off-by: Bin Meng <bin.meng@windriver.com>
>
>
> What clears the event then?
>

It seems we don't need to clear the event as everything still works as expected.

Well, it can "work" but are you sure it doesn't have a busy loop?

>>
>> ---
>> During the testing, I removed the following codes in update_ioc_handlers():
>>
>>     remove_hup_source(s);
>>     s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP);
>>     g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup,
>>                           chr, NULL);
>>     g_source_attach(s->hup_source, chr->gcontext);
>>
>> and such change also makes the symptom go away.
>>
>> And if I moved the above codes to the beginning, before the call to
>> io_add_watch_poll(), the symptom also goes away.
>>
>> It seems two sources watching on the same socket event object is
>> the key that leads to the instability. The order of adding a source
>> watch seems to also play a role but I can't explain why.
>> Hopefully a Windows and glib expert could explain this behavior.
>>
>
> Feel free to leave that comment in the commit message.

Sure, will add the above message into the commit in v2.

>
> This is strange, as both sources should have different events, clearing one shouldn't affect the other.

Both sources have the same event, as one QIO channel only has one
socket, and one socket can only be bound to one event.

 "one socket can only be bound to one event", is that a WSAEventSelect limitation?


>
> I guess it's WSAEnumNetworkEvents clearing of the internal network event records that is problematic.
>
> Can you check if you replace the call with ResetEvent() everything works?

No, ResetEvent() does not work, and is not recommended by MSDN [1]
too, which says:

It probably works to some extent (I see glib is using it https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/giowin32.c#L903), What you mean is that it doesn't solve the issue, I guess.

The proper way to reset the state of an event object used with the
WSAEventSelect function is to pass the handle of the event object to
the WSAEnumNetworkEvents function in the hEventObject parameter. This
will reset the event object and adjust the status of active FD events
on the socket in an atomic fashion.


This is not what you want though if you have multiple event objects for the same socket.
 
[1] https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaeventselect

>
>
>>
>>  io/channel-watch.c | 4 ----
>>  1 file changed, 4 deletions(-)
>>
>> diff --git a/io/channel-watch.c b/io/channel-watch.c
>> index 89f3c8a88a..e34d86e810 100644
>> --- a/io/channel-watch.c
>> +++ b/io/channel-watch.c
>> @@ -115,17 +115,13 @@ static gboolean
>>  qio_channel_socket_source_check(GSource *source)
>>  {
>>      static struct timeval tv0;
>> -
>>      QIOChannelSocketSource *ssource = (QIOChannelSocketSource *)source;
>> -    WSANETWORKEVENTS ev;
>>      fd_set rfds, wfds, xfds;
>>
>>      if (!ssource->condition) {
>>          return 0;
>>      }
>>
>> -    WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev);
>> -
>>      FD_ZERO(&rfds);
>>      FD_ZERO(&wfds);
>>      FD_ZERO(&xfds);
>
>
> Unrelated, after this chunk, there is
>         FD_SET((SOCKET)ssource->socket, &rfds);
>
> it seems we can drop the cast. Feel free to send another patch.
>

Yeah, this cast is unnecessary. Will spin a new patch in v2. Thanks!

Regards,
Bin


--
Marc-André Lureau

reply via email to

[Prev in Thread] Current Thread [Next in Thread]