
Re: [PATCH 49/51] io/channel-watch: Fix socket watch on Windows


From: Clément Chigot
Subject: Re: [PATCH 49/51] io/channel-watch: Fix socket watch on Windows
Date: Tue, 6 Sep 2022 09:41:41 +0200

Hi Bin,

> On Mon, Sep 5, 2022 at 4:10 PM Clément Chigot <chigot@adacore.com> wrote:
> >
> > Hi all,
> >
> > I did reach the same issue while trying to connect a gdb to qemu on
> > Windows hosts. Some inputs sent by gdb aren't correctly pulled;
> > they are retrieved only once g_poll times out.
> >
> > As you explained earlier, the issue, AFAICT, is that WSAEnumNetworkEvents
> > will reset the internal events before select can detect them.
>
> No, I don't think it's true that WSAEnumNetworkEvents and select cannot
> be used together.
>
> MSDN says:
> https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select
>
> "The select function has no effect on the persistence of socket events
> registered with WSAAsyncSelect or WSAEventSelect."
>
> That sounds to me like the current usage in the QEMU code does not have a
> problem.

Yes, but WSAEnumNetworkEvents has effects. Thus, it might reset the
events before select detects them, no?
Moreover, I'm not sure what they mean by the "persistence" of the events
registered with WSAEventSelect. Does it mean that select doesn't change
the set of events to be notified, as configured by WSAEventSelect, or
that select doesn't reset the events once they have been retrieved, as
you imply?
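To make the question concrete, my reading of the current check phase in
io/channel-watch.c is roughly the following (paraphrased from memory, not a
verbatim copy, so names and details may be slightly off):

    WSANETWORKEVENTS ev;
    fd_set rfds;
    static struct timeval tv0;           /* zero timeout: just poll */

    /* copies and clears the internal network event record, and resets
     * the event object the GSource is polled on */
    WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev);

    FD_ZERO(&rfds);
    if (ssource->condition & G_IO_IN) {
        FD_SET((SOCKET)ssource->socket, &rfds);
    }
    /* will this select still report data that arrived just before the
     * WSAEnumNetworkEvents call above, now that the event object has
     * been reset ? */
    select(0, &rfds, NULL, NULL, &tv0);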
Moreover, the current code creates the event object with
CreateEvent(NULL, FALSE, FALSE, NULL). The first FALSE means we ask for
an auto-reset event. The MSDN doc says:

"Boolean that specifies whether a manual-reset or auto-reset event
object is created.
If TRUE, then you must use the ResetEvent function to manually reset
the state to
nonsignaled. If FALSE, the system automatically resets the state to nonsignaled
after a single waiting thread has been released."

However, I'm not sure I understand correctly what "after a single
waiting thread has been released" means. Does the reset occur after
recv(), after select, or after the WSAEnumNetworkEvents call? AFAIR,
I've tried switching to a manual-reset event with the current qemu, but
it doesn't change anything.
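For reference, the manual-reset experiment was only about flipping the
bManualReset argument, along these lines (just a sketch of what I tried; the
exact placement of the ResetEvent call may not be the right one):

    /* current code: auto-reset event object */
    ioc->event = CreateEvent(NULL, FALSE, FALSE, NULL);

    /* what I tried: manual-reset event object, reset explicitly from
     * the check phase */
    ioc->event = CreateEvent(NULL, TRUE, FALSE, NULL);
    ...
    ResetEvent(ioc->event);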

> > Sadly, I didn't find any way to adjust the current code using select to
> > avoid that. As select and any cleaner (either WSAEnumNetworkEvents or
> > WSAResetEvent) cannot be called atomically, it seems that events
> > can always be received between the reset and the select. At least, all
> > my attempts failed.
>
> According to MSDN:
> https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaeventselect
>
> "Having successfully recorded the occurrence of the network event (by
> setting the corresponding bit in the internal network event record)
> and signaled the associated event object, no further actions are taken
> for that network event until the application makes the function call
> that implicitly reenables the setting of that network event and
> signaling of the associated event object."
>
> So after an event has been signaled, no further signaling happens until
> the re-enabling routine is called, for example recv() for the FD_READ
> event.

In my understanding, the WSAEnumNetworkEvents documentation states the
opposite:
https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaenumnetworkevents

"The Windows Sockets provider guarantees that the operations of copying
the network event record, clearing it and resetting any associated event
object are atomic, such that the next occurrence of a nominated network
event will cause the event object to become set."

> > The only solution I've found is to move completely to the Windows API
> > and stop calling select. This, however, needs some tricks. In particular, we
> > need to "remove" the support of GSources having only G_IO_HUP.
> > This is already kind of done as we currently don't detect them in
> > qio_channel_socket_source_check anyway.
> > This is mandatory because of the two GSources created on the same socket.
> > IIRC, the first one (I'll call it the master GSource) is created during
> > the initialization of the channel-socket by update_ioc_handlers and will
> > have only G_IO_HUP to catch.
> > The second one is created during the "prepare" phase of the first one,
> > in io_watch_poll_prepare. This one will have all the events we want
> > to pull (G_IO_IN here).
> > As they refer to the same socket, the Windows API cannot know
> > on which GSource we want to catch which events. The solution is then
>
> I think Windows knows which events to catch, as WSAEventSelect() in
> channel-watch.c::qio_channel_create_socket_watch() tells Windows which
> events it should monitor.
>
> Both the "master" and "child" GSources are watching on the same socket
> with G_IO_IN condition (see qio_channel_create_socket_watch), and GLib
> is smart enough to merge these two resources into one in the event
> handle array which is passed to WaitForMultipleObjectsEx() in
> g_poll().
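Right, if I read qio_channel_create_socket_watch correctly, both GSources do
end up registering the same event handle with GLib, roughly like this (sketch
from memory, not a verbatim copy of io/channel-watch.c):

    /* both the "master" and the "child" source register the very same
     * WSAEventSelect'ed event object, so g_poll sees it twice */
    ssource->fd.fd = (gintptr)ioc->event;
    ssource->fd.events = G_IO_IN;
    g_source_add_poll(source, &ssource->fd);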

I forgot to mention it, but the current code only fails with newer glib
versions. I wasn't able to test all of them, but it works with 2.54 and
starts failing with versions after 2.60.
QEMU master doesn't support 2.54 anymore, so I tested that with our
internal qemu-6, which runs with 2.54. When glib is upgraded, it starts
showing the same issue.
Sadly, I wasn't able to test with glib 2.56/2.58, nor to find anything
relevant in the glib code that would explain our issue. (But I wasn't
aware of the two GSources at that time, so maybe it's worth
investigating again.)

> I checked your patch; what you did seems to be something one would
> naturally write, but what is currently in the QEMU sources seems to be
> written intentionally.
>
> +Paolo Bonzini, you are the one who implemented the socket watch on
> Windows. Could you please help analyze this issue?
>
> > to avoid calling WSAEnumNetworkEvents for the master GSource, which only has
> > G_IO_HUP (or for any GSource having only that).
> > As I said above, the current code doesn't do anything with it anyway.
> > So, IMO, it's safe to do so.
> >
> > I'm sending you my patch attached. I was planning to send it in the coming
> > weeks anyway; I was just waiting to be sure everything looks fine on our
> > CI. Feel free to test and modify it if needed.
>
> I tested your patch. Unfortunately there is still one test case
> (migration-test.exe) that throws the "Broken pipe" message.

I must say I haven't fully tested it against the QEMU testsuite yet.
Maybe some refinements are still needed. The "Broken pipe" might be
linked to the missing G_IO_HUP support.
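To give an idea of what I mean by the missing G_IO_HUP support: the idea is
to bail out early in the check phase for a source that only watches
G_IO_HUP, something like this (a sketch of the idea, not the exact hunk
from my patch):

    /* skip the "master" GSource that only watches G_IO_HUP, so its
     * WSAEnumNetworkEvents call doesn't consume events meant for the
     * child source.  The downside is that FD_CLOSE / G_IO_HUP is never
     * reported through it, which could explain the "Broken pipe". */
    if (!(ssource->condition & ~G_IO_HUP)) {
        return 0;
    }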

> Can you test my patch instead to see if your gdb issue can be fixed?

Yeah sure. I'll try to do it this afternoon.


