qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 49/51] io/channel-watch: Fix socket watch on Windows


From: Bin Meng
Subject: Re: [PATCH 49/51] io/channel-watch: Fix socket watch on Windows
Date: Tue, 6 Sep 2022 16:14:50 +0800

Hi Clément,

On Tue, Sep 6, 2022 at 3:41 PM Clément Chigot <chigot@adacore.com> wrote:
>
> Hi Bin,
>
> > On Mon, Sep 5, 2022 at 4:10 PM Clément Chigot <chigot@adacore.com> wrote:
> > >
> > > Hi all,
> > >
> > > I did reach the same issue while trying to connect a gdb to qemu on
> > > Windows hosts. Some inputs send by gdb aren't getting correctly pulled,
> > > they will be retrieved only once g_poll times out.
> > >
> > > As you explained earlier, the issue, AFAICT, is that WSAEnumNetworkEvents
> > > will reset the internal events before select can detect them.
> >
> > No, I don't think WSAEnumNetworkEvents and select cannot be used together.
> >
> > MSDN says:
> > https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select
> >
> > "The select function has no effect on the persistence of socket events
> > registered with WSAAsyncSelect or WSAEventSelect."
> >
> > That sounds to me like current usage in QEMU codes does not have a problem.
>
> Yes, but WSAEnumNetworkEvents has effects. Thus, it might reset the
> events before
> select detects them no ?

Even if the event *handle* is reset, select can still detect the
network event has happened. I think the above MSDN statement is trying
to explain this.

> Moreover, I'm not sure what they mean by "persistence of WSAEventSelect".
> Does that mean that select doesn't change the events wanted to be
> notified as set
> by WSAEventSelect or that select isn't resetting the events once
> retrieved, as you
> imply ?

My understanding is that select just determines the socket status
neither from the event handle associated with the socket, nor does it
change anything on the event handle.

> Moreover, the current code is creating the event object with
> CreateEvent(NULL, FALSE, FALSE, NULL). The first FALSE implies that we want
> an auto-reset. The MSDN doc says:
>
> "Boolean that specifies whether a manual-reset or auto-reset event
> object is created.
> If TRUE, then you must use the ResetEvent function to manually reset
> the state to
> nonsignaled. If FALSE, the system automatically resets the state to 
> nonsignaled
> after a single waiting thread has been released."

Yes, I think this is a bug however when I changed to CreateEvent(NULL,
TRUE, FALSE, NULL) or WSACreateEvent() there are still "Broken pipe"
errors ...

> However, I'm not sure if I understand correctly what "after a single
> waiting thread
> has been released." signified. Is the reset occuring after recv() or
> after select or
> WSAEnumNetworkEvents calls ? AFAIR, I've tried to move to manual-reset with
> the current qemu but it doesn't change anything.

The same here :(

>
> > > Sadly, I didn't find any way to adjust the current code using select to
> > > avoid that. As select and any cleaner (ever WSAEnumNetworkEvents or
> > > WSEResetEvent) cannot be called in an atomic way, it seems that events
> > > can always be received between the reset and the select. At least, all
> > > my attempts failed.
> >
> > According to MSDN:
> > https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaeventselect
> >
> > "Having successfully recorded the occurrence of the network event (by
> > setting the corresponding bit in the internal network event record)
> > and signaled the associated event object, no further actions are taken
> > for that network event until the application makes the function call
> > that implicitly reenables the setting of that network event and
> > signaling of the associated event object."
> >
> > So events will be kept unsignaled after they are signaled, until the
> > reenable routine is called. For example, recv() for the FD_READ event.
>
> In my understanding, WSAEnumNetworkEvents states the opposite:
> https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaenumnetworkevents
>
> "The Windows Sockets provider guarantees that the operations of copying
> the network event record, clearing it and resetting any associated event
> object are atomic, such that the next occurrence of a nominated network
> event will cause the event object to become set."

My interpretation of this does not conflict with the WSAEventSelect().
I think they are just two things.

>
> > > The only solution I've found is to move completely to the Windows API
> > > and stop calling select. This, however, needs some tricks. In Particular, 
> > > we
> > > need to "remove" the support of GSources having only G_IO_HUP.
> > > This is already kind of done as we currently don't detect them in
> > > qio_channel_socket_source_check anyway.
> > > This is mandatory because of the two GSources created on the same socket.
> > > IIRC, the first one (I'll call it the master GSource) is created during
> > > the initialization of the channel-socket by update_ioc_handlers and will
> > > have only G_IO_HUP to catch up.
> > > The second one is created during the "prepare" phase of the first one,
> > > in io_watch_poll_prepare. This one will have all the events we want
> > > to pull (G_IO_IN here).
> > > As they are referring to the same socket, the Windows API cannot know
> > > on which GSources we want to catch which events. The solution is then
> >
> > I think Windows knows which events to catch. As WSAEventSelect() in
> > channel-watch.c::qio_channel_create_socket_watch() tells Windows which
> > event it should monitor.
> >
> > Both the "master" and "child" GSources are watching on the same socket
> > with G_IO_IN condition (see qio_channel_create_socket_watch), and GLib
> > is smart enough to merge these two resources into one in the event
> > handle array which is passed to WaitForMultipleObjectsEx() in
> > g_poll().
>
> I've forgotten to mention it. But the current code only fails with newer glib
> versions. I wasn't able to test all of them but it's working with 2.54 and
> starts failing with versions after 2.60.
> The qemu master isn't supporting 2.54 anymore. Thus I've tested that with
> our internal qemu-6 which runs with 2.54. When upgrading glib, it starts
> behaving like our issue.
> Sadly, I wasn't able to test with glib-2.56/2.58 nor to find anything relevant
> in glib code which would explain our issue. (But I wasn't aware of the
> two GSources at that time so maybe it's worth investigating again).
>

Ah, good to know glib version matters.

> > I checked your patch, what you did seems to be something one would
> > naturally write, but what is currently in the QEMU sources seems to be
> > written intentionally.
> >
> > +Paolo Bonzini , you are the one who implemented the socket watch on
> > Windows. Could you please help analyze this issue?
> >
> > > to avoid WSAEnumNetworkEvents for the master GSource which only has
> > > G_IO_HUP (or for any GSource having only that).
> > > As I said above, the current code doesn't do anything with it anyway.
> > > So, IMO, it's safe to do so.
> > >
> > > I'll send you my patch attached. I was planning to send it in the 
> > > following
> > > weeks anyway. I was just waiting to be sure everything looks fine on our
> > > CI. Feel free to test and modify it if needed.
> >
> > I tested your patch. Unfortunately there is still one test case
> > (migration-test.exe) throwing up the "Broken pipe" message.
>
> I must say I didn't fully test it against qemu testsuite yet. Maybe there are
> some refinements to be done. "Broken pipe" might be linked to the missing
> G_IO_HUP support.
>
> > Can you test my patch instead to see if your gdb issue can be fixed?
>
> Yeah sure. I'll try to do it this afternoon.

Thanks!

Regards,
Bin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]