qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH ssh] ssh: Do not switch session to non-blocking mode


From: Jakub Jelen
Subject: Re: [PATCH ssh] ssh: Do not switch session to non-blocking mode
Date: Mon, 18 Nov 2024 18:06:24 +0100

Hi Kevin,
Sorry for the delay, my gmail filters will need some love to handle this
high-traffic mailing lists so I can catch replies ...

On Thu, Nov 14, 2024 at 6:49 PM Kevin Wolf <kwolf@redhat.com> wrote:
> [...]
> Hm, after looking some more at the code, I agree that it can't have
> worked, for the simple reason that sftp_read() never returns SSH_AGAIN,
> but turns it into 0. Which QEMU would have reported as an I/O error if
> we're not at EOF.
>
> What I don't understand yet where in the code it would have blocked
> before rather than returning an error. I tried to follow the code path
> and didn't see anything like it, but obviously I'm also not familiar
> with libssh code. I guess it also doesn't really matter as long as we
> know it has always been broken...

I think it is the cycle in the sftp_packet_read(), which was polling
until we got
something to read here:

https://gitlab.com/libssh/libssh-mirror/-/blob/master/src/sftp_common.c?ref_type=heads#L75-L100

But I would have to retrace through the code as I already forgot where I
was looking to.

> The thing that maybe misled me is that sftp_recv_response_msg() calls
> ssh_channel_poll() first to make sure that there is even something to
> read. So I expected it should have been non-blocking at least in some
> cases, but if it had been, we would probably have seen I/O errors all
> the time?

Note, that the sftp_recv_response_msg() function is new in master and
not yet in any released versions. The 0.11 version had processing directly
in the sftp_read() (and in each other function handling some responses).

https://gitlab.com/libssh/libssh-mirror/-/blob/stable-0.11/src/sftp.c?ref_type=heads#L1200

Also the ssh_channel_poll() is called only when the file handle is in
non-blocking
mode (whatever it means). The qemu sets the blocking mode here:

https://github.com/qemu/qemu/blob/master/block/ssh.c#L805C1-L805C44

This means the execution goes directly into sftp_read_and_dispatch(), which
we discussed above as it does the blocking.

> > > So I'm not sure if sftp_aio_*() can be combined with something else into
> > > a working solution, and I also don't know if it's affected by the same
> > > libssh bug.
> >
> > Right now, we do not have a full solution. But having SFTP working
> > completely in nonoblocking mode is one of the things we would like to have
> > in the long term.
> >
> > > Jakub, can you help with that?
> > >
> > > [...]
> > >
> > > As far as I can see, libssh sessions aren't thread safe, so we'll have
> > > to make sure to have only one request going at the same time, but I
> > > assume that calling ssh_read/write() from different threads sequentially
> > > isn't a problem?
> >
> > My understanding is that the thread safety of libssh is limited to not
> > sharing session between threads -- there is no synchronization if two
> > threads would send packets at the same time:
> >
> > https://api.libssh.org/master/
> >
> > If you will make sure you will not call sftp_read()/sftp_write() at
> > the same time from different threads, it might work, but it is
> > untested.
>
> How do you feel about it? Do you think this is something libssh can
> support, or is it something that might accidentally work today, but not
> necessarily next year?

If we will write some test coverage for this, I think we can make sure it
keeps working. There is really nothing that libssh would do in the background
so the stuff should not go wrong. We will just make sure to double-check
all the variables are a part of session and not scattered around as static
variables or thread local ones (I hope we don't have much of these though).

> We have a thread pool readily available that we could use, but then
> requests for the same session would come from different threads - just
> never at the same time. If we need a single long-lived thread per
> session instead, that might be a little more involved because we might
> have to implement all the communication and synchronisation from
> scratch.
>
> (Hmm... Or we abuse the IOThread object to create one internally and
> just move the request coroutine to it around libssh calls. That could be
> easy enough.)

Sorry, I don't have much experience around this to bring any useful insight
here.

So going back to the original issue, is the proposed patch something that
could work for you in the short term before a better solution will be
implemented
or is there something we should change?

Jakub




reply via email to

[Prev in Thread] Current Thread [Next in Thread]