qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH ssh] ssh: Do not switch session to non-blocking mode


From: Kevin Wolf
Subject: Re: [PATCH ssh] ssh: Do not switch session to non-blocking mode
Date: Fri, 22 Nov 2024 15:43:20 +0100

Am 18.11.2024 um 18:06 hat Jakub Jelen geschrieben:
> Hi Kevin,
> Sorry for the delay, my gmail filters will need some love to handle this
> high-traffic mailing lists so I can catch replies ...
> 
> On Thu, Nov 14, 2024 at 6:49 PM Kevin Wolf <kwolf@redhat.com> wrote:
> > [...]
> > Hm, after looking some more at the code, I agree that it can't have
> > worked, for the simple reason that sftp_read() never returns SSH_AGAIN,
> > but turns it into 0. Which QEMU would have reported as an I/O error if
> > we're not at EOF.
> >
> > What I don't understand yet where in the code it would have blocked
> > before rather than returning an error. I tried to follow the code path
> > and didn't see anything like it, but obviously I'm also not familiar
> > with libssh code. I guess it also doesn't really matter as long as we
> > know it has always been broken...
> 
> I think it is the cycle in the sftp_packet_read(), which was polling
> until we got something to read here:
> 
> https://gitlab.com/libssh/libssh-mirror/-/blob/master/src/sftp_common.c?ref_type=heads#L75-L100
> 
> But I would have to retrace through the code as I already forgot where I
> was looking to.

The loop you highlighted does at most 4 reads and errors out if it
doesn't read anything, so I don't think that's it. If it blocks,
something inside of ssh_channel_read() must do that.

> > The thing that maybe misled me is that sftp_recv_response_msg() calls
> > ssh_channel_poll() first to make sure that there is even something to
> > read. So I expected it should have been non-blocking at least in some
> > cases, but if it had been, we would probably have seen I/O errors all
> > the time?
> 
> Note, that the sftp_recv_response_msg() function is new in master and
> not yet in any released versions. The 0.11 version had processing directly
> in the sftp_read() (and in each other function handling some responses).
> 
> https://gitlab.com/libssh/libssh-mirror/-/blob/stable-0.11/src/sftp.c?ref_type=heads#L1200
> 
> Also the ssh_channel_poll() is called only when the file handle is in
> non-blocking mode (whatever it means). The qemu sets the blocking mode
> here:
> 
> https://github.com/qemu/qemu/blob/master/block/ssh.c#L805C1-L805C44
> 
> This means the execution goes directly into sftp_read_and_dispatch(),
> which we discussed above as it does the blocking.

Oh, right, I was confused with blocking mode for sftp and for the ssh
session. We have sftp blocking and the session non-blocking for the most
part.

> > > > So I'm not sure if sftp_aio_*() can be combined with something else into
> > > > a working solution, and I also don't know if it's affected by the same
> > > > libssh bug.
> > >
> > > Right now, we do not have a full solution. But having SFTP working
> > > completely in nonoblocking mode is one of the things we would like to have
> > > in the long term.
> > >
> > > > Jakub, can you help with that?
> > > >
> > > > [...]
> > > >
> > > > As far as I can see, libssh sessions aren't thread safe, so we'll have
> > > > to make sure to have only one request going at the same time, but I
> > > > assume that calling ssh_read/write() from different threads sequentially
> > > > isn't a problem?
> > >
> > > My understanding is that the thread safety of libssh is limited to not
> > > sharing session between threads -- there is no synchronization if two
> > > threads would send packets at the same time:
> > >
> > > https://api.libssh.org/master/
> > >
> > > If you will make sure you will not call sftp_read()/sftp_write() at
> > > the same time from different threads, it might work, but it is
> > > untested.
> >
> > How do you feel about it? Do you think this is something libssh can
> > support, or is it something that might accidentally work today, but not
> > necessarily next year?
> 
> If we will write some test coverage for this, I think we can make sure it
> keeps working. There is really nothing that libssh would do in the background
> so the stuff should not go wrong. We will just make sure to double-check
> all the variables are a part of session and not scattered around as static
> variables or thread local ones (I hope we don't have much of these though).

Ok, that sounds good.

> > We have a thread pool readily available that we could use, but then
> > requests for the same session would come from different threads - just
> > never at the same time. If we need a single long-lived thread per
> > session instead, that might be a little more involved because we might
> > have to implement all the communication and synchronisation from
> > scratch.
> >
> > (Hmm... Or we abuse the IOThread object to create one internally and
> > just move the request coroutine to it around libssh calls. That could be
> > easy enough.)
> 
> Sorry, I don't have much experience around this to bring any useful insight
> here.
> 
> So going back to the original issue, is the proposed patch something
> that could work for you in the short term before a better solution
> will be implemented or is there something we should change?

Given that apparently it was always blocking, the patch doesn't make the
situation any worse. I'll apply it for 9.2.

Kevin




reply via email to

[Prev in Thread] Current Thread [Next in Thread]