bug-guix

bug#42740: Segfault in libssh during ‘guix copy’


From: Maxim Cournoyer
Subject: bug#42740: Segfault in libssh during ‘guix copy’
Date: Tue, 01 Sep 2020 09:56:56 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)

Hi Ludovic and Artyom,

Ludovic Courtès <ludo@gnu.org> writes:

> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> So we have the finalization thread closing a channel of session
>> 0x12a4b20 (which causes a write on the channel), and the main thread
>> writing to a channel of that same session.  This is exactly what I
>> described at <https://issues.guix.gnu.org/26976#11>:
>>
>>   AIUI, that means there’s one output compression buffer per session,
>>   and it’s not thread-safe (in Guile 2.2 finalizers are called from a
>>   separate thread.)
>>
>>   I think the fix, in Guile-SSH, is to associate each libssh object
>>   (session, channel, etc.) with a mutex, and to protect all uses of the
>>   libssh object by that mutex.
>>
>> Artyom, WDYT?  Do you think you could take a look into that?
>>
>> In the meantime, I’ll look for the origin of the channel port that’s not
>> explicitly closed and see if we can work around it.
>
> I’ve pushed this change on our side to explicitly close channels and
> sessions:
>
>   https://git.savannah.gnu.org/cgit/guix.git/commit/?id=61fe9ced7da7eefceb931af0cb7363b721f5bdd6
>
> This workaround is similar to that of 2017:
>
>   https://git.savannah.gnu.org/cgit/guix.git/commit/?id=8e469b67f95cfe5b95405b503b8ee315fdf8ce66
>
> It’s really just a workaround so I think we should fix the core issue in
> Guile-SSH (or libssh) so it doesn’t pop up again next month—it’s hard to
> ensure code that opens a channel explicitly closes it.

Do you think the issue lies in Guile-SSH or in libssh itself?  Sorry for
not having caught these problems earlier; it seemed to work reliably
when I last tested it.

> Anyway, I would welcome tests using ‘guix copy’, ‘guix deploy’, and
> offloading.  (For offloading, make sure to run the daemon from your
> build tree.)

While attempting to use offloading on the core-updates branch, I
encountered stalls and file errors, but with your patch it seems to work
reliably (it's been offloading builds for the last 15 minutes or so
without interruption).

So your workaround seems to work as intended.

I also agree that it'd be much nicer and more future-proof if we could
fix the root issue.
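
For illustration, the per-object mutex idea quoted above could be
sketched roughly like this in C.  It is only a sketch under my own
assumptions: `locked_session`, `locked_session_write`, `writer_thread`
and `demo_two_writers` are hypothetical names, not Guile-SSH or libssh
APIs, and the fixed-size buffer merely stands in for the per-session
output buffer.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a session whose channels share one output
   buffer; not a libssh or Guile-SSH type.  */
struct locked_session {
    pthread_mutex_t lock;
    char out[256];
    size_t used;
};

/* Append LEN bytes under the session lock.  Return 0 on success, -1 if
   the buffer is full.  */
static int locked_session_write(struct locked_session *s,
                                const char *data, size_t len)
{
    int rc = -1;
    pthread_mutex_lock(&s->lock);
    if (len <= sizeof s->out - s->used) {
        memcpy(s->out + s->used, data, len);
        s->used += len;
        rc = 0;
    }
    pthread_mutex_unlock(&s->lock);
    return rc;
}

/* Plays the role of either the main thread or the finalizer thread.  */
static void *writer_thread(void *arg)
{
    struct locked_session *s = arg;
    for (int i = 0; i < 100; i++)
        locked_session_write(s, "x", 1);
    return NULL;
}

/* Run two concurrent writers on one session; return bytes buffered.  */
static size_t demo_two_writers(void)
{
    struct locked_session s;
    pthread_t a, b;
    s.used = 0;
    pthread_mutex_init(&s.lock, NULL);
    pthread_create(&a, NULL, writer_thread, &s);
    pthread_create(&b, NULL, writer_thread, &s);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_mutex_destroy(&s.lock);
    return s.used;
}
```

With the lock held around every access, both writers' bytes are
accounted for; without it, the two threads could race on `used`, which
is the kind of corruption the finalizer thread and the main thread
appear to trigger on the shared session.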

Thanks!

Maxim
