bug in thread support
Sun, 21 Oct 2001 14:22:21 +0200
I sent this information and example program to the Linux kernel folks,
but they responded that this must be a libc bug instead, so I'm sending
it to you. (The thread on the linux-kernel mailing list should give you
more information beyond this message.)
So the problem: we are developing a massively multithreaded application.
This application sends syslog() messages from its threads. The problem I'm
encountering seems to be related to SIGPIPE handling (either the kernel
signal code, the libc signal code or the linuxthreads signal code)
Our application starts a new thread for each new TCP session. Writing to
sockets may result in a SIGPIPE being delivered and EPIPE being returned
from write() when the remote end closes its socket. If this SIGPIPE happens
at about the same time as a syslog() libc call, a segmentation fault occurs.
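To illustrate the write-after-close behaviour described above, here is a
minimal Python sketch. It is not part of the attached programs, and the
function name write_after_peer_close is made up for illustration. CPython
ignores SIGPIPE at startup, so only the EPIPE error is visible here; no
handler runs.

```python
import errno
import socket


def write_after_peer_close():
    """Return the errno seen when sending to a socket whose peer has closed."""
    # A connected pair of stream sockets stands in for a TCP session.
    ours, peer = socket.socketpair()
    try:
        peer.close()  # the "remote end" closes its socket
        try:
            ours.send(b"a" * 1024)
        except OSError as exc:
            return exc.errno
        return None  # the send unexpectedly succeeded
    finally:
        ours.close()


if __name__ == "__main__":
    # Print the symbolic errno name, if any.
    print(errno.errorcode.get(write_after_peer_close()))
```

On Linux the send is expected to fail with EPIPE. With the default SIGPIPE
disposition in a C program, the same write would terminate the process
instead.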
Since core dumping of multithreaded programs does not work reliably, I
implemented a quick & dirty backtrace function, which dumps the stack when a
signal occurs (see the attached test program).
My backtrace function reports that the SIGSEGV occurs at virtual address
address@hidden:~$ cc -g -lpthread stressthreads.c
Signal (11) received, stackdump follows; eax='ffffffe0', ebx='0000001d',
ecx='bc5ff96c', edx='00000400', eip='00000001'
address@hidden:~$ gdb a.out
GNU gdb 19990928
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
(gdb) info line *0x8048a2a
Line 80 of "stressthreads.c" starts at address 0x8048a12 <thread_func+118>
and ends at 0x8048a2d <thread_func+145>.
(gdb) l stressthreads.c:80
77 memset(buf, 'a', sizeof(buf));
78 for (i = 0; i < 1024; i++)
80 write(fd, buf, sizeof(buf));
83 //syslog(LOG_DEBUG, "thread stopped...%p\n", pthread_self());
(gdb) x/2i 0x8048a25
0x8048a25 <thread_func+137>: call 0x8048680 <write>
0x8048a2a <thread_func+142>: add $0x10,%esp
So the virtual address 0x8048a2a points to where the write() call returns.
The attached test program reproduces the SIGSEGV, although the time needed
to do this depends on whether you are using an SMP or a non-SMP kernel. On
an SMP kernel with more than one processor it crashes within 1 second.
Some instructions on how to use the attached test programs:
1) stressthreads.c is the server, which crashes. Compile it with
gcc stressthreads.c -lpthread
and run it. It binds to 0.0.0.0:10000 and listens for incoming
connections. For each connection it will syslog() a message and write 1MB
of data to the opened socket. The syslog() call is protected by a mutex
(which I don't think is necessary; at least glibc seems to do locking on
its own).
2) test-zorp.py is a small Python script that starts several parallel
threads, each of which connects to the server, reads 1024 bytes of data,
and closes the connection (this will cause a nice SIGPIPE in the server).
Since this script was only put together to reproduce the problem, no
argument parsing is done. You will need to adjust the IP address of the
server at the end of the script (the test() function call).
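For readers without the attachment, the client behaviour described in 2)
can be sketched roughly as follows. This is not the original test-zorp.py;
the names connect_read_close and hammer, the thread count and the timeout
are assumptions made for illustration.

```python
import socket
import threading


def connect_read_close(host, port, results):
    """Connect, read up to 1024 bytes, then close without draining the rest."""
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            results.append(len(sock.recv(1024)))
    except OSError:
        results.append(-1)  # connection failed or was reset


def hammer(host, port, n_threads=10):
    """Run n_threads clients in parallel; return the byte count each one read."""
    results = []  # list.append is safe to call from several threads in CPython
    threads = [
        threading.Thread(target=connect_read_close, args=(host, port, results))
        for _ in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling hammer() with the server's address and port 10000 starts one batch
of clients. Closing each connection while the server is still writing is
what provokes the SIGPIPE on the server side.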
The application sets the SIGPIPE handler to a dummy function doing nothing
but a return. (Earlier it was SIG_IGNed, but since I suspected that to be
the source of the problems, I changed the code to use an empty function.)
The crash does _NOT_ occur if the threads do not send log messages via
syslog(). I implemented my own syslog() routines for the time being, and
with them the crash doesn't occur. I tried to narrow down the problem even
more, but simply changing SIGPIPE handlers during thread execution (which
is what syslog() does) was not enough to trigger it.
There are several defines changing the behaviour of stressthreads.c:
BACKTRACE when #defined, my backtrace function is used to report the exact
location of the SIGSEGV; otherwise SIGSEGV is not masked.
SYSLOG when #defined, the threads send info to syslog. The crash doesn't
occur with this undefined.
SIGACTION when #defined, uses SIGPIPE set/reset code similar to what is
found in the syslog() function. The crash didn't occur for me.
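The set/reset pattern that the SIGACTION define refers to can be sketched
as below. This is a Python illustration of the save, replace and restore
idea only; it is not the glibc code or the code in stressthreads.c, and the
name sigpipe_ignored is hypothetical. Python only allows signal.signal()
from the main thread, so the sketch cannot show the multithreaded case.

```python
import contextlib
import signal


@contextlib.contextmanager
def sigpipe_ignored():
    """Temporarily ignore SIGPIPE, then restore the previous handler."""
    # signal.signal() returns the handler that was installed before.
    previous = signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    try:
        yield previous
    finally:
        # Put the saved handler back, even if the body raised.
        signal.signal(signal.SIGPIPE, previous)
```

A caller would wrap the socket write in "with sigpipe_ignored():". Since
signal dispositions are shared by the whole process, two threads running
such a sequence at once can each save and restore the other's temporary
setting.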
The environment I have here is Debian GNU/Linux potato:
ii libc6 2.1.3-18 GNU C Library: Shared libraries and Timezone
address@hidden:~$ uname -a
Linux hugefw 2.2.19 #2 SMP Thu Sep 27 17:23:56 CEST 2001 i686 unknown
(hugefw has two PIII 800MHz processors)
If you need more information, please tell me; I'd be glad to help.
Thanks in advance.
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
- bug in thread support,
Balazs Scheidler <=