From: Jonathan Beit-Aharon
Subject: gdb on Linux gets SIGSEGV in a "zombie thread"
Date: Mon, 23 Jan 2006 11:29:05 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.7.12-1.3.1

Debugging a large daemon program (being ported from FreeBSD to Linux), I'm using gdb on Linux, and receiving the message:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 155540400 (zombie)]
0x0031a18b in malloc_consolidate () from /lib/tls/libc.so.6

The backtrace shows a thread program stack that is not shown by the "thread apply all bt", so I assume this is the bt of the "zombie thread".

What is a zombie thread? A zombie process, if I understand correctly, is one all of whose resources have been reclaimed by the kernel, except for a few kernel structures that are kept for the sake of statistics relevant to other processes. This zombie, however, seems to still have an active program stack, and the chutzpah to abort.

How can I trace a thread that doesn't appear in the backtrace?

This is unique to **gdb on Linux**, because debugging the same modified code on FreeBSD exposed a bug (order of precedence in *ppcCharStrings[kk]=0; assignment), happening about 20 C-statements before the SIGSEGV.

I'm including below much detail in the hope that it might help fix gdb.

Thank you for a wonderful tool,
Jonathan :-)

The system characteristics are:

address@hidden:~ $uname -a
Linux bumper.cybertron.intrusic.com 2.6.12-1.1381_FC3 #1 Fri Oct 21 03:46:55 EDT 2005 i686 i686 i386 GNU/Linux
address@hidden:~ $gcc --version
gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2)
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Here is my debug session:

address@hidden:/home/jbeitaharon/dev/manhattan/bin $gdb dlbr
GNU gdb Red Hat Linux (6.1post-1.20040607.43.0.1rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) b _ChunkInsert
Breakpoint 1 at 0x804bb8b: file dlbr_exchange.c, line 477.
(gdb) r -D
Starting program: /home/jbeitaharon/dev/manhattan/bin/dlbr -D
Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0xa8c000
[Thread debugging using libthread_db enabled]
[New Thread -1208940160 (LWP 32742)]
[New Thread 26168240 (LWP 304)]
[New Thread 99670960 (LWP 305)]
32742 | utils.thrd.thrd_parent | LOG_DEBUG | thrd_parent.c : 372 | Wed Jan 18 16:07:38 2006 | Parent started |
32742 | utils.thrd.thrd_parent | LOG_DEBUG | thrd_parent.c : 747 | Wed Jan 18 16:07:38 2006 | Parent event loop started |
[New Thread 123161520 (LWP 306)]
[New Thread 36658096 (LWP 307)]
32742 | utils.thrd.thrd_child | LOG_ERROR | thrd_child.c : 1086 | Wed Jan 18 16:09:26 2006 | Error reading current virtual memory size: (null) |
32742 | utils.conf.conf | LOG_INFO | conf.c : 492 | Wed Jan 18 16:09:26 2006 | Configuration loaded
32742 | utils.conf.conf | LOG_INFO | conf.c : 492 | Wed Jan 18 16:09:56 2006 | Configuration loaded
CONF: /storage/pcap and /storage/evidence quotas (5000 + 1000 MB) within available space (7317 MB free + 0 MB already used for data and evidence)
[New Thread 47147952 (LWP 18185)]
32742 | utils.thrd.thrd_parent | LOG_DEBUG | thrd_parent.c : 747 | Wed Jan 18 16:17:09 2006 | Parent event loop started |
[New Thread 57637808 (LWP 18186)]
[New Thread 68127664 (LWP 18187)]
[New Thread 82631600 (LWP 18188)]
[New Thread 110160816 (LWP 18189)]
[New Thread 133651376 (LWP 18190)]
32742 | utils.thrd.thrd_parent | LOG_DEBUG | thrd_parent.c : 747 | Wed Jan 18 16:17:09 2006 | Parent event loop started |
[New Thread 145050544 (LWP 2789)]
[New Thread 155540400 (LWP 2791)]
32742 | utils.thrd.thrd_child | LOG_DEBUG | thrd_child.c : 1623 | Wed Jan 18 16:58:33 2006 | [dlbr.exchange.conn] Processing own child finish event |

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 155540400 (zombie)]
0x0031a18b in malloc_consolidate () from /lib/tls/libc.so.6
(gdb) bt
#0 0x0031a18b in malloc_consolidate () from /lib/tls/libc.so.6
#1 0x0031b1c3 in _int_malloc () from /lib/tls/libc.so.6
#2 0x0031ccf6 in calloc () from /lib/tls/libc.so.6
#3 0x00658161 in _dl_new_object () from /lib/ld-linux.so.2
#4 0x006545b9 in _dl_map_object_from_fd () from /lib/ld-linux.so.2
#5 0x0065605c in _dl_map_object () from /lib/ld-linux.so.2
#6 0x003b4738 in dl_open_worker () from /lib/tls/libc.so.6
#7 0x00000000 in ?? ()
(gdb) thread apply all bt

Thread 12 (Thread 145050544 (LWP 2789)):
#0 0x00a8c402 in __kernel_vsyscall ()
#1 0x003718db in __write_nocancel () from /lib/tls/libc.so.6
#2 0x00316d6f in _IO_new_file_write () from /lib/tls/libc.so.6
#3 0x003157cb in _IO_new_do_write () from /lib/tls/libc.so.6
#4 0x00316278 in _IO_new_file_overflow () from /lib/tls/libc.so.6
#5 0x00316e92 in _IO_new_file_xsputn () from /lib/tls/libc.so.6
#6 0x002f38a7 in vfprintf () from /lib/tls/libc.so.6
#7 0x002fb40f in fprintf () from /lib/tls/libc.so.6
#8 0x006dbc50 in THRD_LogMsg (strCategory=0x6dffa5 "utils.thrd.thrd_child", tLogLevel=7, strFileName=0x6dfe18 "thrd_child.c", iLineNumber=1623, strFormat=0x6e0088 "[%s] Processing own child finish event") at thrd_child.c:1050
#9 0x006dc5a4 in _THRD_ChildEventProcess (pstChild=0x9f60e98) at thrd_child.c:1622
#10 0x006dc328 in _THRD_ChildEventStart (pvArg=0x9f60e98) at thrd_child.c:1500
#11 0x008a5341 in start_thread () from /lib/tls/libpthread.so.0
#12 0x003806fe in clone () from /lib/tls/libc.so.6

Thread 11 (Thread 133651376 (LWP 18190)):
#0 0x00a8c402 in __kernel_vsyscall ()
#1 0x008a7a86 in pthread_cond_wait@@GLIBC_2.3.2 ()
---Type <return> to continue, or q <return> to quit---
    from /lib/tls/libpthread.so.0
#2 0x006da68c in Pthread_Cond_Wait (cond=0x9f60c20, mutex=0x9f60b40) at thrd_pthread.c:243
#3 0x006d9c99 in _THRD_ParentEventStart (pvArg=0x9f60b40) at thrd_parent.c:754
#4 0x008a5341 in start_thread () from /lib/tls/libpthread.so.0
#5 0x003806fe in clone () from /lib/tls/libc.so.6

Thread 10 (Thread 110160816 (LWP 18189)):
#0 0x00a8c402 in __kernel_vsyscall ()
#1 0x003435b6 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#2 0x003433bc in sleep () from /lib/tls/libc.so.6
#3 0x006dabdd in Pthread_Cleanup_Pop (execute=0) at thrd_pthread.c:467
#4 0x006dba52 in THRD_LockPop (ptMutex=0x9f0f99c) at thrd_child.c:945
#5 0x0804c682 in DLBR_FormatStart (pvArg=0x9f5fc38) at dlbr_format.c:340
#6 0x008a5341 in start_thread () from /lib/tls/libpthread.so.0
#7 0x003806fe in clone () from /lib/tls/libc.so.6

Thread 9 (Thread 82631600 (LWP 18188)):
#0 0x00a8c402 in __kernel_vsyscall ()
#1 0x00379151 in ___newselect_nocancel () from /lib/tls/libc.so.6
#2 0x00ce32e0 in NETX_ExchangeStart (pstExchange=0x9f600a0) at netx_exchange.c:377
#3 0x0804b8b7 in DLBR_ExchangeStart (pvArg=0x9f5fa68) at dlbr_exchange.c:384
---Type <return> to continue, or q <return> to quit---
#4 0x008a5341 in start_thread () from /lib/tls/libpthread.so.0
#5 0x003806fe in clone () from /lib/tls/libc.so.6

Thread 8 (Thread 68127664 (LWP 18187)):
#0 0x00a8c402 in __kernel_vsyscall ()
#1 0x008a7a86 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2 0x006da68c in Pthread_Cond_Wait (cond=0x9f5fd78, mutex=0x9f5fc38) at thrd_pthread.c:243
#3 0x006dc2d7 in _THRD_ChildEventStart (pvArg=0x9f5fc38) at thrd_child.c:1491
#4 0x008a5341 in start_thread () from /lib/tls/libpthread.so.0
#5 0x003806fe in clone () from /lib/tls/libc.so.6

Thread 7 (Thread 57637808 (LWP 18186)):
#0 0x00a8c402 in __kernel_vsyscall ()
#1 0x008a7a86 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2 0x006da68c in Pthread_Cond_Wait (cond=0x9f5fb20, mutex=0x9f5fa68) at thrd_pthread.c:243
#3 0x006dc2d7 in _THRD_ChildEventStart (pvArg=0x9f5fa68) at thrd_child.c:1491
#4 0x008a5341 in start_thread () from /lib/tls/libpthread.so.0
#5 0x003806fe in clone () from /lib/tls/libc.so.6
---Type <return> to continue, or q <return> to quit---

Thread 6 (Thread 47147952 (LWP 18185)):
#0 0x00a8c402 in __kernel_vsyscall ()
#1 0x008a7a86 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2 0x006da68c in Pthread_Cond_Wait (cond=0x9f5f8a8, mutex=0x9f5f7c8) at thrd_pthread.c:243
#3 0x006d9c99 in _THRD_ParentEventStart (pvArg=0x9f5f7c8) at thrd_parent.c:754
#4 0x008a5341 in start_thread () from /lib/tls/libpthread.so.0
#5 0x003806fe in clone () from /lib/tls/libc.so.6

Thread 5 (Thread 36658096 (LWP 307)):
#0 0x00a8c402 in __kernel_vsyscall ()
#1 0x003435b6 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#2 0x003433bc in sleep () from /lib/tls/libc.so.6
#3 0x0804a07b in DLBR_Run (pvArg=0x9e3b468) at dlbr.c:381
#4 0x008a5341 in start_thread () from /lib/tls/libpthread.so.0
#5 0x003806fe in clone () from /lib/tls/libc.so.6

Thread 4 (Thread 123161520 (LWP 306)):
#0 0x00a8c402 in __kernel_vsyscall ()
#1 0x008a7a86 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2 0x006da68c in Pthread_Cond_Wait (cond=0x9e3b5a8, mutex=0x9e3b468)
---Type <return> to continue, or q <return> to quit---
    at thrd_pthread.c:243
#3 0x006dc2d7 in _THRD_ChildEventStart (pvArg=0x9e3b468) at thrd_child.c:1491
#4 0x008a5341 in start_thread () from /lib/tls/libpthread.so.0
#5 0x003806fe in clone () from /lib/tls/libc.so.6

Thread 3 (Thread 99670960 (LWP 305)):
#0 0x00a8c402 in __kernel_vsyscall ()
#1 0x008a7a86 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2 0x006da68c in Pthread_Cond_Wait (cond=0x9e3b2c0, mutex=0x9e3b1e0) at thrd_pthread.c:243
#3 0x006d9c99 in _THRD_ParentEventStart (pvArg=0x9e3b1e0) at thrd_parent.c:754
#4 0x008a5341 in start_thread () from /lib/tls/libpthread.so.0
#5 0x003806fe in clone () from /lib/tls/libc.so.6

Thread 2 (Thread 26168240 (LWP 304)):
#0 0x00a8c402 in __kernel_vsyscall ()
#1 0x008ab51e in do_sigwait () from /lib/tls/libpthread.so.0
#2 0x008ab5bf in sigwait () from /lib/tls/libpthread.so.0
#3 0x004e72af in BINW_SigWaiter (arg=0xbfa19050) at binw.c:330
#4 0x008a5341 in start_thread () from /lib/tls/libpthread.so.0
#5 0x003806fe in clone () from /lib/tls/libc.so.6
---Type <return> to continue, or q <return> to quit---

Thread 1 (Thread -1208940160 (LWP 32742)):
#0 0x00a8c402 in __kernel_vsyscall ()
#1 0x003435b6 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#2 0x003433bc in sleep () from /lib/tls/libc.so.6
#3 0x004e71b2 in BINW_Wrap (argc=2, argv=0xbfa191c4, strVersion=0x804e7f8 "datalibrarian v1.0", strPidDefault=0x804e7ed "./dlbr.pid", strService=0x804e7e8 "dlbr", pfnService=0x8049f60 <DLBR_Run>) at binw.c:272
#4 0x08049f42 in main (argc=2, argv=0xbfa191c4) at dlibrarian.c:64
(gdb) thread 10
[Switching to thread 10 (Thread 110160816 (LWP 18189))]#0 0x00a8c402 in __kernel_vsyscall ()
(gdb) n
Single stepping until exit from function __kernel_vsyscall,
which has no line number information.

--
After the last two response lines from gdb the process is hung.

Thanks again,
Jonathan
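
A note on the precedence bug mentioned in the message: in C, postfix [] binds tighter than unary *, so *ppcCharStrings[kk]=0; is parsed as *(ppcCharStrings[kk])=0;. The sketch below contrasts the two readings. The names ppcCharStrings and kk come from the message; the char ** type and both helper function names are assumptions made only for illustration, and the message does not say which reading the original code intended.

```c
#include <stddef.h>

/* Hypothetical helper: the unparenthesized form from the message.
   It parses as *(ppcCharStrings[kk]) = 0, so it writes a NUL into the
   FIRST character of string number kk. */
void clear_first_char_of_string(char **ppcCharStrings, size_t kk)
{
    *ppcCharStrings[kk] = 0;
}

/* Hypothetical helper: the explicitly parenthesized form.
   It dereferences first, then indexes, so it writes a NUL into
   character number kk of the FIRST string. */
void clear_char_in_first_string(char **ppcCharStrings, size_t kk)
{
    (*ppcCharStrings)[kk] = 0;
}
```

Given the two strings "abc" and "xyz" and kk equal to 1, the first helper leaves "abc" intact and empties "xyz", while the second truncates "abc" to "a" and leaves "xyz" intact. If ppcCharStrings actually pointed at a single char * (one possibility, not confirmed by the message), the unparenthesized form with a nonzero kk would read a pointer from beyond that object and write through it, which is the kind of stray write that can corrupt the heap and surface later inside malloc.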