bug-glibc

linuxthreads bug? dead lock in "exit"


From: Carlo Wood
Subject: linuxthreads bug? dead lock in "exit"
Date: Thu, 16 May 2002 02:44:35 +0200
User-agent: Mutt/1.2.5i

Hiya,

I am trying to cause a threaded application
to "core dump".  While calling raise(6) works,
the application doesn't terminate, no matter
what I try.

At the moment I have the following situation:

(gdb) bt
#0  0x40348206 in __sigsuspend (set=0xbffff4e0) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1  0x403129f9 in __pthread_wait_for_restart_signal (self=0x4031bee0) at pthread.c:969
#2  0x40312079 in pthread_onexit_process (retcode=0, arg=0x0) at restart.h:34
#3  0x4034a64a in exit (status=0) at exit.c:54
#4  0x403374bf in __libc_start_main (main=0x806d7d0 <main>, argc=1, ubp_av=0xbffff6c4, init=0x8056d84 <_init>, fini=0x807ab30 <_fini>,
    rtld_fini=0x4000c968 <_dl_fini>, stack_end=0xbffff6bc) at ../sysdeps/generic/libc-start.c:129
Current language:  auto; currently c
(gdb) up
#1  0x403129f9 in __pthread_wait_for_restart_signal (self=0x4031bee0) at pthread.c:969
969         sigsuspend(&mask);                   /* Wait for signal */
(gdb)
#2  0x40312079 in pthread_onexit_process (retcode=0, arg=0x0) at restart.h:34
34        __pthread_wait_for_restart_signal(self);
(gdb)
#3  0x4034a64a in exit (status=0) at exit.c:54
54                    (*f->func.on.fn) (status, f->func.on.arg);
(gdb)
#4  0x403374bf in __libc_start_main (main=0x806d7d0 <main>, argc=1, ubp_av=0xbffff6c4, init=0x8056d84 <_init>, fini=0x807ab30 <_fini>,
    rtld_fini=0x4000c968 <_dl_fini>, stack_end=0xbffff6bc) at ../sysdeps/generic/libc-start.c:129
129       exit ((*main) (argc, argv, __environ));
(gdb)
Initial frame selected; you cannot go up.
(gdb) thread 2
Thread ID 2 not known.
(gdb) thread 3
Thread ID 3 not known.
(gdb) thread 4
Thread ID 4 not known.
(gdb) thread 5
Thread ID 5 not known.

etc.

This seems to indicate that all other threads terminated.
There is one defunct process left, however:

carlo    18009  0.0  0.0     0     0 pts/3   Z    01:52   0:00 [threads_threads <defunct>]
carlo    17983  0.0  2.8 25804 10844 pts/3   T    01:52   0:01 threads_threads_shared

These are the only 'threads_threads_shared' (the application name)
entries in the 'ps aux' output.

When I run the application from within gdb, then it terminates
normally.

Does someone have any idea what could be causing this? Or what
I can do/test to find out?

The application more or less does the following:
Multiple threads are running, the main thread calls pthread_join
for one thread at a time.  The threads often disable cancellation
but should always turn it on again later.  One thread calls raise(6),
all other threads call pthread_exit() and are 'joined' with the
main thread.  There might be mutexes left that are locked (although
that shouldn't matter of course).
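
Stripped of the libcwd specifics, the structure described above can be
sketched roughly as follows.  This is an illustration only, not the
actual test program: the names worker, scenario, run_scenario and
kThreads are made up here, and the fork() wrapper is added purely so
that the outcome can be observed from a parent process.

```cpp
#include <cassert>
#include <csignal>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical names throughout; this is not the libcwd test program.
static const int kThreads = 4;

// Thread 0 raises SIGABRT (signal 6 on Linux); every other thread
// leaves through pthread_exit(), as described in the text.
static void* worker(void* arg)
{
  long index = reinterpret_cast<long>(arg);
  if (index == 0)
    raise(SIGABRT);
  pthread_exit(reinterpret_cast<void*>(1));
  return NULL;          // Never reached.
}

// Create the threads, then join them one at a time from the main thread.
static void scenario(void)
{
  pthread_t thread_id[kThreads];
  for (long i = 0; i < kThreads; ++i)
  {
    int rc = pthread_create(&thread_id[i], NULL, worker, reinterpret_cast<void*>(i));
    assert(rc == 0);
  }
  for (int i = 0; i < kThreads; ++i)
  {
    void* status;
    pthread_join(thread_id[i], &status);
  }
}

// Run the scenario in a child process and return the signal that
// terminated it, 0 if it exited normally, or -1 if fork/waitpid failed.
int run_scenario(void)
{
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0)
  {
    scenario();
    _exit(0);
  }
  int wstatus;
  if (waitpid(pid, &wstatus, 0) != pid)
    return -1;
  return WIFSIGNALED(wstatus) ? WTERMSIG(wstatus) : 0;
}
```

On a threading implementation where a fatal signal takes down the whole
process, run_scenario() would be expected to return SIGABRT.  With
linuxthreads, where every thread is a separate kernel process, the
outcome may differ, which is what this report is about.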

For some reason, ALL threads are joined with the main thread;
the debug output gives:

~/c++/libcwd/testsuite>threads_threads_shared 2>/dev/null | egrep '1024|creating'
1024       (0000) NOTICE  : main: creating thread 0, id 1026 (2).
1024       (0000) NOTICE  : main: creating thread 1, id 2051 (3).
1024       (0000) NOTICE  : main: creating thread 2, id 3076 (4).
1024       (0000) NOTICE  : main: creating thread 3, id 4101 (5).
1024       (0000) NOTICE  : main: creating thread 4, id 5126 (6).
1024       (0000) NOTICE  : main: creating thread 5, id 6151 (7).
1024       (0000) NOTICE  : main: creating thread 6, id 7176 (8).
1024       (0000) NOTICE  : main: creating thread 7, id 8201 (9).
1024       (0000) NOTICE  : main: creating thread 8, id 9226 (10).
1024       (0000) NOTICE  : main loop: thread 0, id 1026 (2), returned with status OK.
1024       (0000) NOTICE  : main loop: thread 1, id 2051 (3), returned with status OK.
1024       (0000) NOTICE  : main loop: thread 2, id 3076 (4), returned with status OK.
1024       (0000) NOTICE  : main loop: thread 3, id 4101 (5), returned with status OK.
1024       (0000) NOTICE  : main loop: thread 4, id 5126 (6), returned with status OK.
1024       (0000) NOTICE  : main loop: thread 5, id 6151 (7), returned with status OK.
1024       (0000) NOTICE  : main loop: thread 6, id 7176 (8), returned with status OK.
1024       (0000) NOTICE  : main loop: thread 7, id 8201 (9), returned with status OK.
1024       (0000) NOTICE  : main loop: thread 8, id 9226 (10), returned with status OK.
1024       (0000) NOTICE  : Exiting from main()

[ Where the main function looks like:

int main(void)
{
  Debug( check_configuration() );

#if CWDEBUG_ALLOC
  new int;
  libcw::debug::make_all_allocations_invisible_except(NULL);
#endif

  Debug( libcw_do.set_ostream(&std::cout, &cout_mutex) );

  set_margin();

  Debug( dc::notice.on() );
  Debug( libcw_do.on() );

  pthread_t thread_id[number_of_threads];
  for (int i = 0; i < number_of_threads; ++i)
  {
    Dout(dc::notice|continued_cf, "main: creating thread " << i << ", ");
    pthread_create(&thread_id[i], NULL, progs[i], NULL);
    Dout(dc::finish, "id " << thread_id[i] << " (" << thread_index(thread_id[i]) << ").");
  }

  for (int i = 0; i < number_of_threads; ++i)
  {
    void* status;
    pthread_join(thread_id[i], &status);
    Dout(dc::notice, "main loop: thread " << i << ", id " << thread_id[i]
        << " (" << thread_index(thread_id[i]) << "), returned with status "
        << ((bool)status ? "OK" : "ERROR") << '.');
  }

  Dout(dc::notice, "Exiting from main()");
  return 0;
}

]

The thread functions all end with 'return (void*)0' OR the function core_dump()
is called which looks like this:

    void core_dump(void)
    {
#ifdef _REENTRANT
      // Are we the first thread that tries to generate a core?
      LIBCWD_DISABLE_CANCEL;
      if (!_private_::mutex_tct<_private_::kill_threads_instance>::trylock())
      {
        LIBCWD_TSD_DECLARATION;
        __libcwd_tsd.internal = 0;      // Dunno if this is needed, but it looks consistent.
        ++__libcwd_tsd.library_call;    // So our sanity checks allow us to call free() again in
                                        // pthread_exit when we get here from malloc et al.
        // Another thread is already trying to generate a core dump.
        pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
        pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
        pthread_exit(PTHREAD_CANCELED);
      }
      // Disable cancelation because otherwise it might be that another thread is generating the core.
#endif
      raise(6);
#ifdef _REENTRANT
      LIBCWD_ENABLE_CANCEL;
#endif
      exit(6);          // Never reached.
    }
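
The "first thread wins" test at the top of core_dump() could be written
with plain pthreads roughly like this.  It is only a sketch:
kill_threads_mutex and claim_core_dump are invented names standing in
for libcwd's mutex_tct<kill_threads_instance>, whose real
implementation is not shown here and may well differ.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Hypothetical stand-in for mutex_tct<kill_threads_instance>.
static pthread_mutex_t kill_threads_mutex = PTHREAD_MUTEX_INITIALIZER;

// Returns true for the first caller only.  The mutex is deliberately
// never unlocked: once one thread has claimed the core dump, every
// later caller sees EBUSY and should get out of the way.
bool claim_core_dump(void)
{
  int rc = pthread_mutex_trylock(&kill_threads_mutex);
  assert(rc == 0 || rc == EBUSY);   // Any other result is unexpected.
  return rc == 0;
}
```

A thread for which claim_core_dump() returns false would take the
pthread_exit(PTHREAD_CANCELED) branch above; the one thread that gets
true goes on to raise(6).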


Any help would be greatly appreciated.

-- 
Carlo Wood <address@hidden>

PS This is on i686-pc-linux-gnu, with glibc 2.2.4, running kernel 2.4.18.
   The compiler being used is g++ 3.0.4.



