From: Siddharth Bhojnagarwala
Subject: [Libunwind-devel] Crash while running a cpu profiling tool with libunwind 1.0.1
Date: Fri, 27 Jan 2012 01:03:00 +0000

Hello,

I am trying to use a cpu profiling tool (google perftool) which uses libunwind to get backtraces. The code being profiled takes mutex locks all over the place. When the profiler is run, the program crashes almost immediately (generally with some kind of illegal instruction). See an example of the crash below.

myhost# gdb /root/asp/bin/myexec core_7144_1327624664_myprogram
GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "mips64-nlm-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/asp/bin/myexec...done.
warning: core file may not match specified executable file.
…
[Thread debugging using libthread_db enabled]
Core was generated by `/root/asp/bin/myexec'.
Program terminated with signal 4, Illegal instruction.
#0 0x0000005556c9d3bc in __sigprocmask (how=3, set=0x5589f15bf8, oset=0x0) at ../sysdeps/unix/sysv/linux/sigprocmask.c:66
66 ../sysdeps/unix/sysv/linux/sigprocmask.c: No such file or directory.
in ../sysdeps/unix/sysv/linux/sigprocmask.c
(gdb) bt
#0 0x0000005556c9d3bc in __sigprocmask (how=3, set=0x5589f15bf8, oset=0x0) at ../sysdeps/unix/sysv/linux/sigprocmask.c:66
#1 0x000000555783cf10 in put_rs_cache () from /anroot/projects/tos_3party/.target/mips64-nlm-linux/lib/libunwind.so.8
#2 0x000000555783dfb4 in _ULmips_dwarf_find_save_locs () from /anroot/projects/tos_3party/.target/mips64-nlm-linux/lib/libunwind.so.8
#3 0x000000555783ecc8 in _ULmips_dwarf_step () from /anroot/projects/tos_3party/.target/mips64-nlm-linux/lib/libunwind.so.8
#4 0x0000005557837084 in _ULmips_step () from /anroot/projects/tos_3party/.target/mips64-nlm-linux/lib/libunwind.so.8
#5 0x00000055556ae04c in GetStackTraceWithContext(void**, int, int, void const*) () from /opt/thoroughbred/lib/libtcmalloc.so.0
#6 0x00000055557a6ce4 in ?? () from /opt/thoroughbred/lib/libprofiler.so.0
#7 0x00000055557a90e8 in ProfileHandler::SignalHandler(int, siginfo*, void*) () from /opt/thoroughbred/lib/libprofiler.so.0
#8 <signal handler called>
#9 0x000000555711cc44 in __lll_trylock (futex=<optimized out>) at ../ports/sysdeps/unix/sysv/linux/mips/nptl/lowlevellock.h:137
#10 __pthread_mutex_trylock (mutex=0x555ecda230) at pthread_mutex_trylock.c:65
…
#16 0x00000055571198c8 in start_thread (arg=<optimized out>) at pthread_create.c:299
#17 0x0000005556d50bbc in __thread_start () from /opt/thoroughbred/lib/libc.so.6

The Google Perftool README recognizes this problem. Here is what it says.

… while tcmalloc itself works fine, the cpu-profiler tool is unreliable: it will sometimes work, but sometimes cause a segfault. I'll explain the problem first, and then some workarounds.

Note that this only affects the cpu-profiler, which is a google-perftools feature you must turn on manually by setting the CPUPROFILE environment variable. If you do not turn on cpu-profiling, you shouldn't see any crashes due to perftools.

The gory details: The underlying problem is in the backtrace() function, which is a built-in function in libc. Backtracing is fairly straightforward in the normal case, but can run into problems when having to backtrace across a signal frame. Unfortunately, the cpu-profiler uses signals in order to register a profiling event, so every backtrace that the profiler does crosses a signal frame.

In our experience, the only time there is trouble is when the signal fires in the middle of pthread_mutex_lock. pthread_mutex_lock is called quite a bit from system libraries, particularly at program startup and when creating a new thread.

The solution: The dwarf debugging format has support for 'cfi annotations', which make it easy to recognize a signal frame. Some OS distributions, such as Fedora and gentoo 2007.0, already have added cfi annotations to their libc.
A future version of libunwind should recognize these annotations; these systems should not see any crashes.

Why does libunwind choke if a profiling signal fires in the middle of pthread_mutex_lock? I am also not clear on the solution that the google-perftools README offers; can someone please advise me further on that?

Regards,
Sid
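
For readers unfamiliar with the mechanism the README describes, here is a minimal sketch of a signal-driven sampler. It is not the google-perftools implementation and does not use libunwind: it assumes a glibc system and uses glibc's backtrace() from <execinfo.h> so that it has no extra dependencies. The names on_sigprof, sampler_start, sampler_stop, sampler_samples and sampler_last_depth are hypothetical, chosen for illustration. The point is only that every stack walk starts inside a SIGPROF handler, so it must cross the signal frame to reach the interrupted code (a mutex operation in the backtrace above).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

#define MAX_FRAMES 32

/* Written by the signal handler, read by the rest of the program. */
static volatile sig_atomic_t sample_count = 0;
static volatile sig_atomic_t last_depth = 0;
static void *frames[MAX_FRAMES];

/* SIGPROF handler (hypothetical name). The walk begins in this handler and
 * has to cross the kernel-built signal frame to reach the interrupted code. */
static void on_sigprof(int signo)
{
    (void)signo;
    last_depth = backtrace(frames, MAX_FRAMES);
    sample_count++;
}

/* Arm a CPU-time timer that raises SIGPROF every interval_usec microseconds.
 * Returns 0 on success, -1 on failure. */
static int sampler_start(long interval_usec)
{
    struct sigaction sa;
    struct itimerval timer;
    void *warmup[1];

    assert(interval_usec > 0); /* a zero interval would disarm the timer */

    /* glibc may load libgcc (and call malloc) on the first backtrace() call,
     * so make that first call here rather than inside the handler. */
    backtrace(warmup, 1);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    timer.it_interval.tv_sec = interval_usec / 1000000;
    timer.it_interval.tv_usec = interval_usec % 1000000;
    timer.it_value = timer.it_interval;
    return setitimer(ITIMER_PROF, &timer, NULL);
}

/* Disarm the timer; the handler stays installed so a late signal is harmless. */
static int sampler_stop(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_PROF, &off, NULL);
}

static int sampler_samples(void)    { return (int)sample_count; }
static int sampler_last_depth(void) { return (int)last_depth; }
```

A caller would invoke sampler_start(1000), run the CPU-bound workload, then call sampler_stop() and read sampler_samples(). ITIMER_PROF counts CPU time consumed by the process, so a sleeping program produces no samples, and the signal can arrive at any instruction, including inside a lock or trylock routine.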