libunwind-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Libunwind-devel] 10% lost unwind traces on x86-64?


From: Lassi Tuura
Subject: [Libunwind-devel] 10% lost unwind traces on x86-64?
Date: Mon, 8 Mar 2010 16:24:06 +0100

Hi,

I maintain a profiling utility which on x86-64 uses 6ms SIGPROF itimers and 
libunwind to sample the full stack trace for each tick. I would appreciate any 
help with the following issues, seen on RHEL5-derived systems.

About 10% of the signals can capture stack trace only one or few levels below 
the signal frame.  Some of those involve junk return addresses (e.g. 0xa6, or 
an address on heap or on stack), and some have impossible call stacks (function 
x calling y in impossible ways). So far all "bad" stacks I've investigated 
involve a truncated stack trace, i.e. it didn't finish in _start. I have not 
yet come across these problems when signals are not involved.

Does anyone have any words of wisdom why this occurs? I've been debugging this 
for some time, including decoding dwarf frame data and disassembly, but do not 
have a firm technical explanation yet. I have attached some example stacks, 
with annotations of which functions they are.

My interpretation so far is that __return_rt unwind info is correct, but stack 
trace is unreliable/unstable across the signal frame. I would appreciate input 
from anyone who knows for sure what the problems really are.

The problems I've seen so far:

- ~2.3% represent various math functions (isnan, __ieee754_log10 etc.) 
immediately above signal frame.

- ~2.0% ends in _init or __do_global_ctors_aux, i.e. calling shared library 
global constructors on dlopen().

- Several end at address@hidden immediately above the signal frame, i.e. PLT 
entry interrupted by signal is non-traceable.

- Several have bogus return address (e.g. < 0x4000, on heap or on stack) for 
the second frame above the signal frame. The few I investigated in detail also 
GDB reports incorrectly, but does manage to get the stack straightened 
plausibly in the end (see example below).

- Suspiciously large fraction of failures occur at (function+0), i.e. at 
function entry address.

These do not seem to explain everything, and I am not sure if they are supposed 
to be known flaws. Is it possible the unwind information is fundamentally 
inaccurate for async signals, and insufficient to arbitrarily trace the stack? 
Do some of the above ones represent fundamental known limitations?

FWIW, we don't have these problems on ia32 on the same system. We don't use 
libunwind there, just simple EBP frame following.

Examples below of various problems.

Regards,
Lassi

(a)

Stack trace using libunwind:
UW#1  : 0x2b619a5ffc5f issig=0
UW#2  : 0x2b619a603260 issig=0
UW#3  : 0x3067c0e930 issig=1
UW#4  : 0x2b619d08bbf3 issig=0
UW#5  : 0x0000126c
(SIGSEGV calling unw_is_signal_frame())

GDB back trace for this crash:

#4  <signal handler called>
#5  access_mem (as=0x2b619d118bc0, addr=4716, val=0x7fff6e571990, write=0, 
    arg=0x7fff6e5719c0) at ../../../libunwind/src/x86_64/Ginit.c:167
#6  0x00002b619d1113c4 in _ULx86_64_is_signal_frame (
    cursor=<value optimized out>)
    at ../../../libunwind/src/x86_64/Gis_signal_frame.c:53
#7  0x00002b619a5ffd07 in IgHookTrace::stacktrace (
    addresses=<value optimized out>, nmax=800)
    at /afs/cern.ch/user/l/lat/dev/igprof/igprof/src/IgHookTrace.cc:250
#8  0x00002b619a603260 in profileSignalHandler ()
    at /afs/cern.ch/user/l/lat/dev/igprof/igprof/src/IgProfPerf.cc:46
#9  <signal handler called>
#10 0x00002b619d08bbf3 in std::string::assign (this=<value optimized out>, 
    address@hidden)
    at 
/build/3jm/BUILD/slc5_amd64_gcc434/external/gcc/4.3.4/gcc-4.3.4/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:252
#11 0x000000000000126c in ?? ()
#12 0x0000000017e0a710 in ?? ()
#13 0x0000000017e013b0 in ?? ()
#14 0x0000000017df8050 in ?? ()
#15 0x0000000017e013b0 in ?? ()
#16 0x0000000017f768f0 in ?? ()
#17 0xffffffffffff9000 in ?? ()
#18 0xffffffffffffc800 in ?? ()
#19 0x0000000017e013b0 in ?? ()
#20 0x0000000017df8050 in ?? ()
#21 0x0000000017e0a710 in ?? ()
#22 0x000000000000126c in ?? ()
#23 0x0000000017f6d590 in ?? ()
#24 0x0000000017becd20 in ?? ()
#25 0x00002b619abbb8d3 in 
std::__stable_sort_adaptive<__gnu_cxx::__normal_iterator<edmplugin::PluginInfo*,
 std::vector<edmplugin::PluginInfo, std::allocator<edmplugin::PluginInfo> > >, 
edmplugin::PluginInfo*, long, edmplugin::(anonymous 
namespace)::CompPluginInfos> ()
   from 
/afs/cern.ch/cms/sw/slc5_amd64_gcc434/cms/cmssw/CMSSW_3_5_0/lib/slc5_amd64_gcc434/libFWCorePluginManager.so
#26 0x00002b619abbba18 in edmplugin::CacheParser::read ()
   from 
/afs/cern.ch/cms/sw/slc5_amd64_gcc434/cms/cmssw/CMSSW_3_5_0/lib/slc5_amd64_gcc434/libFWCorePluginManager.so
#27 0x00002b619abc77c7 in edmplugin::PluginManager::PluginManager ()
   from 
/afs/cern.ch/cms/sw/slc5_amd64_gcc434/cms/cmssw/CMSSW_3_5_0/lib/slc5_amd64_gcc434/libFWCorePluginManager.so
#28 0x00002b619abc7f3e in edmplugin::PluginManager::configure ()
   from 
/afs/cern.ch/cms/sw/slc5_amd64_gcc434/cms/cmssw/CMSSW_3_5_0/lib/slc5_amd64_gcc434/libFWCorePluginManager.so
#29 0x000000000040b591 in main ()

(b) Example truncated stack trace

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0 <_ZN11IgHookTrace10stacktraceEPPvi+53>
 UW#2  : 0x2b9997702260 issig=0 <profileSignalHandler+103>
 UW#3  : 0x3067c0e930 issig=1   <__restore_rt>
 UW#4  : 0x2b9999f91e60 issig=0 <type_new+0> Objects/typeobject.c:2045
 UWEND 0

(c) Example stack traces ending up with UNW_REG_IP on stack or heap.

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999fda981 issig=0 <cleanreturn+49> Python/getargs.c:191
 UW#5  : 0x7fff83a982c8 issig=0 [stack: 7fff83a71000-7fff83aae000]
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f671a2 issig=0 <lookdict+2> Objects/dictobject.c:342
 UW#5  : 0x1a2c6230 issig=0     [heap 19557000-1ab6f000]
 UWEND 0

(d) Various other failed stack traces, with functions labelled. Surrounding 
functions of stacks of ~50 levels deep, and in any case end up in _start.

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f1b520 issig=0 <address@hidden>
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f2afd2 issig=0 <PyObject_IsSubclass+2> Objects/abstract.c:2975
 UW#5  : 0x1a234590 issig=0     [heap 19557000-1ab6f000]
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f81fd0 issig=0 <tupledealloc+0> Objects/tupleobject.c:162
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f6be3c issig=0 <PyObject_GenericGetAttr+140> 
Objects/object.c:1441
 UW#5  : 0x7fff83a9ac70 issig=0 [stack: 7fff83a71000-7fff83aae000]
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f6e2b0 issig=0 <PyObject_Free+0> Objects/obmalloc.c:927
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f6a010 issig=0 <meth_dealloc+0> Objects/methodobject.c:132
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f1edf0 issig=0 <address@hidden>
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f6bdb0 issig=0 <PyObject_GenericGetAttr+0> 
Objects/object.c:1314
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f6c250 issig=0 <PyObject_GetAttr+0> Objects/object.c:1173
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f1d130 issig=0 <address@hidden>
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f43750 issig=0 <getset_get+0> Objects/descrobject.c:140
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x3067029ef0 issig=0 <__ctype_b_loc>
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f88280 issig=0 <_PyType_Lookup+288> Objects/typeobject.c:2514
 UW#5  : 0x00000001
Program received signal SIGSEGV, Segmentation fault.
access_mem (as=0x2b999a217bc0, addr=1, val=0x7fff83a9ee40, write=0, 
arg=0x7fff83a9ee70) at ../../../libunwind/src/x86_64/Ginit.c:167
167           *val = *(unw_word_t *) addr;

(e) An example stack where issig=-9999 means I omitted the check because of a 
suspicious address. It looks like apart from the bogus #5 entry the stack 
tracing is plausible and correct.

UWTRACE:
 UW#1  : 0x2b0062ec1c5f issig=0
 UW#2  : 0x2b0062ec5270 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x3067040672 issig=0
 UW#5  : 0x0000002e issig=-9999
 UW#6  : 0x3067044ad6 issig=0
 UW#7  : 0x306706988a issig=0
 UW#8  : 0x2b006592d95f issig=0
 UW#9  : 0x2b0065931373 issig=0
 UW#10 : 0x2b00659316ba issig=0
 UW#11 : 0x2b0065944bce issig=0
 UW#12 : 0x2b00632e7870 issig=0
 UW#13 : 0x2b006328fe96 issig=0
 UW#14 : 0x2b006313a907 issig=0
 UW#15 : 0x2b0063140f04 issig=0
 UW#16 : 0x2b0063548d1b issig=0
 UW#17 : 0x2b0063549138 issig=0
 UW#18 : 0x2b00635518d3 issig=0
 UW#19 : 0x2b00635466a6 issig=0
 UW#20 : 0x2b00656ea4a8 issig=0
 UW#21 : 0x2b006578d4e2 issig=0
 UW#22 : 0x2b006579032a issig=0
 UW#23 : 0x2b006579032a issig=0
 UW#24 : 0x2b006579032a issig=0
 UW#25 : 0x2b006579032a issig=0
 UW#26 : 0x2b006579032a issig=0
 UW#27 : 0x2b0065790d69 issig=0
 UW#28 : 0x2b0065790e82 issig=0
 UW#29 : 0x2b00657aea2c issig=0
 UW#30 : 0x2b00631326c9 issig=0
 UW#31 : 0x2b00631328e3 issig=0
 UW#32 : 0x2b0063132be7 issig=0
 UW#33 : 0x2b0063127cfb issig=0
 UW#34 : 0x0040ccfa issig=0
 UW#35 : 0x306701d994 issig=0
 UW#36 : 0x0040b2e9 issig=0
 UWEND 0





reply via email to

[Prev in Thread] Current Thread [Next in Thread]