libunwind-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Libunwind-devel] 10% lost unwind traces on x86-64?


From: Tianwei
Subject: Re: [Libunwind-devel] 10% lost unwind traces on x86-64?
Date: Tue, 9 Mar 2010 09:54:16 +0800

Hi, Lassi,
    I am also working on a profiling tool using libunwind, I also met the similar issue in signal-handler when I tried to extract the stack trace, my stacktrace is:
Program received signal SIGSEGV, Segmentation fault.

[Switching to Thread 0x7fff18d8b910 (LWP 2022)]
access_mem (as=0x7ffff63961a0, addr=8, val=0x7fff18d89548, write=0, arg=0x7fff18d89d90) at x86_64/Ginit.c:167
167           *val = *(unw_word_t *) addr;
(gdb) bt
#0  access_mem (as=0x7ffff63961a0, addr=8, val=0x7fff18d89548, write=0, arg=0x7fff18d89d90) at x86_64/Ginit.c:167
#1  0x00007ffff618ae00 in dwarf_get (c=<value optimized out>, rs=0x7fff18d89b00) at ../include/tdep-x86_64/libunwind_i.h:137
#2  apply_reg_state (c=<value optimized out>, rs=0x7fff18d89b00) at dwarf/Gparser.c:747
#3  0x00007ffff618cada in _Ux86_64_dwarf_find_save_locs (c=0x7fff18d89d90) at dwarf/Gparser.c:821
#4  0x00007ffff618d8b9 in _Ux86_64_dwarf_step (c=0x7ffff63961a0) at dwarf/Gstep.c:34
#5  0x00007ffff618f6fa in _Ux86_64_step (cursor=0x7ffff63961a0) at x86_64/Gstep.c:42
#6  0x00007ffff7bd80bb in get_backtrace (buffer=0x7fff18d8a6f0, n=10) at racez_pmuprofiler.cc:342
#7  0x00007ffff7bd819b in signal_handler (sig=29, info=0x7fff18d8a8b0, context=0x7fff18d8a780) at racez_pmuprofiler.cc:375
#8  <signal handler called>
#9  0x00007ffff69e91be in __tzfile_compute (timer=0, use_localtime=0, leap_correct=<value optimized out>,
    leap_hit=0x7fff18d8ac0c, tp=0x7fff18d8ac50) at tzfile.c:794
#10 0x00000000004c6d69 in explode_time (xt=Cannot access memory at address 0xffffffffffffffa8
) at time/unix/time.c:92
Backtrace stopped: previous frame inner to this frame (corrupt stack?)


If I turn off the stacktrace, the program runs very well, So I believe it's not the problem of my programs.  
My setting is:
ubuntu 9.10, X8664, 2.6.30 kernel with perfmon patch.

I do not know if my problem is same as yours, but I really do not know how to debug such problem. if you have some progress, I will try your patch ;-).

Tianwei
On Mon, Mar 8, 2010 at 11:24 PM, Lassi Tuura <address@hidden> wrote:
Hi,

I maintain a profiling utility which on x86-64 uses 6ms SIGPROF itimers and libunwind to sample the full stack trace for each tick. I would appreciate any help with the following issues, seen on RHEL5-derived systems.

About 10% of the signals can capture stack trace only one or few levels below the signal frame.  Some of those involve junk return addresses (e.g. 0xa6, or an address on heap or on stack), and some have impossible call stacks (function x calling y in impossible ways). So far all "bad" stacks I've investigated involve a truncated stack trace, i.e. it didn't finish in _start. I have not yet come across these problems when signals are not involved.

Does anyone have any words of wisdom why this occurs? I've been debugging this for some time, including decoding dwarf frame data and disassembly, but do not have a firm technical explanation yet. I have attached some example stacks, with annotations of which functions they are.

My interpretation so far is that __return_rt unwind info is correct, but stack trace is unreliable/unstable across the signal frame. I would appreciate input from anyone who knows for sure what the problems really are.

The problems I've seen so far:

- ~2.3% represent various math functions (isnan, __ieee754_log10 etc.) immediately above signal frame.

- ~2.0% ends in _init or __do_global_ctors_aux, i.e. calling shared library global constructors on dlopen().

- Several end at address@hidden immediately above the signal frame, i.e. PLT entry interrupted by signal is non-traceable.

- Several have bogus return address (e.g. < 0x4000, on heap or on stack) for the second frame above the signal frame. The few I investigated in detail also GDB reports incorrectly, but does manage to get the stack straightened plausibly in the end (see example below).

- Suspiciously large fraction of failures occur at (function+0), i.e. at function entry address.

These do not seem to explain everything, and I am not sure if they are supposed to be known flaws. Is it possible the unwind information is fundamentally inaccurate for async signals, and insufficient to arbitrarily trace the stack? Do some of the above ones represent fundamental known limitations?

FWIW, we don't have these problems on ia32 on the same system. We don't use libunwind there, just simple EBP frame following.

Examples below of various problems.

Regards,
Lassi

(a)

Stack trace using libunwind:
UW#1  : 0x2b619a5ffc5f issig=0
UW#2  : 0x2b619a603260 issig=0
UW#3  : 0x3067c0e930 issig=1
UW#4  : 0x2b619d08bbf3 issig=0
UW#5  : 0x0000126c
(SIGSEGV calling unw_is_signal_frame())

GDB back trace for this crash:

#4  <signal handler called>
#5  access_mem (as=0x2b619d118bc0, addr=4716, val=0x7fff6e571990, write=0,
   arg=0x7fff6e5719c0) at ../../../libunwind/src/x86_64/Ginit.c:167
#6  0x00002b619d1113c4 in _ULx86_64_is_signal_frame (
   cursor=<value optimized out>)
   at ../../../libunwind/src/x86_64/Gis_signal_frame.c:53
#7  0x00002b619a5ffd07 in IgHookTrace::stacktrace (
   addresses=<value optimized out>, nmax=800)
   at /afs/cern.ch/user/l/lat/dev/igprof/igprof/src/IgHookTrace.cc:250
#8
 0x00002b619a603260 in profileSignalHandler ()
   at /afs/cern.ch/user/l/lat/dev/igprof/igprof/src/IgProfPerf.cc:46
#9
 <signal handler called>
#10 0x00002b619d08bbf3 in std::string::assign (this=<value optimized out>,
  address@hidden)
   at /build/3jm/BUILD/slc5_amd64_gcc434/external/gcc/4.3.4/gcc-4.3.4/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:252
#11 0x000000000000126c in ?? ()
#12 0x0000000017e0a710 in ?? ()
#13 0x0000000017e013b0 in ?? ()
#14 0x0000000017df8050 in ?? ()
#15 0x0000000017e013b0 in ?? ()
#16 0x0000000017f768f0 in ?? ()
#17 0xffffffffffff9000 in ?? ()
#18 0xffffffffffffc800 in ?? ()
#19 0x0000000017e013b0 in ?? ()
#20 0x0000000017df8050 in ?? ()
#21 0x0000000017e0a710 in ?? ()
#22 0x000000000000126c in ?? ()
#23 0x0000000017f6d590 in ?? ()
#24 0x0000000017becd20 in ?? ()
#25 0x00002b619abbb8d3 in std::__stable_sort_adaptive<__gnu_cxx::__normal_iterator<edmplugin::PluginInfo*, std::vector<edmplugin::PluginInfo, std::allocator<edmplugin::PluginInfo> > >, edmplugin::PluginInfo*, long, edmplugin::(anonymous namespace)::CompPluginInfos> ()
  from /afs/cern.ch/cms/sw/slc5_amd64_gcc434/cms/cmssw/CMSSW_3_5_0/lib/slc5_amd64_gcc434/libFWCorePluginManager.so
#26
0x00002b619abbba18 in edmplugin::CacheParser::read ()
  from /afs/cern.ch/cms/sw/slc5_amd64_gcc434/cms/cmssw/CMSSW_3_5_0/lib/slc5_amd64_gcc434/libFWCorePluginManager.so
#27
0x00002b619abc77c7 in edmplugin::PluginManager::PluginManager ()
  from /afs/cern.ch/cms/sw/slc5_amd64_gcc434/cms/cmssw/CMSSW_3_5_0/lib/slc5_amd64_gcc434/libFWCorePluginManager.so
#28
0x00002b619abc7f3e in edmplugin::PluginManager::configure ()
  from /afs/cern.ch/cms/sw/slc5_amd64_gcc434/cms/cmssw/CMSSW_3_5_0/lib/slc5_amd64_gcc434/libFWCorePluginManager.so
#29
0x000000000040b591 in main ()

(b) Example truncated stack trace

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0 <_ZN11IgHookTrace10stacktraceEPPvi+53>
 UW#2  : 0x2b9997702260 issig=0 <profileSignalHandler+103>
 UW#3  : 0x3067c0e930 issig=1   <__restore_rt>
 UW#4  : 0x2b9999f91e60 issig=0 <type_new+0> Objects/typeobject.c:2045
 UWEND 0

(c) Example stack traces ending up with UNW_REG_IP on stack or heap.

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999fda981 issig=0 <cleanreturn+49> Python/getargs.c:191
 UW#5  : 0x7fff83a982c8 issig=0 [stack: 7fff83a71000-7fff83aae000]
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f671a2 issig=0 <lookdict+2> Objects/dictobject.c:342
 UW#5  : 0x1a2c6230 issig=0     [heap 19557000-1ab6f000]
 UWEND 0

(d) Various other failed stack traces, with functions labelled. Surrounding functions of stacks of ~50 levels deep, and in any case end up in _start.

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f1b520 issig=0 <address@hidden>
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f2afd2 issig=0 <PyObject_IsSubclass+2> Objects/abstract.c:2975
 UW#5  : 0x1a234590 issig=0     [heap 19557000-1ab6f000]
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f81fd0 issig=0 <tupledealloc+0> Objects/tupleobject.c:162
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f6be3c issig=0 <PyObject_GenericGetAttr+140> Objects/object.c:1441
 UW#5  : 0x7fff83a9ac70 issig=0 [stack: 7fff83a71000-7fff83aae000]
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f6e2b0 issig=0 <PyObject_Free+0> Objects/obmalloc.c:927
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f6a010 issig=0 <meth_dealloc+0> Objects/methodobject.c:132
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f1edf0 issig=0 <address@hidden>
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f6bdb0 issig=0 <PyObject_GenericGetAttr+0> Objects/object.c:1314
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f6c250 issig=0 <PyObject_GetAttr+0> Objects/object.c:1173
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f1d130 issig=0 <address@hidden>
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f43750 issig=0 <getset_get+0> Objects/descrobject.c:140
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x3067029ef0 issig=0 <__ctype_b_loc>
 UWEND 0

UWTRACE:
 UW#1  : 0x2b99976fec5f issig=0
 UW#2  : 0x2b9997702260 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x2b9999f88280 issig=0 <_PyType_Lookup+288> Objects/typeobject.c:2514
 UW#5  : 0x00000001
Program received signal SIGSEGV, Segmentation fault.
access_mem (as=0x2b999a217bc0, addr=1, val=0x7fff83a9ee40, write=0, arg=0x7fff83a9ee70) at ../../../libunwind/src/x86_64/Ginit.c:167
167           *val = *(unw_word_t *) addr;

(e) An example stack where issig=-9999 means I omitted the check because of a suspicious address. It looks like apart from the bogus #5 entry the stack tracing is plausible and correct.

UWTRACE:
 UW#1  : 0x2b0062ec1c5f issig=0
 UW#2  : 0x2b0062ec5270 issig=0
 UW#3  : 0x3067c0e930 issig=1
 UW#4  : 0x3067040672 issig=0
 UW#5  : 0x0000002e issig=-9999
 UW#6  : 0x3067044ad6 issig=0
 UW#7  : 0x306706988a issig=0
 UW#8  : 0x2b006592d95f issig=0
 UW#9  : 0x2b0065931373 issig=0
 UW#10 : 0x2b00659316ba issig=0
 UW#11 : 0x2b0065944bce issig=0
 UW#12 : 0x2b00632e7870 issig=0
 UW#13 : 0x2b006328fe96 issig=0
 UW#14 : 0x2b006313a907 issig=0
 UW#15 : 0x2b0063140f04 issig=0
 UW#16 : 0x2b0063548d1b issig=0
 UW#17 : 0x2b0063549138 issig=0
 UW#18 : 0x2b00635518d3 issig=0
 UW#19 : 0x2b00635466a6 issig=0
 UW#20 : 0x2b00656ea4a8 issig=0
 UW#21 : 0x2b006578d4e2 issig=0
 UW#22 : 0x2b006579032a issig=0
 UW#23 : 0x2b006579032a issig=0
 UW#24 : 0x2b006579032a issig=0
 UW#25 : 0x2b006579032a issig=0
 UW#26 : 0x2b006579032a issig=0
 UW#27 : 0x2b0065790d69 issig=0
 UW#28 : 0x2b0065790e82 issig=0
 UW#29 : 0x2b00657aea2c issig=0
 UW#30 : 0x2b00631326c9 issig=0
 UW#31 : 0x2b00631328e3 issig=0
 UW#32 : 0x2b0063132be7 issig=0
 UW#33 : 0x2b0063127cfb issig=0
 UW#34 : 0x0040ccfa issig=0
 UW#35 : 0x306701d994 issig=0
 UW#36 : 0x0040b2e9 issig=0
 UWEND 0



_______________________________________________
Libunwind-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/libunwind-devel



--
Sheng, Tianwei
Inst. of High Performance Computing
Dept. of Computer Sci. & Tech.
Tsinghua Univ.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]