libunwind-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Libunwind-devel] unw_init_local() with a signal frame context.


From: Aliaksey Kandratsenka
Subject: Re: [Libunwind-devel] unw_init_local() with a signal frame context.
Date: Sat, 20 Dec 2014 16:42:25 -0800



On Sat, Dec 20, 2014 at 1:57 PM, Aliaksey Kandratsenka <address@hidden> wrote:


On Sat, Dec 20, 2014 at 8:24 AM, Aliaksey Kandratsenka <address@hidden> wrote:


On Sat, Dec 20, 2014 at 6:18 AM, Tim Deegan <address@hidden> wrote:
Hi all,

While looking into a report about problems with libunwind unwinding
from __lll_unlock_wake(), I noticed that I was getting some traces
that didn't unwind at all, from one particular address in
__lll_unlock_wake().

On closer inspection, the one IP that fails is the first instruction
in the function, and the failure is that c->use_pre_instr is set,
causing the unwinder to look up the byte just before the function.

I'm getting these traces from libprofiler/gperftools, which gets hold
of the interrupted context in its signal handler (as a ucontext_t *)
and passes it to unw_init_local() as its second argument.
unw_init_local() unconditionally sets use_pre_instr, which is correct
if the context came from unw_getcontext(), but not in this case.

Disabling this path in libprofiler (and switching to a slower one that
calls unw_getcontext() and then discards the first few frames) fixes
the problem.

Arguably this bug is in libprofiler, which ought not to be just
passing a ucontext_t * to unw_init_local (though I see that the man
page suggests this is OK for IA-64 at least).  But even so it would be
good if libprofiler could use the signal context it already has,
without setting use_prev_instr.

Hi. Thanks for reporting it. I was the guy who implemented that change in libprofiler to use signal's ucontext_t. We had some bug reports suggesting that occasionally libunwind crashes when unwinding across signal handler boundary. And my thought was that passing ucontext_t that is passed to signal handler would work around that. I.e. by pointing libunwind exactly at frame right before signal handler frame.

Most likely original bug was only relevant for older systems and I've not had any evidence that my change helps. So I'll simply revert that change in profiler.


But looks like that "use_prev_instr" thing is causing another interesting situation. E.g.: http://i.imgur.com/3rE7cpj.png (this is from pprof --web --addresses after I've effectively reverted "use signal's ucontext_t" patch)

This is caused by profiler always putting instruction pointer at top of stack trace (if it's different from what stacktrace capturing function returns). And smaller by one addresses actually point in the middle of instructions e.g:

  400631: 44 38 c2             cmp    %r8b,%dl
  400634: 7d f2                 jge    400628 <q6+0x68>
  400636: 48 89 c7             mov    %rax,%rdi
  400639: e8 32 ff ff ff       callq  400570 <swap>
  40063e: 4c 89 e6             mov    %r12,%rsi
  400641: 48 89 df             mov    %rbx,%rdi
  400644: e8 37 ff ff ff       callq  400580 <reverse>
  400649: 5b                   pop    %rbx

You can see for example that address 644 is "right" one and that is what top of stack contains (as noted above it's taken rip field from signal's ucontext) and first libunwind address is 643 which is off by one.

Even more interesting is that _all addresses_ appear to be off by one when returned from libunwind. And I'm seeing same behavior from glibc's backtrace() call (which is in turn using libgcc's _Unwind_backtrace facility).

I'll have to think a bit more how to deal with that "off by one" issue. Perhaps I'll simply extend existing "duplicate entry" logic to detect off by one case. Or try to detect that unwinder's behavior and adjust accordingly.

So perhaps there's indeed something to fix at libunwind end. Unfortunately I'm not at all familiar with stack unwinding and why use_prev_instr might be good idea. Can anybody share what might be reason for such behavior ?


Thanks for raising this again. I've found a number of issues in profiler code. "-1" behavior in non-top frames is caused by pprof. Given that it's all call instructions it should be ok. Perhaps they use it to highlight right instructions on assembly dump.

I'll be pushing a number of improvements as a result of that investigation soon. One of them is revert of patch that was originally mentioned in this thread.

Thanks a lot.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]