libunwind-devel

Re: [Libunwind-devel] Crash in libunwind 0.99 on x86_64


From: Don Maghrak
Subject: Re: [Libunwind-devel] Crash in libunwind 0.99 on x86_64
Date: Tue, 20 Apr 2010 19:02:02 -0500
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4

On 4/20/2010 4:25 PM, Lassi Tuura wrote:
Hi,

Thanks Arun. I suspect the git version would fix the crash I saw if it
occurred again at the exact same address in the first page. However,
I wanted to understand how likely a crash at some other place is in the
future.

The likelihood depends on the compiler version used, the presence of handwritten
asm, third-party libraries without proper unwind information, etc.

There are better solutions possible (e.g., implementing interfaces to query
valid stack addresses in the threading library you're using), but they require
modifications to other pieces of low-level code (e.g., libc).
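
As an illustration of that direction: glibc already lets a thread ask for its
own stack bounds through pthread_getattr_np() (a GNU extension), but that call
is not documented as async-signal-safe, so a profiler cannot simply call it
from its sampling handler. A minimal sketch, assuming glibc on Linux (compile
with -pthread), that queries the bounds once outside any handler and later only
compares addresses; cache_stack_bounds() and on_my_stack() are hypothetical
helpers, not part of libunwind or glibc:

```c
#define _GNU_SOURCE /* pthread_getattr_np() is a GNU extension */
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Per-thread cache of [stack_lo, stack_hi); 0/0 means "not queried yet". */
static _Thread_local uintptr_t stack_lo, stack_hi;

/* Call once per thread, outside any signal handler. Returns 0 on success. */
static int cache_stack_bounds(void)
{
  pthread_attr_t attr;
  void *addr;
  size_t size;
  int rc;

  if (pthread_getattr_np(pthread_self(), &attr) != 0)
    return -1;
  rc = pthread_attr_getstack(&attr, &addr, &size);
  pthread_attr_destroy(&attr);
  if (rc != 0)
    return -1;
  stack_lo = (uintptr_t) addr;   /* lowest address of the stack */
  stack_hi = stack_lo + size;    /* one past the highest address */
  return 0;
}

/* Handler-friendly check: plain arithmetic on the cached bounds, no calls.
 * True only if [p, p + len) lies entirely inside this thread's stack. */
static int on_my_stack(uintptr_t p, size_t len)
{
  return p >= stack_lo && p < stack_hi && len <= stack_hi - p;
}
```

This only covers the calling thread's own stack; addresses on the heap or in
other threads' stacks still need a different check.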

We found that when the application is interrupted by signals, such as those
from a sampling performance profiler, it's essential to always turn validation
on. Otherwise, sooner or later the application will crash in libunwind while
accessing a bad memory address.
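
Roughly speaking, "validation" means checking that an address is mapped before
dereferencing it, so a bad pointer becomes an error return instead of a SIGSEGV
inside the signal handler. A minimal sketch of such a probe, assuming Linux
semantics where msync() on an unmapped page fails with ENOMEM;
page_is_mapped() and checked_read() are hypothetical names, not libunwind's
internals:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return 1 if the page containing addr is mapped, 0 otherwise.
 * Assumes Linux: msync() on an unmapped range fails (errno ENOMEM). */
static int page_is_mapped(uintptr_t addr)
{
  long pagesize = sysconf(_SC_PAGESIZE);
  assert(pagesize > 0 && (pagesize & (pagesize - 1)) == 0); /* power of 2 */
  uintptr_t page = addr & ~((uintptr_t) pagesize - 1);      /* round down */
  return msync((void *) page, (size_t) pagesize, MS_ASYNC) == 0;
}

/* Validated read of one machine word: every page the word touches is
 * probed first, so a bad address yields -1 instead of SIGSEGV. */
static int checked_read(uintptr_t addr, uintptr_t *out)
{
  uintptr_t last = addr + sizeof *out - 1;

  if (last < addr)                      /* address range wraps around */
    return -1;
  if (!page_is_mapped(addr) || !page_is_mapped(last))
    return -1;
  memcpy(out, (const void *) addr, sizeof *out);
  return 0;
}
```

Such a check has limits: a page can be mapped but unreadable (PROT_NONE), and
another thread can unmap it between the probe and the read. Each probe is also
a system call, which is where the performance concern mentioned below comes
from.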

Hi Lassi,
We find the same with regard to signals and callstack profiling
in the OpenSpeedShop tool. We typically patch src/x86_64/Gstep.c, as the
systems we currently support tended to crash in access_mem. I understand
that the libunwind maintainers are concerned about the performance cost
of always-on validation, so perhaps a configuration option to
force validation is needed to get such a patch applied (i.e. src/x86_64/Ginit_local.c
setting c->validate = 1).
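
One possible shape for such an option, purely as a sketch: a build-time
default plus a run-time switch that a profiler sets once at startup, before
installing its signal handler. FORCE_VALIDATE, mock_set_force_validate() and
struct mock_cursor are hypothetical; the real cursor and the init code in
Ginit_local.c look different.

```c
#include <signal.h>

/* Hypothetical build-time default, e.g. compile with -DFORCE_VALIDATE=1. */
#ifndef FORCE_VALIDATE
#define FORCE_VALIDATE 0
#endif

/* Process-wide switch; volatile sig_atomic_t so a signal handler may read it. */
static volatile sig_atomic_t force_validate = FORCE_VALIDATE;

/* Hypothetical API: a profiler calls this once at startup,
 * before installing its sampling signal handler. */
static void mock_set_force_validate(int on)
{
  force_validate = on ? 1 : 0;
}

/* Stand-in for the unwind cursor; only the field the patch touches. */
struct mock_cursor
{
  int validate;
};

/* What a Ginit_local.c-style initializer could do with the switch. */
static void mock_init_local(struct mock_cursor *c)
{
  c->validate = force_validate ? 1 : 0;
}
```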

We turn validation on at the "Try DWARF-based unwinding..." step in Gstep.c, and also in Gis_signal_frame.c:
*** libunwind-20100123/src/x86_64/Gstep.c 2010-02-08 11:34:10.000000000 -0500
--- libunwind-0.99-X/src/x86_64/Gstep.c 2009-05-12 15:28:27.000000000 -0500
***************
*** 39,44 ****
--- 39,47 ----
         c, (unsigned long long) c->dwarf.ip);

    /* Try DWARF-based unwinding... */
+   /* need to validate here too.  Intel compiler generated code
+    * crashes with segv and sigbus on large mvapich jobs. */
+   c->validate = 1;
    ret = dwarf_step (&c->dwarf);

    if (ret < 0 && ret != -UNW_ENOINFO)
*** libunwind-20100123/src/x86_64/Gis_signal_frame.c 2010-02-08 11:34:10.000000000 -0500
--- libunwind-0.99-X/src/x86_64/Gis_signal_frame.c 2009-05-12 15:27:21.000000000 -0500
***************
*** 38,43 ****
--- 38,44 ----
    void *arg;
    int ret;

+   c->validate = 1;
    as = c->dwarf.as;
    a = unw_get_accessors (as);
    arg = c->dwarf.as_arg;
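
To show what setting c->validate buys, here is a simplified model (not
libunwind code) of a frame-pointer walk over a hand-built fake stack whose last
saved frame pointer refers to an unmapped page. With validation on, the walk
stops with an error instead of faulting; with it off, the same walk would
dereference the bad pointer and crash. It assumes Linux, where mincore() fails
on unmapped pages; walk_frames() and page_mapped() are hypothetical.

```c
#define _DEFAULT_SOURCE /* mincore() and MAP_ANONYMOUS in glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* 1 if the page containing addr is mapped. Assumes Linux, where mincore()
 * fails with ENOMEM for unmapped pages. */
static int page_mapped(uintptr_t addr)
{
  uintptr_t ps = (uintptr_t) sysconf(_SC_PAGESIZE);
  unsigned char vec;

  return mincore((void *) (addr & ~(ps - 1)), 1, &vec) == 0;
}

/* Walk a frame-pointer chain where fp[0] is the caller's frame pointer and
 * fp[1] the return address (the usual x86-64 layout with frame pointers).
 * Stores up to max return addresses in pcs and the count in *depth.
 * Returns 0 when the chain ends or max is reached, -1 if validation
 * rejected a frame. */
static int walk_frames(uintptr_t fp, int validate, uintptr_t *pcs, int max,
                       int *depth)
{
  const size_t frame_size = 2 * sizeof(uintptr_t);

  *depth = 0;
  while (fp != 0 && *depth < max)
    {
      if (validate
          && (fp % sizeof(uintptr_t) != 0         /* misaligned */
              || fp > UINTPTR_MAX - frame_size    /* would wrap */
              || !page_mapped(fp)
              || !page_mapped(fp + frame_size - 1)))
        return -1;                        /* error instead of SIGSEGV */

      const uintptr_t *frame = (const uintptr_t *) fp;
      pcs[(*depth)++] = frame[1];         /* return address */
      fp = frame[0];                      /* caller's frame pointer */
    }
  return 0;
}
```

For experiments, a guaranteed-unmapped address can be obtained by mmap()ing an
anonymous page and munmap()ing it again.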

This works for us on the nastiest cases we have seen (very large
simulation code at LLNL), and we do not see a noticeable performance hit
in the callstack profiler we use. That particular app would eventually
access bad memory while attempting to unwind through Intel's fast memcpy routines.
We also saw memory access crashes at high CPU counts when
profiling large MPI jobs. The above patch fixed the crashes (we still see
a very small number of truncated callstacks, likely related to
other issues your patches appear to address).
We have also successfully profiled a large MPI benchmark (12000 cores)
on a Cray XT5 using libunwind with the above patch.

Thanks for your work on this!

regards,
Don

By far the biggest reason for this is inaccurate unwind information for
function epilogues: the exit paths from the function don't have any unwind
info, causing endless havoc if you happen to sample the stack there. There have
been a number of recent updates to GCC on this, but I am not sure they all
made it even into 4.5.0, which was released just a few days ago. Anything before
4.5.0 is certainly prone to significant issues of this sort.
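
A simplified model of why a missing epilogue rule hurts. Unwind info is
essentially a table that says, for each range of instruction offsets, how to
compute the CFA (canonical frame address) from the current registers. In the
body of a frame-pointer function the rule is CFA = rbp + 16; after pop %rbp in
the epilogue, rbp already holds the caller's value, so the rule must switch
back to rsp + 8. The table format, the function layout and the register values
below are hypothetical and illustrative, not real DWARF encoding; using only
the first three rows models unwind info that omits the epilogue row.

```c
#include <stddef.h>
#include <stdint.h>

enum reg { RSP, RBP };

/* One row of a simplified unwind table: from instruction offset `start`
 * onward, CFA = (value of register `base`) + `offset`. */
struct cfa_row
{
  unsigned start;
  enum reg base;
  uintptr_t offset;
};

/* Hypothetical x86-64 function with a frame pointer (byte offsets):
 *    0: push %rbp      1: mov %rsp,%rbp      4..19: body
 *   20: pop %rbp      21: ret                                   */
static const struct cfa_row full[] = {
  { 0, RSP, 8 },    /* entry: return address sits at rsp */
  { 1, RSP, 16 },   /* after push %rbp */
  { 4, RBP, 16 },   /* after mov %rsp,%rbp: body rule */
  { 21, RSP, 8 },   /* after pop %rbp: rbp is the caller's again */
};

/* Last row with start <= ip_off; rows are sorted by start. Passing n = 3
 * models unwind info that lacks the epilogue row. */
static const struct cfa_row *lookup(const struct cfa_row *rows, size_t n,
                                    unsigned ip_off)
{
  const struct cfa_row *hit = NULL;

  for (size_t i = 0; i < n && rows[i].start <= ip_off; i++)
    hit = &rows[i];
  return hit;
}

static uintptr_t compute_cfa(const struct cfa_row *row, uintptr_t rsp,
                             uintptr_t rbp)
{
  return (row->base == RSP ? rsp : rbp) + row->offset;
}
```

For example, for a sample taken at the ret (offset 21) with rsp = 0x6ff8 and
the caller's rbp = 0x7100, the full table gives the true CFA 0x7000, while the
truncated table applies the body rule to the caller's rbp and gives 0x7110, so
the unwinder then reads a bogus return address. In the body both tables agree.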

GDB will also fail to produce a useful stack trace in comparable circumstances. 
The fix needs to come from the compiler.

Similar caveats of course apply to debug info produced by other means. One 
version of GLIBC I looked at has incorrect (manually entered) unwind info for 
at least one function.

Regards,
Lassi

_______________________________________________
Libunwind-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/libunwind-devel




