Re: [Libunwind-devel] Crash in libunwind 0.99 on x86_64
From: Don Maghrak
Subject: Re: [Libunwind-devel] Crash in libunwind 0.99 on x86_64
Date: Tue, 20 Apr 2010 19:02:02 -0500
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4

On 4/20/2010 4:25 PM, Lassi Tuura wrote:
> Hi,
>
>>> Thanks Arun. I suspect the git version would fix the crash I saw, if
>>> it occurred at the exact same address in the first page again; however,
>>> I wanted to understand the likelihood of getting a crash at some other
>>> place in the future.
>>
>> The likelihood depends on the compiler version used, the presence of
>> handwritten asm, third-party libraries without proper unwind information, etc.
>>
>> There are better solutions possible (e.g. implementing interfaces to query
>> valid stack addresses in the threading library you're using), but they require
>> modifications to other pieces of low-level code (e.g. libc).
>
> We found that when interrupted by signals - such as by a sampling performance
> profiler - it's essential to keep validation always turned on. Otherwise, sooner
> or later the application will crash in libunwind accessing a bad memory address.

Hi Lassi,

We see the same thing with signals and callstack profiling in the
OpenSpeedShop tool. We typically patch src/x86_64/Gstep.c, because on the
systems we currently support the crashes usually occurred in access_mem.
I understand that the libunwind maintainers are concerned about the
performance cost of always-on validation, so a configuration option to
force validation (i.e. setting c->validate = 1 in
src/x86_64/Ginit_local.c) may be needed before such a patch could be applied.

We turn validation on at the "Try DWARF-based unwinding..." comment in
Gstep.c, and also in Gis_signal_frame.c:

*** libunwind-20100123/src/x86_64/Gstep.c 2010-02-08 11:34:10.000000000 -0500
--- libunwind-0.99-X/src/x86_64/Gstep.c 2009-05-12 15:28:27.000000000 -0500
***************
*** 39,44 ****
--- 39,47 ----
           c, (unsigned long long) c->dwarf.ip);
    /* Try DWARF-based unwinding... */
+   /* need to validate here too. Intel compiler generated code
+    * crashes with segv and sigbus on large mvapich jobs. */
+   c->validate = 1;
    ret = dwarf_step (&c->dwarf);
    if (ret < 0 && ret != -UNW_ENOINFO)
*** libunwind-20100123/src/x86_64/Gis_signal_frame.c 2010-02-08 11:34:10.000000000 -0500
--- libunwind-0.99-X/src/x86_64/Gis_signal_frame.c 2009-05-12 15:27:21.000000000 -0500
***************
*** 38,43 ****
--- 38,44 ----
    void *arg;
    int ret;
+   c->validate = 1;
    as = c->dwarf.as;
    a = unw_get_accessors (as);
    arg = c->dwarf.as_arg;

This works for us on the nastiest cases we have seen (a very large
simulation code at LLNL), and we do not see a noticeable performance hit
in the callstack profiler we use. That particular application would eventually
access bad memory while attempting to unwind through Intel's fast memcpy
routines. We also saw memory access crashes at high CPU counts when
profiling large MPI jobs. The above patch fixed the crashes (we still see
a very small number of truncated callstacks, which are likely related to
other issues that your patches appear to address).

We have also successfully profiled a large MPI benchmark (12000 cores)
on a Cray XT5 using libunwind with the above patch.
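
For readers following the thread, the idea behind validation can be sketched roughly as below. This is only an illustration of the general technique (check that a page is mapped before dereferencing an address taken from possibly bogus unwind info); it is not libunwind's code, and the names page_is_mapped and checked_read_word are hypothetical. It assumes a Linux-like system where msync() fails on unmapped ranges.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not a libunwind function): returns 1 if the page
 * containing addr is currently mapped, 0 otherwise. */
static int page_is_mapped(uintptr_t addr)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return 0;

    /* msync() requires a page-aligned start address. */
    uintptr_t page = addr & ~((uintptr_t)page_size - 1);

    /* msync() fails (ENOMEM) when the range is not mapped; any failure
     * is treated as "do not touch this address". */
    return msync((void *)page, (size_t)page_size, MS_ASYNC) == 0;
}

/* Hypothetical helper: copy one word from addr into *out, but only after
 * checking that every page the word touches is mapped.
 * Returns 0 on success, -1 if the address looks invalid.
 * Caveat: a mapped page is not necessarily readable (e.g. PROT_NONE),
 * so this narrows the crash window rather than closing it. */
static int checked_read_word(const void *addr, uintptr_t *out)
{
    assert(out != NULL);

    uintptr_t first = (uintptr_t)addr;
    uintptr_t last = first + sizeof *out - 1;  /* word may straddle pages */

    if (!page_is_mapped(first) || !page_is_mapped(last))
        return -1;

    memcpy(out, addr, sizeof *out);
    return 0;
}
```

The extra system call per page is where the performance concern comes from; a real unwinder would presumably cache the page size and recently validated pages instead of repeating the check on every access.
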

Thanks for your work on this!

regards,
Don

> By far the biggest reason for this is inaccurate unwind information for
> function epilogues - the exit paths from the function don't have any unwind
> info, causing endless havoc if you happen to sample the stack there. There have
> been a number of recent updates to GCC on this, but I am not sure if they all
> made it even to 4.5.0 which was released just a few days ago. Anything before
> 4.5.0 is certainly prone to have significant issues of this sort.
>
> GDB will also fail to produce a useful stack trace in comparable circumstances.
> The fix needs to come from the compiler.
>
> Similar caveats of course apply to debug info produced by other means. One
> version of GLIBC I looked at has incorrect (manually entered) unwind info for
> at least one function.
>
> Regards,
> Lassi

_______________________________________________
Libunwind-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/libunwind-devel