Re: [Libunwind-devel] Crash in libunwind 0.99 on x86_64


From: Lassi Tuura
Subject: Re: [Libunwind-devel] Crash in libunwind 0.99 on x86_64
Date: Wed, 21 Apr 2010 08:47:13 +0200

Hi Don,

> We find the same with regards to signals and callstack profiling
> in the OpenSpeedShop tool. We typically patch src/x86_64/Gstep.c as the
> systems we currently support typically crashed in access_mem. I understand
> that the libunwind maintainers are concerned about performance
> when validation is always on, so maybe a configuration option to
> force validation is needed to get such a patch applied (i.e.
> src/x86_64/Ginit_local.c setting c->validate = 1).

Right, I know, but for us x86-64 performance is so far from what we
need that I am working on another patch to gain performance in
a different way in any case. On the other hand, without validation
our user experience is dreadful: fewer than one run in 10 of a
largish application escapes death in access_mem. I did try more
selective validation, but it didn't work for us.
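
For context, validation here means checking that an address is mapped
before the unwinder dereferences it, so that bad unwind info yields an
error instead of a SIGSEGV or SIGBUS. Below is a minimal sketch of the
idea, assuming Linux/POSIX msync() behaviour (failure with ENOMEM on an
unmapped range). The names page_is_mapped() and checked_read_word() are
hypothetical and used only for illustration; they are not libunwind
functions, and this is not necessarily how libunwind implements
c->validate.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the page containing addr is
   currently mapped into the process, 0 otherwise. */
static int page_is_mapped(uintptr_t addr)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);

    /* msync() requires a page-aligned start address, so round down. */
    uintptr_t page = addr & ~((uintptr_t) page_size - 1);

    /* On an unmapped range msync() fails (ENOMEM); on a mapped one it
       succeeds. Note that "mapped" is not the same as "readable": a
       PROT_NONE page would still pass this check. */
    return msync((void *) page, (size_t) page_size, MS_ASYNC) == 0;
}

/* Hypothetical checked read: copies one word from addr into *out only
   if both the first and the last byte lie on mapped pages.
   Returns 0 on success, -1 if the address fails validation. */
static int checked_read_word(uintptr_t addr, unsigned long *out)
{
    uintptr_t last = addr + sizeof(*out) - 1;

    if (!page_is_mapped(addr) || !page_is_mapped(last))
        return -1;

    memcpy(out, (const void *) addr, sizeof(*out));
    return 0;
}
```

The cost is one system call per checked page, which is why always-on
validation raises the performance concern discussed above; caching
recently validated pages is one way such a check could be made cheaper.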

I'll post the prototype fast trace patch for discussion soon; see
http://thread.gmane.org/gmane.comp.lib.unwind.devel/480 for the
initial discussion on the subject.

Your fix also turns validation on unconditionally on x86-64; you just
do it in the two code locations that follow unw_init_local/remote
(unw_step() and unw_is_signal_frame()), right?

Lassi

> 
> We turn validation on at the "Try DWARF-based unwinding..." comment in Gstep.c:
> *** libunwind-20100123/src/x86_64/Gstep.c       2010-02-08 11:34:10.000000000 -0500
> --- libunwind-0.99-X/src/x86_64/Gstep.c 2009-05-12 15:28:27.000000000 -0500
> ***************
> *** 39,44 ****
> --- 39,47 ----
>         c, (unsigned long long) c->dwarf.ip);
> 
>    /* Try DWARF-based unwinding... */
> +   /* need to validate here too.  Intel compiler generated code
> +    * crashes with segv and sigbus on large mvapich jobs. */
> +   c->validate = 1;
>    ret = dwarf_step (&c->dwarf);
> 
>    if (ret < 0 && ret != -UNW_ENOINFO)
> *** libunwind-20100123/src/x86_64/Gis_signal_frame.c    2010-02-08 11:34:10.000000000 -0500
> --- libunwind-0.99-X/src/x86_64/Gis_signal_frame.c      2009-05-12 15:27:21.000000000 -0500
> ***************
> *** 38,43 ****
> --- 38,44 ----
>    void *arg;
>    int ret;
> 
> +   c->validate = 1;
>    as = c->dwarf.as;
>    a = unw_get_accessors (as);
>    arg = c->dwarf.as_arg;
> 
> This works for us on the nastiest cases we have seen (very large
> simulation code at LLNL) and we do not see a noticeable performance hit
> in the callstack profiler we use. That particular app would eventually
> access bad memory attempting to unwind through Intel's fast memcpy routines.
> We would also notice memory access crashes at high cpu counts when
> profiling large mpi jobs.  The above patch fixed the crashes (we still see
> a very small number of truncated callstacks that are likely related to
> other issues your patches appear to address).
> We have also successfully profiled a large mpi benchmark (12000 cores)
> on a cray-xt5 using libunwind with the above patch.
> 
> Thanks for your work on this!
> 
> regards,
> Don
> 
>> By far the biggest reason for this is inaccurate unwind information for 
>> function epilogues - the exit paths from the function don't have any unwind 
>> info, causing endless havoc if you happen to sample the stack there. There 
>> have been a number of recent updates to GCC on this, but I am not sure if 
>> they all made it even to 4.5.0 which was released just a few days ago. 
>> Anything before 4.5.0 is certainly prone to have significant issues of this 
>> sort.
>> 
>> GDB will also fail to produce a useful stack trace in comparable 
>> circumstances. The fix needs to come from the compiler.
>> 
>> Similar caveats of course apply to debug info produced by other means. One 
>> version of GLIBC I looked at has incorrect (manually entered) unwind info 
>> for at least one function.
>> 
>> Regards,
>> Lassi
>> 
>> _______________________________________________
>> Libunwind-devel mailing list
>> address@hidden
>> http://lists.nongnu.org/mailman/listinfo/libunwind-devel