Re: [Libunwind-devel] gold linker completely breaks libunwind

libunwind-devel

From:	Peter Wu
Subject:	Re: [Libunwind-devel] gold linker completely breaks libunwind
Date:	Tue, 11 Aug 2015 12:45:43 +0200
User-agent:	Mutt/1.5.23+64 (e44d7de51fa8) (2014-03-12)

On Mon, Aug 10, 2015 at 06:42:18PM -0700, Arun Sharma wrote:
> On Mon, Aug 10, 2015 at 4:34 PM, Peter Wu <address@hidden> wrote:
> 
> > FAILURE: unw_step() returned -1 for ip=7fe2c2cbf6d8
> > FAIL test-async-sig (exit status: 139)
> >
> > Will further investigate it. Note that I also tried an ASAN build which
> > has some other failure modes (even with the patch reverted). LSan hangs
> > somewhere because it tries to report an issue and then calls
> > libunwind...
> 
> Peter: a large part of your patch should've been a no-op. The only
> logic change was in the bounds check. Reverting just that part seems
> like the easiest thing to do while you investigate.
> 
> +      /* CIE must be within the segment. */
> +      if (cie_offset_addr < base)
> +          return -UNW_ENOINFO;
> 
> Could you verify that reverting these lines makes the crashes go away?

Disabling the above lines makes the tests pass, but that reintroduces the
issue[1] that the check was supposed to fix.

Investigating Gtest-resume-sig shows that the gold issue affects
libunwind.so itself: running Gtest-resume-sig against a non-gold-linked
libunwind.so does not show the problem.

Good debug output for Gtest-resume-sig (no gold):

 >_Ux86_64_init_mem_validate: using msync to validate memory
 >_Ux86_64_init_local: (cursor=0x7fffffffd3f0)
 >_Ux86_64_step: (cursor=0x7fffffffd3f0, ip=0x0000000000400d2a, cfa=0x00007fffffffd3d0)
 >_Ux86_64_dwarf_search_unwind_table: e->fde_offset = d8, segbase = 4013d8, debug_frame_base = 0, fde_addr = 4014b0
 >_Ux86_64_step: (cursor=0x7fffffffd3f0, ip=0x00007ffff76145b0, cfa=0x00007fffffffdcc0)
 >_Ux86_64_dwarf_search_unwind_table: e->fde_offset = 7ed0, segbase = 7ffff774ff30, debug_frame_base = 0, fde_addr = 7ffff7757e00
 >_Ux86_64_resume: (cursor=0x7fffffffd3f0)
 >_Ux86_64_dwarf_search_unwind_table: e->fde_offset = 7fa8, segbase = 7ffff774ff30, debug_frame_base = 0, fde_addr = 7ffff7757ed8


Bad debug output for Gtest-resume-sig (gold):

 >_Ux86_64_init_mem_validate: using msync to validate memory
 >_Ux86_64_init_local: (cursor=0x7fffffffd7f0)
 >_Ux86_64_step: (cursor=0x7fffffffd7f0, ip=0x00000000004011d1, cfa=0x00007fffffffd210)
 >_Ux86_64_dwarf_search_unwind_table: e->fde_offset = fffffffffffffeb4, segbase = 4020ec, debug_frame_base = 0, fde_addr = 401fa0
 >_Ux86_64_step: [RBP=0x6] = 0x7fffffffdc30 (cfa = 0x7fffffffd210) -> 0x7fffffffe1d0
 >_Ux86_64_step: Frame Chain [RIP=0x7fffffffdc38] = 0x7ffff64a05b0
 >_Ux86_64_step: (cursor=0x7fffffffd7f0, ip=0x00007ffff64a05b0, cfa=0x00007fffffffd220)
 >_Ux86_64_dwarf_search_unwind_table: e->fde_offset = 7ed0, segbase = 7ffff65dbf30, debug_frame_base = 0, fde_addr = 7ffff65e3e00
 >_Ux86_64_resume: (cursor=0x7fffffffd7f0)
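
To make the difference between the two logs easier to see, here is a minimal sketch of the arithmetic, assuming fde_addr is simply segbase plus fde_offset with fde_offset acting as a signed displacement. That assumption matches every fde_addr printed in the logs above, but the helper names are hypothetical and not part of libunwind:

```c
#include <stdint.h>

/* Hypothetical helper (not libunwind code): compute the FDE address from
   the values printed by dwarf_search_unwind_table.  fde_offset is printed
   as an unsigned 64-bit number, but unsigned wrap-around addition gives
   the same result as adding it as a signed displacement. */
uint64_t fde_addr_from_offset(uint64_t segbase, uint64_t fde_offset)
{
    return segbase + fde_offset;
}

/* Hypothetical helper: returns 1 when the offset is negative when read as
   a signed 64-bit value, i.e. the FDE lies below segbase. */
int fde_precedes_segbase(uint64_t fde_offset)
{
    return fde_offset > (uint64_t) INT64_MAX;
}
```

With the non-gold values (segbase = 4013d8, fde_offset = d8) the FDE lands above segbase at 4014b0. With the gold values (segbase = 4020ec, fde_offset = fffffffffffffeb4) it lands below segbase at 401fa0, which is the situation the new bounds check rejects.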

The segfault happens because of a different bug. From lines 123-128:

      if ((ret = unw_get_reg (&c, UNW_REG_IP, &ip)) < 0)
        panic ("unw_get_reg(IP) failed: ret=%d\n", ret);
      if (verbose)
        printf ("resuming at 0x%lx, with SIGUSR2 pending\n",
            (unsigned long) ip);
      unw_resume (&c);

Somehow c == 0 while ret >= 0. That seems bogus...


The gold-linked libunwind, but with the suspicious branch disabled:

 >_Ux86_64_init_mem_validate: using msync to validate memory
 >_Ux86_64_init_local: (cursor=0x7fffffffd7f0)
 >_Ux86_64_step: (cursor=0x7fffffffd7f0, ip=0x00000000004011d1, cfa=0x00007fffffffd210)
 >_Ux86_64_dwarf_search_unwind_table: e->fde_offset = fffffffffffffeb4, segbase = 4020ec, debug_frame_base = 0, fde_addr = 401fa0

Breakpoint 8, _Ux86_64_dwarf_extract_proc_info_from_fde (as=0x7ffff6e5f4e0 <local_addr_space>, a=0x7ffff6e5f4e0 <local_addr_space>, addrp=0x7fffffffc9d8, pi=0x7fffffffd950, base=4202732, need_unwind_info=1, is_debug_frame=0, arg=0x7fffffffd7f0) at ../../src/dwarf/Gfde.c:256
256           if (cie_offset_addr < base)
(gdb) p base=0
$10 = 0
(gdb) c
Continuing.
 >_Ux86_64_step: (cursor=0x7fffffffd7f0, ip=0x00007ffff64a05b0, cfa=0x00007fffffffdc40)
 >_Ux86_64_dwarf_search_unwind_table: e->fde_offset = 7ed0, segbase = 7ffff65dbf30, debug_frame_base = 0, fde_addr = 7ffff65e3e00
 >_Ux86_64_resume: (cursor=0x7fffffffd7f0)
 >_Ux86_64_dwarf_search_unwind_table: e->fde_offset = 7fa8, segbase = 7ffff65dbf30, debug_frame_base = 0, fde_addr = 7ffff65e3ed8


And here is a trace for the program from the original bug:

 >_ULx86_64_step: (cursor=0x7fffffffd420, ip=0x00007ffff7bb347b, cfa=0x00007fffffffddd0)
 >_ULx86_64_dwarf_search_unwind_table: e->fde_offset = 7b3c, segbase = 7ffff7bcd614, debug_frame_base = 0, fde_addr = 7ffff7bd5150
g_signal_emit
 >_ULx86_64_step: (cursor=0x7fffffffd420, ip=0x00007ffff7bb37af, cfa=0x00007fffffffdfb0)
 >_ULx86_64_dwarf_search_unwind_table: e->fde_offset = 7b6c, segbase = 7ffff7bcd614, debug_frame_base = 0, fde_addr = 7ffff7bd5180
monitor_event
 >_ULx86_64_step: (cursor=0x7fffffffd420, ip=0x00007ffff7fef224, cfa=0x00007fffffffe090)
 >_ULx86_64_dwarf_search_unwind_table: e->fde_offset = fffffffffffff150, segbase = 7ffff7ff3970, debug_frame_base = 0, fde_addr = 7ffff7ff2ac0

Breakpoint 1, _ULx86_64_dwarf_extract_proc_info_from_fde (as=0x7ffff786e980 <local_addr_space>, a=0x7ffff786e980 <local_addr_space>, addrp=0x7fffffffcba8, pi=0x7fffffffd580, base=140737354086768, need_unwind_info=1, is_debug_frame=0, arg=0x7fffffffd420) at ../../src/dwarf/Gfde.c:256
256           if (cie_offset_addr < base)
(gdb) p/x cie_offset_addr
$2 = 0x7ffff7ff2ac4
(gdb) p/x base
$3 = 0x7ffff7ff3970
(gdb) p base=0
$4 = 0
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff765e05c in dwarf_readu32 (as=0x7ffff786e980 <local_addr_space>, a=0x7ffff786e980 <local_addr_space>, addr=0x7fffffffc998, val=0x7fffffffc9cc, arg=0x7fffffffd420) at ../../include/dwarf_i.h:117
117       *val = mvp->u32;
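
For reference, the comparison at Gfde.c:256 can be replayed with the values gdb printed in this session. This is only a sketch: the function name is hypothetical, and it models nothing beyond the single `cie_offset_addr < base` test from the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the bounds check quoted from the patch.
   Returns 1 when the check would fire (the real code then returns
   -UNW_ENOINFO), 0 when processing of the FDE would continue. */
int cie_below_base(uint64_t cie_offset_addr, uint64_t base)
{
    return cie_offset_addr < base;
}
```

With cie_offset_addr = 0x7ffff7ff2ac4 and base = 0x7ffff7ff3970 the check fires. After `p base=0` it can no longer fire, so the corrupted FDE is parsed and the read in dwarf_readu32 crashes. The base argument shown in decimal on the breakpoint line (140737354086768) is the same value as the segbase in the log, 0x7ffff7ff3970.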

Without skipping the check, the problem is still visible, but without a crash:

 >_ULx86_64_step: [RBP=0x7fffffffdfa0] = 0x6371c0 (cfa = 0x7fffffffe090) -> 0x609ca0


Would avoiding the gold linker for libunwind be an acceptable
workaround for now, until the issue is fixed? I need some more time to
investigate this (unless someone else beats me to it). If you need to
use gold, you can try removing that check, but note that doing so can
cause crashes if you encounter a corrupted binary (binaries linked using
gold <2.25 with LTO enabled).
-- 
Kind regards,
Peter Wu
https://lekensteyn.nl

 [1]: http://lists.nongnu.org/archive/html/libunwind-devel/2014-11/msg00009.html

Current Thread

Re: [Libunwind-devel] gold linker completely breaks libunwind, Arun Sharma, 2015/08/08
- Re: [Libunwind-devel] gold linker completely breaks libunwind, Peter Wu, 2015/08/10
  - Re: [Libunwind-devel] gold linker completely breaks libunwind, Arun Sharma, 2015/08/10
    - Re: [Libunwind-devel] gold linker completely breaks libunwind, Peter Wu <=
