bug-glibc

sigsegv during exit()


From: Alaric B Snell
Subject: sigsegv during exit()
Date: Mon, 05 Jan 2004 10:56:11 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030704 Debian/1.4-1


Hello!

My application runs to completion, then dies in:

#0  0x40198aab in __register_atfork () from /lib/libc.so.6
#1  0x400e0b4f in __cxa_finalize () from /lib/libc.so.6
#2  0x4002b250 in __do_global_dtors_aux () from /lib/libpthread.so.0
#3  0x40033d35 in _fini () from /lib/libpthread.so.0
#4  0x4000c4e6 in _dl_init () from /lib/ld-linux.so.2
#5  0x400e08f2 in exit () from /lib/libc.so.6
#6  0x400cadae in __libc_start_main () from /lib/libc.so.6

Now, tracing through with gdb is a bit messy since my build of glibc lacks debugging symbols (and I'm not sure how to get one with debugging symbols under Debian...), but I notice:

(gdb) break exit
Breakpoint 1 at 0x8048f68
(gdb) run
Starting program: /guest2/alaric/ARGON/HYDROGEN.interp/hydrogen words.hydrogen test.hydrogen
[New Thread 16384 (LWP 12619)]
Breakpoint 1 at 0x400e0896
[Switching to Thread 16384 (LWP 12619)]

Breakpoint 1, 0x400e0896 in exit () from /lib/libc.so.6
(gdb) n
Single stepping until exit from function exit,
which has no line number information.
0x400cabfd in _nc_tracing () from /lib/libc.so.6
(gdb) n
Single stepping until exit from function _nc_tracing,
which has no line number information.
0x4000c280 in _dl_init () from /lib/ld-linux.so.2
(gdb) n
Single stepping until exit from function _dl_init,
which has no line number information.
0x40011d5b in realloc () from /lib/ld-linux.so.2
(gdb) n
Single stepping until exit from function realloc,
which has no line number information.
0x4000c28e in _dl_init () from /lib/ld-linux.so.2

You will notice that I am messing around with pthreads, which appeared in the above backtrace anyway, and may be significant.

When tracing through, however, I noticed that _dl_init appears to call realloc. But perhaps not, since a disassembly reveals:

0x4000c285 <_dl_init+373>:      push   %ebx
0x4000c286 <_dl_init+374>:      sub    $0x4c,%esp
0x4000c289 <_dl_init+377>:      call   0x40011d5b <realloc+8587>
0x4000c28e <_dl_init+382>:      add    $0xa9f2,%ebx
0x4000c294 <_dl_init+388>:      mov    0xfffff864(%ebx),%edx

...realloc+8587 is a bit of a funny address to jump to.

The reason that intrigued me is that my code has overridden realloc and friends via the __malloc_initialize_hook, but the above call to realloc doesn't invoke my own realloc.

Why am I overriding malloc and friends? Because I'm taking memory management into my own hands: I'm implementing a FORTH-like system, and that benefits from direct access to sbrk(). So malloc et al. use my own heap.

Now, of course, the most obvious conclusion is that my dodgy pointer arithmetic is mangling something and thus laying a landmine that exit() treads on. Indeedy, I have found that the libncurses function 'initscr' is where the landmine appears to be laid: if I exit(1) before that call, all is well; if I exit(1) just after that call, I get the segmentation violation.

Dropping debugging hooks in reveals that initscr calls a host of my memory allocation routines (via the malloc override hook). But I can't find a thing wrong there.

It still fails even when I reduce the allocator to the bare minimum. In that version, malloc is 'advance the allocation pointer by X+4 bytes, write the length allocated into the first 4 bytes, and return the beginning of the rest of the region'. free is a no-op. realloc is 'return if new size is zero, otherwise allocate a block of the new size and (if the original pointer is not null) copy the original data there (using the original block length that malloc stored in the 4 bytes before the pointer)'. I've also checked that nothing is calling memalign.
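
For concreteness, the stripped-down allocator described above might look roughly like the following. This is only a sketch under stated assumptions, not the actual code from the application. The names bump_malloc, bump_realloc and bump_free are hypothetical. A fixed static arena stands in for the sbrk()-managed heap. The sketch also goes beyond the description in two ways: each step is rounded up so that returned pointers stay 8-byte aligned, and realloc copies no more than the new size, so shrinking a block cannot write past its end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the sbrk()-managed heap in the post. */
#define ARENA_SIZE  (64 * 1024)
#define HEADER_SIZE 4   /* length word stored just before each block */
#define ALIGNMENT   8   /* assumption: not part of the original description */

static _Alignas(16) unsigned char arena[ARENA_SIZE];

/* Offset of the next pointer to hand out. It starts at ALIGNMENT so that
 * there is always room for the 4-byte header in front of it. */
static size_t next_offset = ALIGNMENT;

void *bump_malloc(size_t size)
{
    if (size > ARENA_SIZE)
        return NULL;    /* also keeps the rounding below from overflowing */

    /* Bytes to advance: payload plus header, rounded up to ALIGNMENT. */
    size_t step = (size + HEADER_SIZE + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
    if (step > ARENA_SIZE - next_offset)
        return NULL;    /* arena exhausted */

    unsigned char *user = arena + next_offset;
    uint32_t len = (uint32_t)size;
    memcpy(user - HEADER_SIZE, &len, HEADER_SIZE);  /* record the length */
    next_offset += step;
    return user;
}

void bump_free(void *ptr)
{
    (void)ptr;          /* no-op: memory is never reclaimed */
}

void *bump_realloc(void *ptr, size_t new_size)
{
    if (new_size == 0)
        return NULL;    /* "return if new size is zero" */

    void *fresh = bump_malloc(new_size);
    if (fresh == NULL || ptr == NULL)
        return fresh;

    /* Recover the old length from the 4 bytes before the old pointer. */
    uint32_t old_len;
    memcpy(&old_len, (unsigned char *)ptr - HEADER_SIZE, HEADER_SIZE);

    /* Copy the smaller of the two sizes (an assumption added here). */
    memcpy(fresh, ptr, old_len < new_size ? old_len : new_size);
    return fresh;
}
```

With free as a no-op and every block carved from fresh space, nothing in an allocator of this shape can overwrite an earlier block on its own. Any corruption would have to come from a caller writing out of bounds, or from some code path that still reaches the original glibc allocator.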

I suspect it's probably a problem in my code, since ncurses seems to work fine on its own with glibc, but there is a slim chance that the overriding of malloc isn't being handled correctly by glibc; just one direct reference to the 'real' malloc/realloc/whatever instead of to my code would corrupt the heap and send pointers flying everywhere.

So my main question is really "where do I look next?". I'm not familiar with the functions called within libc during shutdown, so I'm mystified as to where to look for the problem! If I can at least find out what memory location has been clobbered, then I might be able to convince gdb to interrupt execution on the first write to that location...

Thanks,

ABS




