Re: [Libunwind-devel] libunwind x86-64 optimisations?

From: Lassi Tuura
Subject: Re: [Libunwind-devel] libunwind x86-64 optimisations?
Date: Mon, 6 Jul 2009 16:06:55 +0200


> It's a shame that GLIBC calls its internal _dl_debug_state directly,
> and not through _r_debug.r_brk; if it did, you could intercept library
> events and maintain a private copy of the list.  You can still read
> the list on your own via _r_debug.r_map, although it may be in an
> inconsistent state.  All that gets you, though, is the public fields
> of the link map.  I don't remember for sure, but I believe libunwind
> does need the program headers - it would have to cache a copy, and
> validate them using l_addr/l_name/l_ld.

Ah thanks, yes. As our profiler rewrites (some) machine code on the fly, it looks like we could hook into _dl_debug_state() dynamically at run time, capture a copy of the information, and find a way to teach libunwind to use the private data instead.

> 2) Try to determine which frames are "varying", i.e. use VLAs or
> alloca(). [...]

> This information is not available, so you need a different approach.
> Since in practice most x86-64 frames use either rsp or rbp for their
> CFA, you could cache which it was and where, if anywhere, rbp was
> saved.  That should be pretty quick.  If there are any data-dependent
> instructions in the CFI to be concerned about you could also
> maintain a "safe" flag in the unwinder, but I think the only one
> is DW_CFA_val_expression (maybe DW_CFA_GNU_window_save but you won't
> see that on x86-64).

I did run into functions that had a non-trivial CFA location; it wasn't a straight register value but some sort of expression. I'll do more research to see exactly what happens in functions using alloca() or VLAs, and what the DWARF info looks like.
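
To make sure I understand the caching idea, here is a rough sketch of how I read it. Everything in it (frame_rule, rule_cache, fast_step, the cache size and hash) is hypothetical and not existing libunwind code. It assumes the usual x86-64 layout with the return address at CFA - 8, and it reads the stack directly, so it is only meaningful within the same process. Frames whose CFA is an expression would simply never be inserted and would fall back to the full DWARF path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum cfa_base { CFA_NONE = 0, CFA_RSP, CFA_RBP };

/* Hypothetical cached rule for one instruction pointer. */
struct frame_rule {
    uintptr_t     ip;        /* cache key; 0 marks an empty slot */
    enum cfa_base base;      /* which register the CFA is computed from */
    int32_t       cfa_off;   /* CFA = base register + cfa_off */
    int           rbp_saved; /* non-zero if rbp was saved in this frame */
    int32_t       rbp_off;   /* saved rbp lives at CFA + rbp_off */
};

#define RULE_SLOTS 1024      /* power of two, direct-mapped */
static struct frame_rule rule_cache[RULE_SLOTS];

static size_t slot_for(uintptr_t ip)
{
    return (size_t)(ip >> 2) & (RULE_SLOTS - 1);
}

/* Remember a rule derived once via the slow DWARF path. */
static void rule_insert(const struct frame_rule *r)
{
    assert(r->ip != 0 && r->base != CFA_NONE);
    rule_cache[slot_for(r->ip)] = *r;
}

static const struct frame_rule *rule_lookup(uintptr_t ip)
{
    const struct frame_rule *r = &rule_cache[slot_for(ip)];
    return (r->ip == ip && r->base != CFA_NONE) ? r : NULL;
}

struct cursor { uintptr_t ip, sp, bp; };

/* Step one frame using only the cache. Returns 1 on success and
 * 0 if there is no cached rule, in which case the caller has to
 * fall back to interpreting the CFI. */
static int fast_step(struct cursor *c)
{
    const struct frame_rule *r = rule_lookup(c->ip);
    if (r == NULL)
        return 0;

    uintptr_t base = (r->base == CFA_RSP) ? c->sp : c->bp;
    uintptr_t cfa  = base + (uintptr_t)(intptr_t)r->cfa_off;

    if (r->rbp_saved)
        c->bp = *(const uintptr_t *)(cfa + (uintptr_t)(intptr_t)r->rbp_off);
    c->ip = *(const uintptr_t *)(cfa - sizeof(uintptr_t)); /* return address */
    c->sp = cfa;
    return 1;
}
```

The "safe" flag would then amount to not calling rule_insert() for any FDE whose CFI contains DW_CFA_val_expression or a CFA expression.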

Thanks for the ideas! If anyone else has suggestions on further optimisation opportunities, I would love to hear them :-)

(Pointers to alternative high-quality profiling tools are equally welcome. We're not married to ours; it's just that we've never found anything else that a) requires no code instrumentation, b) runs at decent performance, c) is sufficiently featureful, and d) can deal with our software.)

