[Libunwind-devel] UNW_CACHE_PER

Hello Everyone,

First of all, I would like to start off saying that we have been using libunwind in our production code for a while now and it is a tremendous improvement over ::backtrace that we historically used. Libunwind fixes segv and truncated stacktrace issues that came out of ::backtrace and is also significantly faster. So a huge thanks to you all for the great work!

Please help us resolve the following issue between libunwind and our own memory leaks tool. Our tool intercepts memory allocations and hash allocations’ artifacts using stacktrace as a key. Intercepting dlfcns specific allocations (coming from dlopen, dlsym, etc.) we hit a deadlock issue. Here is an example:

THREAD 1
dlopen (takes dl lock)
  malloc
    observe_malloc      - our malloc hook
      unwind_stack      - Linux libunwind implementation
        unw_step (takes libunwind CACHE lock)

THREAD 2
malloc
  observe_malloc      - our malloc hook
    unwind_stack      - Linux libunwind implementation
      unw_step (takes libunwind CACHE lock)
        dl_interate_hdr (takes dl lock)

The same deadlock may happen while THREAD 1 is in dlsym and other dl functions that all internally use the same recursive lock dl lock and allocate memory.

To get around this, we modified Linux libunwind mode UNW_CACHE_PER_THREAD to be thread-safe along the lines of IA64 implementation - made CACHE thread-local.

Please tell us if there is something wrong with this change and guide us in the right direction if so. Here is the diff:

---
libunwind/src/dwarf/Gparser.c

==== //3rdparty/tmw/libunwind/src/dwarf/Gparser.c#1 (text) - //3rdparty/tmw/libunwind/src/dwarf/Gparser.c#2 (text) ==== content
@@ -27,6 +27,12 @@
#include "dwarf_i.h"
#include "libunwind_i.h"

+// Thread-local DWARF cache (along the lines of IA64 implementation).
+// With the change to get_rs_cache() below, libunwind becomes
+// thread-safe when the caching policy is UNW_CACHE_PER_THREAD.
+// See, http://comments.gmane.org/gmane.comp.lib.unwind.devel/539
+static __thread struct dwarf_rs_cache dwarf_per_thread_cache;
+
#define alloc_reg_state() (mempool_alloc (&dwarf_reg_state_pool))
#define free_reg_state(rs) (mempool_free (&dwarf_reg_state_pool, rs))

@@ -524,7 +530,11 @@
   if (caching == UNW_CACHE_NONE)
     return NULL;

-  if (likely (caching == UNW_CACHE_GLOBAL))
+  if (likely (caching == UNW_CACHE_PER_THREAD))
+    {
+      cache = &dwarf_per_thread_cache;
+    }
+  else
     {
       Debug (16, "%s: acquiring lock\n", __FUNCTION__);
       lock_acquire (&cache->lock, *saved_maskp);

libunwind/tests/Gtest-concurrent.c

==== //3rdparty/tmw/libunwind/tests/Gtest-concurrent.c#1 (text) - //3rdparty/tmw/libunwind/tests/Gtest-concurrent.c#2 (text) ==== content
@@ -86,7 +86,16 @@
   int i;

   pthread_attr_init (&attr);
-  pthread_attr_setstacksize (&attr, PTHREAD_STACK_MIN + 64*1024);
+
+  // After the change to src/dwarf/Gparser.c the caching policy UNW_CACHE_PER_THREAD
+  // is thread-safe (along the lines of IA64 implementation). See,
+  // http://comments.gmane.org/gmane.comp.lib.unwind.devel/539
+  //
+  // We had to comment out the setstacksize line, otherwise the test fails
+  // after DWARF cache became thread-local. Thread-local cache adds ~64K to
+  // thread's stack size.
+  //
+  // pthread_attr_setstacksize (&attr, PTHREAD_STACK_MIN + 64*1024);

   for (i = 0; i < NTHREADS; ++i)
    if (pthread_create (th + i, &attr, worker, NULL))

The obvious drawback is that thread-local CACHE increases thread's stack size to ~64K. So, a thread with stack size restriction =< 64K will not start. While this may be an issue for generic libunwind, for us it is not.

Assuming that we are not completely off, we are inclined towards making PER_THREAD cache thread-safe as the name suggests.

Again, we understand that this would mean adding thread-local cost which may not be acceptable for general purposes. However, since this option is broken anyway, we can make it a compile time option (say, #ifdef-ed with PER___THREAD) with the default being UNW_CACHE_GLOBAL (as it currently is). This way, the clients that do not need PER___THREAD would incur no additional costs, and will not have the option UNW_CACHE_PER_THREAD. The clients who request PER___THREAD would get the working UNW_CACHE_PER_THREAD feature at the cost of extra thread-local storage.

If you find these enhancements reasonable, is there a possibility of these changes making into official release? If you do not agree with this reasoning, can you please share your opinion and guide us with the best way to move forward?

Also, we want to make sure that we pay stack size penalty _only_ when the memory leak tool is running and are inclined to build libunwind two ways:

- as a shared library, without PER___THREAD; link it to the product

- as a static library with PER___THREAD; link it to our leak tool.

Could you please advise us on the best way to build libunwind as a static library?

Thanks and Regards,

Prabhat

From:	Prabhat Verma
Subject:	[Libunwind-devel] UNW_CACHE_PER_THREAD
Date:	Tue, 8 Jan 2013 22:33:20 +0000

[Libunwind-devel] UNW_CACHE_PER_THREAD