libunwind-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Libunwind-devel] Re: [patch 2/2] Allow caller to block signals.


From: Paul Pluzhnikov
Subject: Re: [Libunwind-devel] Re: [patch 2/2] Allow caller to block signals.
Date: Fri, 25 Sep 2009 17:28:16 -0700

On Fri, Sep 25, 2009 at 2:04 PM, Arun Sharma <address@hidden> wrote:

>>  sigprocmask (SIG_SETMASK, &unwi_full_mask, &saved_mask);
>>  ret = dl_iterate_phdr (check_callback, as);
>>  sigprocmask (SIG_SETMASK, &saved_mask, NULL);
>>
>> so (AFAICT) they wouldn't unblock anything that was blocked before.
>
> This is a well known deadlock that makes libunwind unusable for many
> users (who are not brave enough to implement their own dl_iterate_phdr
> that is async signal safe). The sigprocmask there isn't really fixing
> the problem.
>
> The case where libunwind gets a deadlock today is:
>
> * We're executing dynamic linker code, holding a lock
> * We get an async signal
> * The signal handler calls libunwind
> * libunwind blocks signals, but its too late
> * dl_iterate_phdr tries to grab a lock that this thread is already holding.
>
> This leads to a deadlock.

This is one of several instances of this deadlock.
The more common (for us) instance has noting to do with signals at all:

  T1                              T2
   calls dlopen                   calls malloc; takes malloc lock
    takes loader lock              calls unwind to record allocation trace
      calls calloc                  calls dl_iterate_phdr
       waits for malloc lock         tries to get loader lock

> Which brings up the question of why not drop this sigprocmask
> completely since it's not helping?

I believe the signal *is* necessary when (default) --enable-block-signals=yes,
for the same reason it is necessary elsewhere -- without it you can get
interrupted in pthread_mutex_lock while the lock is in inconsistent state,
and then crash trying to re-acquire it.

> How about a different approach to the problem? Abort the unwind in the
> unlikely event that we were holding the dl lock? Not sure if glibc
> exposes dl_load_lock.

You wish :-(

The loader lock is not exposed: here are relevant parts of glibc-2.3.6:

/// sysdeps/generic/lsodefs.h

struct rtld_global
{
#endif
  /* Don't change the order of the following elements.  'dl_loaded'
     must remain the first element.  Forever.  */

/* Non-shared code has no support for multiple namespaces.  */
#ifdef SHARED
# define DL_NNS 16
#else
# define DL_NNS 1
#endif
  EXTERN struct link_namespaces
  {
    /* And a pointer to the map for the main map.  */
    struct link_map *_ns_loaded;
    /* Number of object in the _dl_loaded list.  */
    unsigned int _ns_nloaded;
    /* Array representing global scope.  */
    struct r_scope_elem *_ns_global_scope[2];
    /* Direct pointer to the searchlist of the main object.  */
    struct r_scope_elem *_ns_main_searchlist;
    /* This is zero at program start to signal that the global scope map is
       allocated by rtld.  Later it keeps the size of the map.  It might be
       reset if in _dl_close if the last global object is removed.  */
    size_t _ns_global_scope_alloc;
  } _dl_ns[DL_NNS];

  /* During the program run we must not modify the global data of
     loaded shared object simultanously in two threads.  Therefore we
     protect `_dl_open' and `_dl_close' in dl-close.c.

     This must be a recursive lock since the initializer function of
     the loaded object might as well require a call to this function.
     At this time it is not anymore a problem to modify the tables.  */
  __rtld_lock_define_recursive (EXTERN, _dl_load_lock)

  /* Incremented whenever something may have been added to dl_loaded.  */
  EXTERN unsigned long long _dl_load_adds;
  ...

/// elf/rtld.c:

/* This is the structure which defines all variables global to ld.so
   (except those which cannot be added for some reason).  */
struct rtld_global _rtld_global =
  {
    /* Default presumption without further information is executable stack.  */
    ._dl_stack_flags = PF_R|PF_W|PF_X,
#ifdef _LIBC_REENTRANT
    ._dl_load_lock = _RTLD_LOCK_RECURSIVE_INITIALIZER,
#endif
    ._dl_position_hash_cutoff = DL_POSITION_HASH_CUTOFF_DEFAULT,
    ._dl_position_hash_bits = DL_POSITION_HASH_BITS_DEFAULT
  };
/* If we would use strong_alias here the compiler would see a
   non-hidden definition.  This would undo the effect of the previous
   declaration.  So spell out was strong_alias does plus add the
   visibility attribute.  */
extern struct rtld_global _rtld_local
    __attribute__ ((alias ("_rtld_global"), visibility ("hidden")));


So, the only way to get that lock is to lookup (exported) _rtld_global in
ld-linux, and then reach into it and grab the lock. Static libc.a has
different offset. Different compilations of glibc could have different
offset to load_lock.

Overall, I believe this is just not something we can reliably do.

On Fri, Sep 25, 2009 at 2:32 PM, Arun Sharma <address@hidden> wrote:

> Paul, I applied 4 of your patches, except for
> libunwind-sigprocmask-20090925-2.txt.

I think the last one should go in as well, as I explained above:
it is needed when --enable-block-signals=yes, and not needed when
--enable-block-signals=no, so exactly the same as SIGACTION macro.

Thanks,
-- 
Paul Pluzhnikov




reply via email to

[Prev in Thread] Current Thread [Next in Thread]