Re: [Libunwind-devel] [Omp-tools] Sampling-based performance measurement


From: Bert Wesarg
Subject: Re: [Libunwind-devel] [Omp-tools] Sampling-based performance measurement of LLVM OpenMP runtime leads to deadlock!
Date: Thu, 17 May 2018 07:56:46 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0

[ Added the Score-P devs, as they also implement a tool based on OMPT.
  Added the libunwind devs, because they know about the general problem of
  TLS variables and signals too. ]

Dear John,

On 05/12/2018 02:19 AM, John Mellor-Crummey wrote:
While using a sampling-based profiler (HPCToolkit) to measure the performance 
of an application using a dynamically-linked version of the LLVM OpenMP 
runtime, I encountered a deadlock on x86_64. Although I haven’t considered 
other architectures in detail, I believe that they may be similarly affected.

Here’s what I believe I have observed: there is a subtle race condition between 
TLS setup for an OpenMP runtime and a profiler that inspects it through the 
OMPT interface.

A thread executing code in __kmp_launch_worker in the context of the LLVM 
OpenMP runtime library acquired the lock controlling access to TLS state 
(__tls_get_addr calls tls_get_addr_tail calls pthread_mutex_lock) to set up 
the TLS needed for access to its thread-local variable __kmp_gtid in frame 24 
of the call stack shown below. Immediately after acquiring the TLS lock by 
setting its __lock field with a CMPXCHG, but before recording the lock owner 
or finishing TLS setup, the thread was interrupted by our profiler.

As a normal part of its operation to record a sample, our profiler uses the 
OMPT tools API to check whether the thread is an OpenMP thread by inspecting 
the thread id maintained by the OpenMP runtime. A call to a runtime entry 
point through the OMPT API led to an access to __kmp_gtid in frame 5 of the 
call stack. Because TLS had still not been set up for the OpenMP runtime 
shared library for this thread, the access to __kmp_gtid went through the 
same protocol as before (__tls_get_addr calls tls_get_addr_tail calls 
pthread_mutex_lock).

However, the lock had already been acquired in frame 21, so it was 
unavailable for acquisition in frames 0-2, causing deadlock. The TLS lock is 
implemented as a recursive lock, but the profiler interrupted the lock 
acquisition in libpthread before the owner field of the recursive lock was 
set, so the inner call to pthread_mutex_lock can't succeed.

*This is a serious problem if a profiler using the OMPT interface can cause a 
deadlock.*

We need a design of the OMPT interface and OpenMP runtime implementations that 
makes this impossible.

After thinking about this for a while, I think that a profiler can arrange to 
receive ompt_callback_thread_begin and then set a thread-local flag in its 
own TLS variables to note that a thread is an OpenMP thread. A profiler must 
not invoke any OMPT runtime entry point on a thread that has not announced 
itself as an OpenMP thread by previously calling ompt_callback_thread_begin. 
An OpenMP runtime should ensure that its TLS is allocated before invoking the 
callback ompt_callback_thread_begin. Similarly, a profiler shouldn’t invoke 
an OMPT runtime entry point on a thread after receiving 
ompt_callback_thread_end.

If a profiler thread doesn’t use the OMPT interface to inspect a thread that 
hasn’t announced itself as an OpenMP thread, it won’t access any TLS state that 
the OpenMP library may maintain.
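
The protocol described above can be sketched in C roughly as follows. This is
only an illustration under assumptions: on_thread_begin, on_thread_end,
sample_thread_id and query_runtime_thread_id are hypothetical names, the last
one is a stand-in for a real OMPT runtime entry point, and the arguments that
the real OMPT callbacks receive are omitted. It also assumes that the
profiler's own flag is safe to read from a signal handler.

```c
#include <signal.h>

/* Profiler-owned flag. Assumption: the profiler's own TLS is already set up
 * for this thread, so reading it in a signal handler does not allocate. */
static _Thread_local volatile sig_atomic_t is_openmp_thread = 0;

/* Hypothetical stand-in for an OMPT runtime entry point. It only counts how
 * often it was reached, so the guard below can be checked. */
static volatile sig_atomic_t runtime_queries = 0;

static int query_runtime_thread_id(void)
{
    runtime_queries++;
    return 42; /* made-up id, for illustration only */
}

/* Hypothetical handlers that the profiler would register for
 * ompt_callback_thread_begin and ompt_callback_thread_end. */
static void on_thread_begin(void) { is_openmp_thread = 1; }
static void on_thread_end(void)   { is_openmp_thread = 0; }

/* Intended to be called from the sampling signal handler. Returns -1 without
 * touching the runtime if the thread has not announced itself. */
static int sample_thread_id(void)
{
    if (!is_openmp_thread)
        return -1;
    return query_runtime_thread_id();
}
```

With this guard, a sample taken before on_thread_begin or after on_thread_end
never reaches the runtime, so it cannot trigger TLS setup inside the OpenMP
library.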

Does anyone care to comment or offer a vision of a different solution?

There is actually a very simple solution for this: declare the TLS variable 
with the "initial-exec" model [1]. This avoids the repeated calls to 
__tls_get_addr, which are expensive anyway and, because __tls_get_addr can 
call malloc, not async-signal-safe either. It is also wise to touch every TLS 
variable before any signal can be triggered. Maybe OMPT can signal when this 
has happened, so that the OMPT user can set up interrupt sources only after 
that was done.
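
A minimal sketch of both suggestions, assuming GCC or Clang (the tls_model
attribute is a compiler extension) and a POSIX system. The names my_gtid,
init_thread_state and current_gtid are hypothetical. my_gtid only stands in
for a runtime variable such as __kmp_gtid, and the signal number is whatever
the profiler uses for sampling.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-thread runtime variable. With the
 * initial-exec model the access is a fixed offset from the thread pointer,
 * so no call to __tls_get_addr is generated for it. */
static __thread int my_gtid __attribute__((tls_model("initial-exec"))) = -1;

/* Touch the TLS variable while the sampling signal is blocked, then restore
 * the previous signal mask. Returns 0 on success, -1 on failure. */
static int init_thread_state(int gtid, int sample_signo)
{
    sigset_t block, saved;

    sigemptyset(&block);
    sigaddset(&block, sample_signo);
    if (pthread_sigmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    my_gtid = gtid; /* first access happens with the signal blocked */

    return pthread_sigmask(SIG_SETMASK, &saved, NULL) == 0 ? 0 : -1;
}

/* Safe to call from a signal handler once init_thread_state has run. */
static int current_gtid(void)
{
    return my_gtid;
}
```

One caveat, as far as I understand [1]: initial-exec variables live in the
static TLS block, so a library that uses them and is loaded with dlopen
depends on spare static TLS space being available.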

Best,
Bert

[1] https://www.akkadia.org/drepper/tls.pdf


Below my signature block are some details of the thread state that I observed, 
in case you want to validate my assessment of the situation.


--
Dipl.-Inf. Bert Wesarg
wiss. Mitarbeiter

Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
01062 Dresden

📞 +49 (351) 463-42451
📠 +49 (351) 463-37773
📧 address@hidden
