We use the LLVM libraries to compile C++ code and noticed slowdowns when multiple threads in one process were compiling at the same time. perf showed that most of the CPU time was spent in a spin lock that was being acquired and released from llvm::PassRegistry::getPassInfo().
We read the relevant LLVM code and found that PassRegistry is a ManagedStatic, so a single instance is shared by all threads in a multi-threaded setup. This sharing requires locking in PassRegistry's methods, and that locking is the source of the contention. To eliminate it, we changed PassRegistry to be thread-local and removed the locking. This removed all of the contention: we saw a 2x speedup for single-threaded compiles and a 7x improvement when ten threads were compiling in parallel.
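For readers less familiar with this code, here is a minimal, self-contained C++ sketch of the two designs described above. It does not use LLVM's actual PassRegistry or ManagedStatic: SharedRegistry, ThreadLocalRegistry, PassInfoSketch and the demo functions are hypothetical names, and the shared version uses a std::mutex in place of the spin lock perf reported. It shows why a single process-wide registry serializes every lookup, and one general property of the thread-local variant: an entry registered on one thread is not visible from another, so each thread has to populate its own copy.

```cpp
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for per-pass metadata (not LLVM's PassInfo).
struct PassInfoSketch {
  std::string name;
};

// Shared design: one instance for the whole process; every call takes the lock,
// so concurrent lookups from many threads serialize on it.
class SharedRegistry {
public:
  static SharedRegistry &get() {
    static SharedRegistry instance;  // one object shared by all threads
    return instance;
  }
  void registerPass(const void *id, PassInfoSketch info) {
    std::lock_guard<std::mutex> guard(lock_);
    map_[id] = std::move(info);
  }
  const PassInfoSketch *getPassInfo(const void *id) {
    std::lock_guard<std::mutex> guard(lock_);  // contention point
    auto it = map_.find(id);
    return it == map_.end() ? nullptr : &it->second;
  }

private:
  std::mutex lock_;
  std::unordered_map<const void *, PassInfoSketch> map_;
};

// Thread-local design: each thread owns a private instance, so no lock is
// needed, but each thread only sees what was registered on that thread.
class ThreadLocalRegistry {
public:
  static ThreadLocalRegistry &get() {
    thread_local ThreadLocalRegistry instance;  // one object per thread
    return instance;
  }
  void registerPass(const void *id, PassInfoSketch info) {
    map_[id] = std::move(info);
  }
  const PassInfoSketch *getPassInfo(const void *id) const {
    auto it = map_.find(id);
    return it == map_.end() ? nullptr : &it->second;
  }

private:
  std::unordered_map<const void *, PassInfoSketch> map_;
};

// Registers on the calling thread, then looks the entry up from a second thread.
// Returns {found on registering thread, found on the other thread}.
std::pair<bool, bool> threadLocalLookupAcrossThreads() {
  static const char kPassID = 0;  // address used as a unique ID
  ThreadLocalRegistry::get().registerPass(&kPassID, {"demo-pass"});
  bool sameThread = ThreadLocalRegistry::get().getPassInfo(&kPassID) != nullptr;
  bool otherThread = true;
  std::thread worker([&] {
    otherThread = ThreadLocalRegistry::get().getPassInfo(&kPassID) != nullptr;
  });
  worker.join();
  return {sameThread, otherThread};
}

// Same experiment against the shared registry.
bool sharedLookupFromOtherThread() {
  static const char kPassID = 0;
  SharedRegistry::get().registerPass(&kPassID, {"demo-pass"});
  bool found = false;
  std::thread worker([&] {
    found = SharedRegistry::get().getPassInfo(&kPassID) != nullptr;
  });
  worker.join();
  return found;
}
```

In this sketch, threadLocalLookupAcrossThreads() returns {true, false} and sharedLookupFromOtherThread() returns true. The thread-local variant trades the lock for one registry copy per thread, which also means per-thread memory grows with the number of threads.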
Please find attached the diff for this change. We are using an old version of the LLVM code (svn revision 170375), so the code might be quite outdated.
We have two questions:
- Does the change look reasonable? Or are we missing something here?
- When we run with 1000 threads compiling concurrently, we deterministically hit a segfault in a PassRegistry lookup. Do you have any insight into what causes it?
Please find attached the following files:
- pass_registry.txt: Git diff of our change. Note that it is against LLVM svn revision 170375.
- contention.txt: perf report with the unmodified LLVM code, showing contention in llvm::PassRegistry::getPassInfo().
- no_contention.txt: Perf report of LLVM built with our change.
- segfault.txt: Segfault we are encountering after our change.
- clang_compiler.cpp: Snippet of the code we use to compile code with LLVM.
Thanks a lot,