Removing contention in PassRegistry accesses to speed up compiles

Hi,

We use LLVM libraries to compile C++ code and noticed slowdowns when multiple threads of a process were compiling at once. perf indicated that most of the CPU time was spent in a spin lock, which was being locked and unlocked from llvm::PassRegistry::getPassInfo().

We read the relevant LLVM code and found that PassRegistry is a ManagedStatic, so a single instance is shared among all threads in a multi-threaded setup. This sharing requires locking in PassRegistry’s methods, which becomes the source of the contention. To get rid of it, we made PassRegistry thread-local and removed the locking. This eliminated the contention, and we measured a 2x speedup in single-threaded compiles and a 7x improvement when ten threads were compiling in parallel.
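To make the two designs concrete, here is a minimal sketch of the difference. It is not the LLVM code or our actual diff: ToyRegistry, SharedRegistry, sharedRegistry() and threadLocalRegistry() are hypothetical names, and a std::mutex stands in for whatever lock the real registry uses.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a pass registry; NOT llvm::PassRegistry.
class ToyRegistry {
public:
  void registerPass(const std::string &name, int id) { passes_[name] = id; }

  // Returns -1 when the pass is unknown.
  int lookup(const std::string &name) const {
    auto it = passes_.find(name);
    return it == passes_.end() ? -1 : it->second;
  }

private:
  std::map<std::string, int> passes_;
};

// Shared design: one instance per process, so every access takes a lock.
class SharedRegistry {
public:
  void registerPass(const std::string &name, int id) {
    std::lock_guard<std::mutex> guard(mutex_);
    registry_.registerPass(name, id);
  }

  int lookup(const std::string &name) {
    std::lock_guard<std::mutex> guard(mutex_); // contention point under load
    return registry_.lookup(name);
  }

private:
  std::mutex mutex_;
  ToyRegistry registry_;
};

SharedRegistry &sharedRegistry() {
  static SharedRegistry instance; // process-wide singleton
  return instance;
}

// Thread-local design: each thread owns its own instance, so no lock is
// needed, but each thread must also populate its own instance.
ToyRegistry &threadLocalRegistry() {
  thread_local ToyRegistry instance;
  return instance;
}

// Usage: on a single thread both designs answer the same query.
void checkBothVariants() {
  sharedRegistry().registerPass("dce", 1);
  threadLocalRegistry().registerPass("dce", 1);
  assert(sharedRegistry().lookup("dce") == 1);
  assert(threadLocalRegistry().lookup("dce") == 1);
}
```

The trade-off in this sketch is that the thread-local variant removes the lock from the lookup path, at the cost of every thread having to register its passes itself.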

Please find attached the diff for this change. We are using an old version of the LLVM code (svn revision 170375), so the diff might be quite outdated.

We have two questions:

  1. Does the change look reasonable? Or are we missing something here?
  2. When we run with 1000 threads compiling concurrently, we deterministically run into a segfault in PassRegistry lookup. Any insights into the segfault?

Please find attached the following files:

  1. pass_registry.txt: Git diff of our change. Note that it is against LLVM svn revision 170375.
  2. contention.txt: Perf report with existing LLVM code - shows contention in llvm::PassRegistry::getPassInfo()
  3. no_contention.txt: Perf report of LLVM built with our change.
  4. segfault.txt: Segfault we are encountering after our change.
  5. clang_compiler.cpp: Snippet of code we use to compile code using LLVM.

Thanks a lot,
Nipun

contention.txt (1011 Bytes)

no_contention.txt (1.58 KB)

pass_registry.txt (5.91 KB)

segfault.txt (1.11 KB)

clang_compiler.cpp (3.47 KB)

We have found the root cause of the segfault: it was due to a caching layer in our own code, which exists to avoid duplicate compilations. llvm::JIT::getPointerToFunction() looks up the PassRegistry, and since our change introduces a separate PassRegistry for each thread, any thread that calls llvm::JIT::getPointerToFunction() must have its own PassRegistry set up. In our setup, some threads got a cache hit on code that had been compiled by another thread. When such a thread then called llvm::JIT::getPointerToFunction(), it segfaulted because its PassRegistry had never been set up.
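The failure mode can be reproduced in miniature. The sketch below is an illustration only, with hypothetical names (ToyPassInfo, ToyRegistry, threadLocalRegistry, lookupSucceedsOnNewThread); it does not use any LLVM API. A thread that never populated its own thread-local registry gets a null result from the lookup, which is the kind of value that would crash if dereferenced.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <thread>

// Hypothetical pass descriptor; NOT llvm::PassInfo.
struct ToyPassInfo {
  std::string name;
};

// Hypothetical per-thread registry; NOT llvm::PassRegistry.
class ToyRegistry {
public:
  void registerPass(const std::string &name) {
    passes_[name] = ToyPassInfo{name};
  }

  // Returns nullptr when the pass was never registered in this instance.
  const ToyPassInfo *getPassInfo(const std::string &name) const {
    auto it = passes_.find(name);
    return it == passes_.end() ? nullptr : &it->second;
  }

private:
  std::map<std::string, ToyPassInfo> passes_;
};

ToyRegistry &threadLocalRegistry() {
  thread_local ToyRegistry instance; // one instance per thread
  return instance;
}

// Performs the lookup on a freshly created thread. When initializeFirst is
// false, the new thread behaves like our cache-hit threads: it never set up
// its own registry before querying it.
bool lookupSucceedsOnNewThread(bool initializeFirst) {
  bool found = false;
  std::thread worker([&found, initializeFirst] {
    if (initializeFirst)
      threadLocalRegistry().registerPass("codegen");
    found = threadLocalRegistry().getPassInfo("codegen") != nullptr;
  });
  worker.join(); // join before reading `found`
  return found;
}

void demo() {
  // The "compiling" thread registers the pass in its own registry.
  threadLocalRegistry().registerPass("codegen");
  assert(threadLocalRegistry().getPassInfo("codegen") != nullptr);

  // Another thread does not see that registration.
  assert(!lookupSucceedsOnNewThread(false));

  // Setting up the registry on that thread first makes the lookup succeed.
  assert(lookupSucceedsOnNewThread(true));
}
```

In this sketch the fix is simply to initialize the registry on every thread that may perform a lookup, including threads that only ever see cache hits.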

Any comments on our change to the PassRegistry?

Thanks,
Nipun

Hi Nipun,

The usual way to get feedback on patches is to upload them on
phabricator ( http://llvm.org/docs/Phabricator.html ) and send them to
llvm-commits. If you do the first part (upload patch to phabricator)
correctly, phabricator will mail your patch to llvm-commits with an
appropriate description automatically.

-- Sanjoy