[LLVM] Improving compile times

Description of the project: While the primary job of a compiler is to produce fast code (good run-time performance), it is also important that optimization doesn’t take too much time (good compile-time performance). The goal of this project is to improve compile-time without hurting optimization quality.

The general approach to this project is:

  1. Pick a workload to optimize. For example, this could be a file from CTMark compiled in a certain build configuration (e.g. -O0 -g or -O3 -flto=thin).
  2. Collect profiling information. This could involve compiler options like -ftime-report or -ftime-trace for a high-level overview, as well as perf record or valgrind --tool=callgrind for a detailed profile (a quick example follows this list).
  3. Identify places that are unexpectedly slow. This is heavily workload dependent.
  4. Try to optimize an identified hotspot, ideally without impacting generated code. The compile-time tracker can be used to quickly evaluate impact on CTMark.
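
As a quick illustration of steps 2 and 3, here is what a first look at a single file might look like. The file name sqlite3.c is only a placeholder, and the bare clang invocation assumes a locally built compiler on PATH; the full setup is described later in this thread.

# High-level overview of where compile time goes, printed to stderr:
clang -O3 -c sqlite3.c -o /dev/null -ftime-report

# Detailed call-graph profile with perf (needs a clang built with frame pointers):
perf record -g -- clang -O3 -c sqlite3.c -o /dev/null
perf report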

As a disclaimer, it should be noted that outside of pathological cases, compilation doesn’t tend to have a convenient hotspot where 90% of the time is spent; instead, the time is spread out across many passes. As such, individual improvements also tend to have only a small impact on overall compile-time. Expect to do 10 improvements of 0.2% each, rather than one improvement of 2%.

Expected results: Substantial improvements on some individual files (multiple percent), and a small improvement on overall geomean compile-time.

Desirable skills: Intermediate C++. Familiarity with profiling tools (especially if you are not on Linux, in which case I won’t be able to help).

Project size: Either medium or large.

Difficulty: Medium

Confirmed Mentor: @nikic

Hello,

I am quite interested in this project. I am very new to LLVM development (I have mainly read the Kaleidoscope tutorial and lurked on Discourse so far), so I think this is a great opportunity for me to dive deep into parts of LLVM and understand how things are laid out.

Do you think it will be a good idea for me to try and work on this project? I have enough experience with C++ to be able to read and modify the LLVM codebase. I have also profiled a fair amount of code with perf and understand how to read its output.

Thanks,
Dhruv

I think going into this without being particularly familiar with LLVM would be fine. It’s a good way to get into contact with different areas of the compiler (there are plenty of compile-time issues in all of the frontend, middle end and back end).

Even for people familiar with the project, it’s a given that profiling will hit areas that one hasn’t even heard of before (LLVM is very large), and one has to be ready to jump into unfamiliar code and at least gain a cursory understanding of what it’s trying to do. Of course, some general familiarity with LLVM and/or compiler construction does make it easier to do that.

Hello,

I’m a second-year undergraduate pursuing computer engineering. Programming in C/C++ and compiler construction are among my interests, and I’ve worked on projects building transpilers and interpreters. I would like to contribute to this project. Could you please point me to some resources to get started with?

Here are some notes on the necessary setup for profiling LLVM.

First, we will want to use a Release build, and it is sufficient to build only clang and the X86 backend (or AArch64, if that’s your host architecture). Additionally, we should specify -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer in both the C and C++ flags to allow easy call-graph profiling with perf:

cmake -G Ninja -B build/ -S llvm/ \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_PROJECTS="clang" \
    -DLLVM_TARGETS_TO_BUILD="X86" \
    -DCMAKE_C_FLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer" \
    -DCMAKE_CXX_FLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer" \
    -DLLVM_CCACHE_BUILD=true \
    -DLLVM_USE_LINKER=lld
# Last two lines are just to reduce build time, otherwise optional.

ninja -C build

To use CTMark, we should download the llvm-test-suite repository (https://github.com/llvm/llvm-test-suite) and then configure it as follows (for the -O3 configuration):

cmake -G Ninja -B build-O3 -S . \
    -DCMAKE_C_COMPILER=$PATH_TO/llvm-project/build/bin/clang \
    -DTEST_SUITE_SUBDIRS=CTMark \
    -DTEST_SUITE_RUN_BENCHMARKS=false \
    -C cmake/caches/O3.cmake

ninja -C build-O3 -v > out

This will build all files in CTMark. It’s possible to collect compile-time stats while doing so, but it’s not really possible to get high-accuracy compile-time statistics with vanilla CTMark. What we’re actually interested in here is manually profiling individual files. The ninja -v invocation above wrote all the build command lines into the out file, and we can now pick one of them (e.g. for sqlite3.c) and run it through a profiler:

cd build-O3
perf record -g /home/npopov/repos/llvm-project/build/bin/clang -DNDEBUG  -O3   -w -Werror=date-time -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DSQLITE_OMIT_LOAD_EXTENSION=1 -DSQLITE_THREADSAFE=0 -I. -MD -MT MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.o -MF MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.o.d -o MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.o -c /home/npopov/repos/llvm-test-suite/MultiSource/Applications/sqlite3/sqlite3.c
perf report

We can replace perf record -g with valgrind --tool=callgrind to profile under the Valgrind emulator instead. This is much slower (and does not require frame pointers), but gives a more detailed profile.
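
For example, a minimal sketch of such an invocation (substitute the full clang command line taken from the out file; the output file name is arbitrary):

valgrind --tool=callgrind --callgrind-out-file=sqlite3.callgrind \
    $PATH_TO/llvm-project/build/bin/clang -O3 -c sqlite3.c -o /dev/null
callgrind_annotate sqlite3.callgrind | less   # flat, instruction-count-based profile
# kcachegrind sqlite3.callgrind               # optional GUI viewer, if installed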

Instead of using a profiler, we can also append -ftime-report to the command line, which will print an overview of pass execution timings to stderr.
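
For instance, appended to a standalone recompile of the same file (stderr is redirected only to keep the report in a file):

$PATH_TO/llvm-project/build/bin/clang -O3 -c sqlite3.c -o /dev/null -ftime-report 2> time-report.txt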

Finally, we can use -ftime-trace. This will produce a .json file next to the object file, so it will be in some location like MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.json. This file can then be viewed by using about:tracing in Google Chrome for example.
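
As a sketch, a standalone invocation would look something like this (the granularity flag is optional; it sets the minimum event duration, in microseconds, that gets recorded):

$PATH_TO/llvm-project/build/bin/clang -O3 -c sqlite3.c -o sqlite3.o -ftime-trace -ftime-trace-granularity=500
# The trace is written next to the object file, here as sqlite3.json.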

Just worth mentioning that Intel VTune is also available on Linux and is an excellent resource for finding bottlenecks; see “Get Started with Intel® VTune™ Profiler for Linux* OS”.

Hi, I’m a junior student with some HPC work experience, and I’ve written some LLVM middle-end optimization passes.
The long compile times of C/C++ (especially C++ templates) are certainly a problem, but for most files they amount to a few seconds at most, except for heavily templated code such as Eigen. I think accurate profiling will be a great challenge.

Hi @nikic, I’m interested in working on this project over the summer. I was curious if I could send you a draft of a proposal by the end of next week, and I would appreciate being able to ask a few specific questions before then (perhaps over another channel, such as the LLVM Slack?). If you have any general resources that could be helpful in drafting my proposal or getting a better understanding of what this project entails, that would be great as well.

Hello mentors, @nikic , I’m Madhurjya, a computer science undergraduate passionate about innovative technology. I have extensive knowledge of C/C++ and various algorithms, and I enjoy developing new things. I’m particularly interested in this project because I’m currently studying Compiler Design, which has given me a basic theoretical understanding of how compilers work. Working on a real-world project like this would give me hands-on experience and deepen my understanding of the context. I’m excited about the opportunity to work with you and learn from your expertise.

Sure. LLVM doesn’t have a Slack, but there is an official LLVM Discord server.

I don’t have any specific advice for the proposal, but I can share a few more thoughts on the project.

Compile-time issues lie on a spectrum from “average” to “pathological”. Average is just normal compilation without any particular bottleneck. For example, a typical property of an average compilation is that 10% of the time is spent in InstCombine, so any optimizations to InstCombine tend to have a measurable impact on overall compile-time. I would say optimizing these average cases is the primary goal of the project.

Pathological cases are on the other side of the spectrum: They run into some kind of substantially super-linear algorithm which makes code that should compile in milliseconds take minutes or hours, spending most time in a single place. And then there are cases in between where some pass takes more time than usual, but not pathologically so.

The “average” case can be a bit tricky to approach at first, because LLVM profiles are pretty flat, so it can be hard to identify a good candidate for optimization. For that reason, it may be worthwhile to start by looking at some pathological cases (though I don’t think this should be the bulk of the project). A few places where these are collected:

Finally, it may be interesting to look at some of the recent changes which had measurable positive or negative (average-case) compile-time impact, which you can find on the LLVM Compile-Time Tracker.

Hi @nikic,

Thank you so much for your response! I think I’m going to work on this project independently of GSoC. I’ll let you know if/when further questions arise and will update you once I make significant progress.

Thank you so much,
Tej