Motivation : LLVM has lots of thresholds and flags to control the compiler behaviour. However, it is unclear if all these thresholds are useful, their value is reasonable, and what impact they really have. This work explores these thresholds, when they are hit, what it means if they are hit, how we should select their values, and if we need different profiles (Profiles here refer to metrics like compile time, size of the generated program, or any statistic that LLVM emits like “Number of stack-move optimizations performed” etc.)
Initial Results : We have conducted a large study on 10,000 *bitcode files and a small study on *100 bitcode files. Both of them reflect the fact that changing these knob values can positively affect various statistical values emitted by LLVM. We measure both compile-time and bitcode-size along with various stats emitted and present them in the form of interactive graphs here. For instance, here are the graphs for jump-threading-threshold :
Here we can observe that modifying the knob value leads to considerable increase in several stat values with varying compile-time and code size changes. Apart from that we also observe some very interesting patterns in our small study which shows values for a knob which not only increases the number of stats emitted but also gives a performance boost . For example, here are the graphs for inline-memaccess-cost :
We observe that as we increase the knob values, not only does most of the stats increase (with one increasing by 720%), but the performance parameters namely compile-time and bitcode-size also show a general decreasing trend.
Approach : We first collected all the knobs we could from the LLVM codebase by creating a custom clang matcher. We looked for the following patterns :
- Const knob_name = knob_val
- Cl::init
- Enum {knob_name = knob_val}
This landed us with nearly 500 knobs. The small study was meant to filter out the uninteresting knobs among these. Though this approach has its own flaws since a particular knob can emit stats for different *bitcode files, this seemed the best alternative with the limited resources available. We cherry picked 93 of these knobs which showed changes in at least one stat value. We then use a custom python tool (optimised to deal with I/O and cache bottlenecks) to collect the different stat values in parallel and store them in a json file which is later used to render the graphs. For compile time, we measure the instruction count using perf and for bitcode-size, we use llvm-size to get the text segment size in bytes.
Feedback and Next Steps: We would like to receive feedback on the current results and suggestions for next steps. So far we have planned to first determine whether multiple magic values exist for a particular knob. This involves studying the knobs in isolation and determining for what portion of the *programs a certain knob value performs well and for what portion of the program some other magic value performs well. Note that it’s not necessary for a knob to have multiple such values. We would first like to determine whether this exists for some knobs because then we can work towards making a framework that LLVM can potentially use to change its knob values according to the program it compiles.
cc : @jdoerfert @wsmoses @jhueckelheim
*The first K files in the Compile dataset