Crowdsourcing analysis and tuning of LLVM optimization heuristics?

Dear all,

We are pleased to announce a prototype framework for sharing compiler optimization knowledge across diverse hardware platforms, programs and datasets. We have started aggregating results for LLVM 3.6 to 3.9 in our public repository here:

Many of you know that devising good compiler optimization heuristics is always a challenge. The compiler codebase, workloads and even targets change so rapidly that manual optimization tuning is not only unproductive, it is simply infeasible. This explains why it is often possible to find a combination of compiler flags that beats the best default optimization level (e.g. “-O3”) by a factor of 2 or more. Poor compiler optimization heuristics for a particular target directly affect users’ perception of the target’s performance (and hence its competitiveness).

That’s why we have developed a framework for crowdtuning compiler optimization heuristics. Here’s a bird’s eye view of how it works (for more details, please see the mlcommons/ck wiki on GitHub). You install a client Android app. The app sends system properties to a public server. The server compiles a random shared workload using some flag combinations that have been found to work well on similar machines, as well as some new random ones. The client executes the compiled workload several times to account for variability, and sends the results back to the server.
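
To make the round trip above more concrete, here is a rough Python sketch of the two ideas involved: mixing known-good flag combinations with new random ones, and summarizing repeated runs. It is only an illustration and not the framework's actual code; the function names (propose_combinations, summarize_runs), the flag pool and the choice of statistics are all assumptions made for this example.

```python
import random
import statistics


def propose_combinations(known_good, flag_pool, n_random, rng):
    """Return known-good combinations plus n_random new random ones.

    known_good: combinations that worked well on similar machines.
    flag_pool:  flags to draw random combinations from (assumed here).
    """
    proposals = [list(combo) for combo in known_good]
    for _ in range(n_random):
        size = rng.randint(1, len(flag_pool))
        proposals.append(rng.sample(flag_pool, size))
    return proposals


def summarize_runs(timings):
    """Summarize repeated executions of one compiled workload.

    The relative spread gives a crude idea of run-to-run variability.
    """
    fastest = min(timings)
    return {
        "min": fastest,
        "median": statistics.median(timings),
        "relative_spread": (max(timings) - fastest) / fastest,
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the sketch repeatable
    pool = ["-O2", "-funroll-loops", "-fno-inline"]
    for combo in propose_combinations([["-O3"]], pool, 3, rng):
        print(" ".join(combo))
    print(summarize_runs([1.0, 1.2, 1.1]))
```

A real client would obtain the timings by actually running the compiled workload; here they are passed in directly so that the sketch stays self-contained.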

If a combination is found that improves performance over the combinations found so far, it gets reduced (by removing flags that do not affect the performance) and uploaded to a public repository. Importantly, if a combination significantly degrades performance for a particular workload, this gets recorded as well. This potentially points to a problem with optimization heuristics for a particular target, which may be worth investigating and improving.
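
The reduction step can be pictured as a simple greedy pass: try dropping each flag in turn and keep it out if the measured time stays within a tolerance of the full combination. The sketch below is one possible way to do this, not necessarily what the framework does; reduce_flags, the 2% tolerance and the synthetic fake_measure function are assumptions introduced purely for illustration.

```python
def reduce_flags(flags, measure, tolerance=0.02):
    """Greedily remove flags that do not affect the measured time.

    flags:     the winning flag combination.
    measure:   callable taking a list of flags, returning a time (lower is better).
    tolerance: allowed relative slowdown versus the full combination (assumed).
    """
    baseline = measure(flags)
    kept = list(flags)
    for flag in flags:
        trial = [f for f in kept if f != flag]
        # Compare against the fixed baseline so small losses cannot accumulate.
        if measure(trial) <= baseline * (1.0 + tolerance):
            kept = trial
    return kept


def fake_measure(flags):
    """Synthetic timing model: only two of the flags matter."""
    time = 10.0
    if "-funroll-loops" in flags:
        time -= 3.0
    if "-fvectorize" in flags:
        time -= 2.0
    return time


if __name__ == "__main__":
    combo = ["-O3", "-funroll-loops", "-fno-inline", "-fvectorize"]
    print(reduce_flags(combo, fake_measure))
```

With noisy real measurements, measure would itself need to repeat each run several times; a single greedy pass can also miss flags that only matter in combination with others.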

At the moment, only global Clang flags are exposed for crowdtuning. In the longer term, we are aiming to cover LLVM “opt” optimizations and fine-grained transformation decisions (vectorization, unrolling, etc.).

It’s a work in progress, so we would like to apologize in advance for possible glitches! We thank all the volunteers who have contributed so far, but there are still many things to add or improve. Please get in touch if you are interested in learning more or contributing!

Best regards,

By the way, some of you asked us about the workloads we are currently using.

The current workloads in CK are only there to test our collaborative
optimization prototype and are a bit outdated (open source programs,
kernels and datasets from our past projects).

However, our point is to make an open system where the community can
add various (realistic) workloads via GitHub with some meta information
in JSON format to be able to participate in collaborative benchmarking and tuning.

Such meta information exposes data sets used, command lines, input/output files, etc.
This helps add multiple data sets for a given benchmark or even reuse
already shared ones. Finally, this meta information makes it relatively
straightforward to apply predictive analytics to find correlations between
workloads and optimizations.
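
To give a feel for what such meta information might look like, here is a purely hypothetical example, loaded and used from Python. The keys (run_cmd, datasets, input_file and so on) are invented for this illustration and are not the actual CK schema; please check the CK documentation for the real format.

```python
import json

# Hypothetical workload description; none of these keys are taken from CK.
WORKLOAD_META_JSON = """
{
  "name": "example-kernel",
  "source_files": ["kernel.c"],
  "binary": "./a.out",
  "run_cmd": "{binary} {input} {output}",
  "output_file": "result.out",
  "datasets": [
    {"name": "small", "input_file": "small.dat"},
    {"name": "large", "input_file": "large.dat"}
  ]
}
"""


def command_lines(meta):
    """Build one command line per data set listed in the meta information."""
    return [
        meta["run_cmd"].format(
            binary=meta["binary"],
            input=dataset["input_file"],
            output=meta["output_file"],
        )
        for dataset in meta["datasets"]
    ]


if __name__ == "__main__":
    meta = json.loads(WORKLOAD_META_JSON)
    for cmd in command_lines(meta):
        print(cmd)
```

Adding another data set to a benchmark then amounts to appending one more entry to the list, without touching the benchmark itself.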

Our hope is to eventually build a large and diverse pool of public
workloads. In that case, users will be able to select workloads that
are representative of their own requirements (performance, code size,
energy, resource constraints, etc.) and target hardware.

Furthermore, since optimization spaces are huge and it is infeasible
for one user or even one data center to explore them, our approach
allows all shared workloads to continuously participate in crowdtuning,
i.e. searching for good optimizations across diverse platforms while
continuously reporting "unexpected behavior"
(similar to traditional bug buildbots).