Access to fine-grained optimization control


The clang front end does not seem to allow fine-grained control of optimization (i.e., beyond the -Ox options). However, for some benchmarking I’d like to be able to do this.

I have tried decomposing the compilation process. Instead of the monolithic command:

clang -O1 try.c -o try.elf1

I am executing:

clang -emit-llvm -S try.c -o try.ll
opt -O1 try.ll -o try.bc 
llc try.bc -o try.s
clang try.s -o try.elf2

But the result is not the same: for some vector code I wrote (Intel AVX2), the output of the monolithic command (try.elf1) is 10x faster than the output of the decomposed compilation (try.elf2). The only way to recover the performance of the monolithic case is to add -O1 to the clang call that produces the try.ll file, which rather defeats my purpose. I have also tried listing the optimizations applied by the monolithic command (using -fsave-optimization-record) and passing all of them to the opt call, to no avail.

So my question is: how can I expose the optimization pipeline in a way that reproduces what the monolithic command does, while allowing individual passes such as licm, inline, or loop-vectorize to be enabled and disabled?


It is difficult to reproduce exactly what clang does with opt. But there is one major thing missing from your invocation clang -emit-llvm -S try.c -o try.ll: it defaults to -O0 and tags every function in the module with the optnone attribute, which makes the optimizer essentially ignore those functions when you later run opt -O1. (This attribute matters for LTO, for example, where one file built at -O0 may be mixed with other files that aren’t, and during LTO you want each to behave as expected.)

The way to get it to work is to add -O1 -Xclang -disable-llvm-passes to the command that produces the original .ll file.

When you then run opt you’ll see the optimizer run, but clang sets up things like the target library info a bit differently than what you can do with opt, unfortunately. It is likely good enough for experimenting, but worth keeping in mind.

Another thing missing from your invocation is that llc also accepts a -O argument, which enables more optimizations in the backend itself.


Indeed, the optimization passes in opt are difficult to get a handle on. There are even differences between the pass names listed under opt --help and the pass names the tool actually accepts (I don’t understand how they managed to do that).

It’s too bad there’s no way to list the passes that actually get executed. The -debugify option seemed useful at first, but its output is incomplete (for instance, -always-inline does not get listed).