The clang front-end does not seem to allow fine-grained control of optimization (i.e. beyond the `-Ox` options). However, for some benchmarking I’d like to be able to do this.
I have tried decomposing the compilation process. Instead of the monolithic command:

```shell
clang -O1 try.c -o try.elf1
```

I am executing:

```shell
clang -emit-llvm -S try.c -o try.ll
opt -O1 try.ll -o try.bc
llc try.bc -o try.s
clang try.s -o try.elf2
```
But the result is not the same: for some vectorized code I wrote (Intel AVX2), the output of the monolithic command (`try.elf1`) is 10x faster than the output of the decomposed compilation (`try.elf2`). The only way to recover the performance of the monolithic case is to add `-O1` to the `clang` call that produces the `try.ll` file (which rather defeats my purpose). I have also tried listing the optimizations applied by the monolithic command (using the `-fsave-optimization-record` option) and adding all of them to the `opt` call, to no avail.
So my question is: how can I expose the optimization pipeline in a way that reproduces what the monolithic command does, while allowing individual passes to be enabled and disabled?
It is difficult to reproduce exactly what clang does with `opt`. But there is one major thing missing from your invocation: `clang -emit-llvm -S try.c -o try.ll` defaults to `-O0` and tags every single function in the module with `optnone`, which makes the optimizer basically ignore them later when you try to run `-O1`. (This attribute matters for LTO, for example, where you may have one file built with `-O0` mixed with other files that aren’t, and during LTO you want things to behave as expected.)
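This is easy to observe directly in the IR. A quick check (a sketch assuming `clang` is on `PATH`; `try.c` is a stand-in for the real source):

```shell
cat > try.c <<'EOF'
int add(int a, int b) { return a + b; }
EOF

# Default (-O0) IR: every function carries the optnone attribute.
clang -emit-llvm -S try.c -o try_O0.ll
grep -c optnone try_O0.ll

# With -O1 plus -Xclang -disable-llvm-passes the attribute is not
# emitted, so a later `opt -O1` will actually optimize the functions.
clang -O1 -Xclang -disable-llvm-passes -emit-llvm -S try.c -o try_O1.ll
grep -c optnone try_O1.ll || true   # expect 0 matches
```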
The way to get it to work is to add `-O1 -Xclang -disable-llvm-passes` to that first `clang` invocation: this keeps the `-O1` setup (so no `optnone` attributes) but stops clang from running any LLVM passes, leaving you with the original, unoptimized IR.
When you run `opt` you’ll then see the optimizer run, but `clang` sets up some target library information a bit differently than what you can do with `opt`, unfortunately. It is likely good enough for experimenting, but worth keeping in mind.
Another thing missing from your invocation is that `llc` also accepts a `-O` argument, which enables more optimizations in the backend itself.
Indeed, the optimization passes in `opt` are difficult to get a hold of. There are even differences between the pass names listed under `opt --help` and the pass names the tool actually accepts (I don’t understand how they managed to do this).
It’s too bad there’s no way to list the passes that actually get executed. The `-debugify` option seemed useful at first, but its output is incomplete (for instance, `-always-inline` does not get listed).