Effectiveness of llvm optimisation passes

Hi all,

I am trying to understand the effectiveness of various llvm optimisations when a language targets llvm (or C) as its backend.

The following is my approach (please correct me if I did anything wrong):

I am trying to control the optimisation passes in llvm explicitly. I disable optimisation in clang, emit unoptimised llvm IR instead, and use opt to optimise that. This is what I do:

* clang -O0 -S -mllvm -disable-llvm-optzns -emit-llvm -momit-leaf-frame-pointer a.c -o a.ll
* opt -(PASSES) a.ll -o a.bc
* llc a.bc -filetype=obj -o a.o

To evaluate the effectiveness of optimisation passes, I started with an 'add-one-in' approach. The baseline is no optimisation passes, and I iterate through all the O1 passes, explicitly enabling one pass for each run. I didn't try to understand those passes, so it is a black-box test. This will show how effective each single optimisation is (ignoring any correlation between passes). This can be iterative, e.g. identify the most effective pass, always enable it, and then 'add-one-in' for the remaining passes. I also plan to take a 'leave-one-out' approach, in which the baseline is all optimisations enabled, and one pass is disabled at a time.
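The two experiment designs described above can be sketched as a small dry-run script. This is only an illustration: the pass list is a hypothetical four-pass sample rather than the real O1 pipeline, the output file names are made up, and the functions only print the opt command lines instead of running them. It uses the same `opt -<pass>` spelling as the commands in this thread; newer opt releases expect `-passes=<name>` instead.

```shell
#!/bin/sh
# Hypothetical sample of passes; the real O1 list depends on the LLVM version.
PASSES="mem2reg sroa instcombine licm"

# add-one-in: one opt invocation per pass, on top of an empty baseline.
add_one_in() {
    for p in $PASSES; do
        echo "opt -$p a.ll -o a.$p.bc"
    done
}

# leave-one-out: every pass in the list except the one being evaluated.
leave_one_out() {
    for skip in $PASSES; do
        flags=""
        for p in $PASSES; do
            [ "$p" = "$skip" ] || flags="$flags -$p"
        done
        echo "opt$flags a.ll -o a.no-$skip.bc"
    done
}

add_one_in
leave_one_out
```

Piping the output of either function into `sh` would run the printed commands, assuming opt is on the PATH and a.ll exists.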

Here is the result for the 'add-one-in' approach on some micro benchmarks:

https://drive.google.com/drive/folders/0B9EKhGby1cv9YktaS3NxUVg2Zk0

The result seems a bit surprising. A few passes, such as licm, sroa, instcombine and mem2reg, each seem to deliver performance very close to O1 (which includes all the passes). Figure 7 is an example. If my methodology is correct, then my guess is that those optimisations require some common internal passes, which actually deliver most of the improvement. I am wondering if this is true.

Any suggestions or critiques are welcome.

Thanks,
Yi

Having -O0 on your clang command line causes all functions to be marked with an ‘optnone’ attribute, which prevents opt from optimizing them later. You should also add “-Xclang -disable-O0-optnone” to your command line.
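A quick way to confirm this on a given file is to search the emitted IR for the attribute. The helper below is a minimal sketch: `check_optnone` is a made-up name, and it assumes the IR is in textual form (.ll), where the attribute appears literally as `optnone`.

```shell
# Report whether a textual IR file still carries the optnone attribute.
# $1: path to a .ll file.
check_optnone() {
    if grep -q 'optnone' "$1"; then
        echo "optnone present"
    else
        echo "no optnone found"
    fi
}

# Intended use (not run here; requires clang on PATH):
#   clang -O0 -Xclang -disable-O0-optnone -S -emit-llvm a.c -o a.ll
#   check_optnone a.ll
```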

Craig was faster on the optnone flag (it applies if you are using Clang 5 or above).
However, I observed that some of the opt passes ignore optnone in
some cases, e.g., -break-crit-edges.
You can use opt's -stats flag to get a list of statistics on what a
particular pass did (if it collects statistics, of course).
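As a rough sketch of how the -stats output could be used to check that a pass actually did something: the helper below filters a saved statistics log for one pass. `stats_for_pass` is a hypothetical name, and the filter assumes the usual layout of one statistic per line as `<count> <pass-name> - <description>`. Note that -stats may print nothing if the LLVM build was configured without statistics support, so an empty result does not by itself prove that the pass was skipped.

```shell
# Print the statistics lines that belong to one pass.
# $1: file holding the stderr of an `opt -stats ...` run
# $2: pass name as it appears in the second column
stats_for_pass() {
    awk -v pass="$2" '$1 ~ /^[0-9]+$/ && $2 == pass' "$1"
}

# Intended use (not run here; requires opt on PATH):
#   opt -stats -mem2reg a.ll -o a.bc 2> a.stats
#   stats_for_pass a.stats mem2reg
```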

Thank you very much. That explains the results.

I am running the benchmarks again with '-Xclang -disable-O0-optnone'.

Thanks,
Yi

I noticed that there is a '-run-pass' argument for llc. I am wondering if I can do a similar approach with machine level optimisations/passes for llc. Are those passes optional (so I can turn them off)? And how can I get MIR format as llc expects with '-run-pass'?

Thanks a lot.

Cheers,
Yi

It depends on the pass: some are optional, some aren't. If the pass has `if (skipFunction()) return false;` in its code, then it is an optional pass that gets skipped at -O0.

In theory you should be able to use llc -stop-before, -run-pass and -start-after and write the intermediate results to .mir files. In practice we are not there yet. Targets have a large amount of state scattered around. The .mir files capture a lot of it but not all, so it is likely that things won't work if you serialize to .mir in between.
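For illustration, the theoretical workflow would look roughly like the dry-run sketch below. The function name `mir_commands`, the file names, and the example pass name `machine-cp` are placeholders, and the script only prints the llc invocations. Given the caveat above, the round trip through .mir may well fail for a real target.

```shell
# Print (do not run) the llc invocations for isolating one machine pass.
# $1: machine pass name (machine-cp below is only a placeholder).
mir_commands() {
    pass="$1"
    # Stop just before the pass and serialize the machine function to MIR.
    echo "llc -stop-before=$pass a.ll -o before.mir"
    # Run only that pass on the serialized MIR.
    echo "llc -run-pass=$pass before.mir -o after.mir"
    # Resume the rest of the pipeline from just after the pass.
    echo "llc -start-after=$pass after.mir -filetype=obj -o a.o"
}

mir_commands machine-cp
```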

And for the record: Despite the problems, the .mir files are an invaluable tool to write tests that test a single machine pass independently.

- Matthias

I feel I am still doing something wrong, as the performance does not seem to change with the different passes I use.

The command lines I am using are:

* clang -O0 -Xclang -disable-O0-optnone -S -mllvm -disable-llvm-optzns -emit-llvm -momit-leaf-frame-pointer a.c -o a.ll
* opt -(PASS_FLAG) a.ll -o a.bc
* llc a.bc -filetype=obj -o a.o

I tried PASS_FLAG as all the passes from O1, as a specific pass in O1, and as '-O1' or '-O0' directly. The performance variation seems to be noise only (+/- 1%).

Also, clang warns me that the arguments '-Xclang -disable-O0-optnone' are unused, though the result is different from not using them. I am using clang-5.0.

Any help would be appreciated.

Thanks,
Yi