Definitive list of optimisations at each optimisation level

I am often asked what optimisations “our” compiler performs at each level. But “our” compiler is actually Clang/LLVM, which we have retargeted to our proprietary target.

Most of the work we do is in maintaining our target-specific backend. There are certainly optimisations we perform to take best advantage of our instruction set during lowering and instruction selection, and we have also added a couple of extra passes that manipulate the IR ahead of lowering to shape it better for our target.

But in practice the vast majority of optimisations are contributed by the continuously evolving and excellent LLVM target-independent passes, and have little or nothing to do with the work we do in our backend - though they are obviously directed and tuned by the various target callbacks and the target cost models.

Is there a “one stop shop” list of the optimisation passes that LLVM performs, identifying which are enabled by default at each of the four standard optimisation levels, ‘-O0’ through ‘-O3’?

The reason I ask is that I don’t really have an honest or informed answer to give people when they ask, and I haven’t found a definitive statement of this in the LLVM documentation that I could refer them to.

Thanks,

MartinO

Have you looked in the pass manager?

The most definitive list you can probably hope to get will be obtained by passing -mllvm -debug-pass=Structure to a clang invocation.
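
For example (the input file here is just a placeholder):

    clang -O2 -c foo.c -o /dev/null -mllvm -debug-pass=Structure

The pass structure is written to stderr.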

– Sean Silva

Right, but then you’ll have to run it at each opt level. Have you looked in the pass manager?

Looking at PassManagerBuilder can be useful because there are sometimes comments giving some idea of the intent of the particular choice of passes, but it can be difficult to see the big picture of all the passes that are run because it is very parameterized and split across multiple subroutines.

Running clang at O0 through O3 with -mllvm -debug-pass=Structure is generally more enlightening because then you see everything in one place for each optimization level.
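
For instance, something along these lines captures all four dumps for comparison (any small C file will do; the loop itself is just illustrative):

    for lvl in 0 1 2 3; do
      clang -O$lvl -c foo.c -o /dev/null \
        -mllvm -debug-pass=Structure 2> passes-O$lvl.txt
    done

You can then diff the passes-O*.txt files to see what each level adds.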

– Sean Silva

I agree, it’s much clearer; it just takes runs at multiple opt levels, and therefore I don’t find it to be a “one stop shop”.

Thanks Sean and Silva.

I guess what I was seeking was a URL that I could point (non-compiler) people at, but I guess no such reference exists. What I can do is reference both the pass-manager sources and the output of ‘-mllvm -debug-pass=Structure’ for each optimisation level, and document that. The downside is that this will continuously go out of date with each release.

All the best,

MartinO

PS: Movidius is now part of Intel, so I will be gradually switching to my Intel email address.

Yeah, the nasty thing is that as each user of LLVM is running potentially
different pass pipelines, we don't really have a good way to write a
user-facing (or at least non-compiler-dev-facing) page describing the
optimizations that they should expect the compiler to do, as the compiler
developers using LLVM may have changed it. In theory, we could provide such
a page for the open-source clang. Such a page of course couldn't go into
massive detail about each pass and as such will likely be mostly describing
the basics.

Most of the information that users would care about is likely to not get
out of date very easily (e.g. we might switch GVN implementations, but the
way in which they differ isn't going to be relevant to such a document; we
might switch EarlyCSE to MemorySSA, but that isn't relevant to such a
document).

-- Sean Silva

Thanks Sean,

My apologies for previously writing “Thanks Sean and Silva” - I intended “Thanks Sean and Ryan” - duh! I’m sleep-deprived, having just flown to the USA.

The dump is very enlightening for me, and very large - I hadn’t realised that there were quite so many passes. At ‘-O0’ there are ~70 passes, but at ‘-O1’ it jumps to ~310 passes, and then ~340 passes at ‘-O2’ and ‘-O3’. I will have to carefully study these passes to see what optimisations they perform (if any), though there are the obvious ones too.
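
Rough counts like these can be reproduced with something like the following - it simply counts every line of the dump, so the pass-manager header lines inflate the totals slightly, and foo.c is just a placeholder:

    for lvl in 0 1 2 3; do
      printf 'O%s: ' "$lvl"
      clang -O$lvl -c foo.c -o /dev/null -mllvm -debug-pass=Structure 2>&1 | wc -l
    done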

One thing I observed is that some passes report:

Unnamed pass: implement Pass::getPassName()

One of these always appears after the final target ‘AsmPrinter’ pass (“SHAVE Assembly Printer” in our case) and seems to come from the ‘FPPassManager’, while the other appears after ‘Global Variable Optimizer’ at optimisation levels of ‘-O1’ and higher and seems to come from the ‘MPPassManager’. I am using the release LLVM v4.0 sources, not the head revision. I guess these are noise and can be ignored.

All the best,

MartinO

I remember running into these before. I forget exactly what we ended up
discovering they were but I remember that it wasn't remarkable. Vaguely
remembering, I think the GlobalOpt one was something related to it
accessing function AA results or something like that. The reason it didn't
have a name is that it goes through some sort of adapter thing which
doesn't implement getPassName.

-- Sean Silva

I've just had to do the same, as I'm also looking at the optimisations that Clang/LLVM performs on C code.

Given that some of the later pass names seemed to include my target name (x86), even at -O0, I wondered whether some of these optimisations were target-specific.

Thus, I ran `clang -mllvm -debug-pass=Structure -S -emit-llvm -xc /dev/null --target='le64-unknown-unknown-unknown' -O$OPT_LEVEL`, which seemed to give me a shorter list with few or no target/machine-specific optimisations.

This gives ~21 passes at -O0, ~202 at -O1, ~225 for -Os and -Oz, and ~230 for -O2, -O3 and -Ofast (counting lines of output).

Assuming a "Pass Arguments" line marks the start of a new pass hierarchy inside LLVM, I'm still unsure why there are four hierarchies in, for example, the output of dumping the optimisations at -O1 (gist here: https://gist.github.com/anonymous/e71193ad5200bcc87b226374b7924365 ).
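
For example, grepping the dump for those header lines shows where each hierarchy starts (same invocation as above, with the output discarded):

    clang -O1 -mllvm -debug-pass=Structure -S -emit-llvm -xc /dev/null \
      --target='le64-unknown-unknown-unknown' -o /dev/null 2>&1 | grep -n '^Pass Arguments'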

Sam