Confused about optimization pass order

Hello,
I’m wondering how exactly LLVM deals with passes that open up opportunities for other passes. For example, InstCombine says that it opens many opportunities for dead code/store elimination. However, the reverse may also be true. How does LLVM currently handle this?

Hi Karl,

Here is my slightly oversimplified take on this; I hope it helps.

We have a fixed, manually curated pipeline which seems to perform reasonably well (see for example llvm/lib/Transforms/IPO/PassManagerBuilder.cpp).
There are (call graph SCC) passes that run as part of this pipeline potentially multiple times, but still in the fixed order (as far as I know).

EJ (cc'ed) and I are going to propose a GSoC project to "learn" the interplay between sets of passes, e.g., what has to go together and in which order, and, potentially, alternative pipelines we could offer to people.
Various details are not totally clear yet, but based on existing research it seems nice improvements are to be expected if we find a way to manage the infrastructure challenges that come with such an effort.

Cheers,
  Johannes

Interesting. I was wondering whether it would be a good idea to separate passes into further separate groups, depending on what they do. For example:

  • Code reduction/elimination (ex. DCE, quite a few of the loop passes)

  • Instruction substitution (ex. vectorization passes/instcombine)

The problem I see with the current approach is that I think the code reduction passes get the short end of the stick; they can only run as many times as they’re added to the PassManager, meaning for larger projects something could be missed.

What I’d propose (take this with a grain of salt obviously) is some sort of implementation where the code reduction passes are all continually run until they are done. After that, do the same thing but with the substitution passes. I don’t know if there are any specific passes that just make code prettier for substitution, but they would (hypothetically) be run once in between. This is probably not the best way to go about this, but I think it could help.
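
To make the proposal concrete, here is a minimal toy sketch of "run a group of passes until they are done, then move on to the next group". Everything in it is hypothetical: the instruction encoding, `naive_dce`, `run_to_fixpoint`, and `grouped_pipeline` are made up for illustration and are not LLVM APIs.

```python
from typing import Callable, List, Tuple

# Toy instruction: (result name, operand names). Not LLVM IR.
Inst = Tuple[str, Tuple[str, ...]]
# Toy "pass": takes the code and returns a (possibly) rewritten copy.
Pass = Callable[[List[Inst]], List[Inst]]


def naive_dce(code: List[Inst]) -> List[Inst]:
    """Hypothetical reduction pass: one sweep that drops unused results.

    A single sweep only removes the end of a dead chain, so it has to be
    re-run to remove the instructions that became dead because of it.
    """
    used = {op for _, ops in code for op in ops}
    return [(dest, ops) for dest, ops in code if dest == "ret" or dest in used]


def run_to_fixpoint(group: List[Pass], code: List[Inst],
                    max_rounds: int = 100) -> List[Inst]:
    """Run all passes in `group` repeatedly until a full round changes nothing."""
    for _ in range(max_rounds):  # safety bound in case a group never settles
        before = code
        for run_pass in group:
            code = run_pass(code)
        if code == before:
            break
    return code


def grouped_pipeline(code: List[Inst], reduction: List[Pass],
                     substitution: List[Pass]) -> List[Inst]:
    """Reduction group to a fixpoint first, then the substitution group."""
    code = run_to_fixpoint(reduction, code)
    return run_to_fixpoint(substitution, code)


# a -> b -> c is a dead chain; only d feeds the return.
program: List[Inst] = [("a", ()), ("b", ("a",)), ("c", ("b",)),
                       ("d", ()), ("ret", ("d",))]
```

In this toy, a single `naive_dce(program)` sweep removes only `c`, while `grouped_pipeline(program, [naive_dce], [])` keeps sweeping until just `d` and `ret` remain.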

Thanks,
Karl

> Interesting. I was wondering whether it would be a good idea to separate
> passes into further separate groups, depending on what they do. For example:
> - Code reduction/elimination (ex. DCE, quite a few of the loop passes)
> - Instruction substitution (ex. vectorization passes/instcombine)

Most passes we have do code reduction and/or code canonicalization.
That is what we are doing when we are not vectorizing, at least ;)

> The problem I see with the current approach is that I think the code
> reduction passes get the short end of the stick; they can only run as many
> times as they're added to the PassManager, meaning for larger projects
> something could be missed.

True, that is always the case. Compiler phase ordering is hard and there
is no "correct" answer. It all depends on what your sweet spot is
between compile time, size, and performance (not to mention the kinds of
programs you are primarily interested in).

Do you have some particular situation in mind?

You might also wanna take a look at existing passes that iterate till a
fixpoint is reached; (IP)SCCP and the Attributor^ [0,1] come to mind.
Both perform some code elimination, and there are outstanding patches
for the Attributor to improve on that further [2,3].

[0] https://llvm.org/devmtg/2019-10/talk-abstracts.html#tech24
[1] https://llvm.org/devmtg/2019-10/talk-abstracts.html#tut6
[2] D73313 [Attributor] Use fine-grained liveness in all helpers
[3] D68934 [Attributor] Make value simplify stronger

^ Run -O3 -mllvm -attributor-disabled=false
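
As a rough illustration of what "iterate till a fixpoint is reached" means inside a single pass, here is a toy constant propagation that keeps sweeping until no new fact is discovered. It is only loosely in the spirit of such passes and is not how (IP)SCCP or the Attributor are implemented; the expression encoding and `propagate_constants` are assumptions made up for this sketch.

```python
from typing import Dict, Tuple, Union

# Toy expression: an int constant, a copy of another variable (str),
# or an addition ("+", lhs, rhs). Not LLVM IR.
Expr = Union[int, str, Tuple[str, str, str]]


def propagate_constants(defs: Dict[str, Expr]) -> Dict[str, int]:
    """Sweep over all definitions until no new variable becomes known."""
    known: Dict[str, int] = {}
    changed = True
    while changed:  # stops once a full sweep learns nothing new
        changed = False
        for var, expr in defs.items():
            if var in known:
                continue
            if isinstance(expr, int):
                value = expr
            elif isinstance(expr, str):
                if expr not in known:
                    continue  # source of the copy is not known (yet)
                value = known[expr]
            else:
                _, lhs, rhs = expr
                if lhs not in known or rhs not in known:
                    continue  # an operand is not known (yet)
                value = known[lhs] + known[rhs]
            known[var] = value
            changed = True
    return known


# "c" and "d" only become known after "a" and "b" are; "e" never does,
# because "x" has no definition.
defs: Dict[str, Expr] = {"c": ("+", "a", "b"), "d": "c", "a": 1, "b": 2,
                         "e": ("+", "d", "x")}
```

The loop terminates because each sweep either adds at least one variable to `known` or ends the iteration, and the number of variables is finite.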

> What I'd propose (take this with a grain of salt obviously) is some sort of
> implementation where the code reduction passes are all continually run
> until they are done. After that, do the same thing but with the
> substitution passes. I don't know if there are any specific passes that
> just make code prettier for substitution, but they would (hypothetically)
> be run once in between. This is probably not the best way to go about this,
> but I think it could help.

I think if we have an automated way to construct and test pipelines
properly we can try this out. I doubt it will be too successful so I
would recommend against investing time in manually trying it. The reason
I doubt this works well is that there is a cyclic dependence across the
different passes. One of the most important ones is:

  1) Try hard to remove code so inlining is not too costly.
  2) Inline functions to enable further (local) transformations.
  3) Go to step 1).
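
The cycle above can be sketched with a toy model. This is not LLVM's inliner or simplifier: the token-based "functions", the `INLINE_THRESHOLD` cost limit, and all function names below are hypothetical and only serve to show why running each phase to a fixpoint once can miss what the interleaved loop finds.

```python
from typing import Callable, Dict, List

# Toy program: function name -> list of tokens. Not LLVM IR.
Program = Dict[str, List[str]]
INLINE_THRESHOLD = 2  # hypothetical cost limit on callee size


def simplify(prog: Program) -> Program:
    """Step 1: drop 'dead' tokens and cancel adjacent 'push'/'pop' pairs."""
    out: Program = {}
    for name, body in prog.items():
        new_body: List[str] = []
        for tok in body:
            if tok == "dead":
                continue
            if tok == "pop" and new_body and new_body[-1] == "push":
                new_body.pop()  # the pair cancels out
                continue
            new_body.append(tok)
        out[name] = new_body
    return out


def inline(prog: Program) -> Program:
    """Step 2: replace 'call:f' with f's body if f is small enough."""
    out: Program = {}
    for name, body in prog.items():
        new_body: List[str] = []
        for tok in body:
            callee = tok[5:] if tok.startswith("call:") else None
            if (callee is not None and callee != name
                    and len(prog[callee]) <= INLINE_THRESHOLD):
                new_body.extend(prog[callee])
            else:
                new_body.append(tok)
        out[name] = new_body
    return out


def fixpoint(step: Callable[[Program], Program], prog: Program,
             max_rounds: int = 50) -> Program:
    """Apply `step` until the program stops changing (bounded for safety)."""
    for _ in range(max_rounds):
        nxt = step(prog)
        if nxt == prog:
            break
        prog = nxt
    return prog


def phased(prog: Program) -> Program:
    """Each phase runs to a fixpoint once, one phase after the other."""
    return fixpoint(inline, fixpoint(simplify, prog))


def interleaved(prog: Program) -> Program:
    """Steps 1) to 3): simplify, inline, and repeat until nothing changes."""
    return fixpoint(lambda p: inline(simplify(p)), prog)


example: Program = {
    "main": ["call:g"],
    "g": ["push", "call:h", "work"],
    "h": ["dead", "dead", "pop"],
}
```

In this example, `phased(example)` leaves `main` as `["call:g"]`: `g` only shrinks below the threshold after `h` has been inlined into it and the result simplified again, which the one-shot phases never do. `interleaved(example)` reduces `main` to `["work"]`.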

[ Take these only as words of caution. I don't want to discourage you from
  trying new things and I'd be happy if you achieve good results either way!]

Cheers,
  Johannes