This RFC is part of our loop-vectorization work that we will present at the LLVM developers' conference next week, and is about enabling loop interchange by default. Interchange enables more vectorization, and its absence is one of the root causes of us being significantly behind other compilers in some test cases.
This is about more than just enabling interchange, though: DependenceAnalysis (DA) is used for its legality checks. DA is also used by other loop optimisations, but none of those are enabled by default. So, by enabling interchange, we will have a first default-on user of DA. @sebpop, @madhur13490 and I are working on this, and our concerted effort is about getting a foot in the door with loop optimisations in LLVM. Or, in other words, we would like to test and answer the question whether LLVM IR is a suitable representation to perform loop optimizations on at all (a bit more on this later).
Of all the (disabled) loop optimizations, we think interchange is a good candidate to enable because i) it's very general and not only beneficial for vectorization but also for improving memory access behaviour, ii) it triggers quite a lot in the LLVM test-suite and other benchmarks including SPEC, and iii) it's a relatively straightforward loop transformation to start with.
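To make (i) concrete, here is a small, hypothetical C example (the function name and shapes are made up) of the kind of loop nest interchange targets: the inner loop strides down a column of `A`, and swapping the two loops gives unit-stride accesses, which both improves cache behaviour and makes the inner loop vectorizable.

```c
// Hypothetical interchange candidate: the inner loop walks down a column
// of A (stride of N doubles), which is poor for locality and blocks
// vectorization of the innermost loop.
void add(int N, double A[N][N], double B[N][N]) {
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i)
      A[i][j] += B[i][j];
}

// After interchanging the i and j loops, the inner loop is unit-stride
// and straightforward to vectorize:
//   for (int i = 0; i < N; ++i)
//     for (int j = 0; j < N; ++j)
//       A[i][j] += B[i][j];
```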
Our approach to get interchange and DependenceAnalysis enabled:
- We are aware of the bugs raised against both components, and we have started fixing them.
- We will collect compile-time numbers (a bit more on this below).
- We intend to support both components and fix any issues that may come up later, so this isn't supposed to be a fix-and-forget exercise, also because we might want to follow up with other loop optimizations.
Sebastian can elaborate more on this, but briefly, about DependenceAnalysis: to fix a known correctness issue, he is going to make DA use MemorySSA. MemorySSA then needs to be preserved by the other loop optimisation passes, so there will be a little bit of MemorySSA-related churn in those passes; just a heads-up that these changes are in the pipeline.

The question whether LLVM IR is the right abstraction for loop optimisations is related to recovering (multi-dimensional) array accesses: some type information is lost in LLVM IR, and delinearisation tries to recover it. While there are certainly limitations to what delinearisation can achieve, we think it is robust enough for our initial use cases. A discussion for later, which we are open to, is how we could add or recover this type information in a more robust way.
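For illustration, a hedged sketch of what delinearisation has to recover (the function and names are made up): at the C level the access below has two subscripts, but after lowering to LLVM IR it is essentially a single flattened offset of roughly the form `i*m + j` from the base pointer, so the per-dimension subscripts, and the fact that `0 <= j < m`, have to be recovered before DA can test dependences dimension by dimension.

```c
// Hypothetical example: a variable-sized 2D array. At the source level
// there are two subscripts, but in LLVM IR the access is essentially
// *(&A[0][0] + i*m + j); delinearisation tries to recover the i and j
// subscripts (and the bound j < m) from that flattened expression.
void scale(int n, int m, double A[n][m]) {
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < m; ++j)
      A[i][j] *= 2.0;
}
```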
Compile times will be interesting and important. A quick exercise with the compile-time tracker shows a 1.8% and 1.12% increase for lencod and mafft, respectively (see the top results here). It looks like a bit more time is spent in BasicAA. Interchange isn't even triggering, so I don't know why that would be the case, but I am investigating this.
We are happy to receive any feedback and ideas, and are also happy to talk more about this at the LLVM dev conference next week.