Area for improvement

The second issue is that we need to do redundancy elimination and hoisting on operations that are only exposed once instruction selection is performed. Getelementptr expansion is just *one* particular case of this, but it can happen with many instructions: the use of large integer (or any FP) constants on RISC machines, addressing globals with PIC code, handling extended/truncated operations (for RISC machines with a single integer size), etc. Note that hacks like "LowerMultiDimRefs" and the sparc Preselection pass sorta-kinda work around *some* of this.
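
To make the large-constant case concrete, here is a minimal sketch in modern LLVM IR syntax (not the bytecode format of the time; the function and constant are invented). At the LLVM level the constant is a single operand, so LLVM-level LICM sees nothing to hoist; on a RISC machine the selector expands it into a loop-invariant multi-instruction sequence.

```llvm
define i64 @sum(ptr %p, i64 %n) {
entry:
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %acc = phi i64 [ 0, %entry ], [ %acc.next, %loop ]
  %slot = getelementptr i64, ptr %p, i64 %i
  %v = load i64, ptr %slot
  ; This constant is a single operand at the LLVM level, but a RISC
  ; selector expands it into a multi-instruction materialization
  ; sequence that only becomes hoistable *after* selection.
  %masked = and i64 %v, 81985529216486895
  %acc.next = add i64 %acc, %masked
  %i.next = add i64 %i, 1
  %done = icmp eq i64 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  ret i64 %acc
}
```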

Actually, they are capable of working around much of this, and most such optimizations are quite architecture-independent, e.g., LowerMultiDimRefs, which is a major optimization for array-intensive languages.

I obviously agree that high-quality codegen of getelementptr instructions is very important! I have two primary problems with LowerMultiDimRefs:

1. We don't want the code generator munching on the LLVM code as it
   generates code. The code generator should only read the LLVM code.
   Right now, if we do things like a "quick JIT" followed by a "good code
   JIT" (when we find that a piece of code is hot), the second JIT gets
   different LLVM code than the first one had as input. We are still not
   quite to the point where this is the case, but we are getting there.
2. The way LowerMultiDimRefs works is by decimating GEP instructions into
   pieces, then relying on GCSE and LICM to clean them up (see the sketch
   after this list). The problem with this is that GCSE and LICM are not
   the trivial passes that you'd like them to be for a simple cleanup job
   like this. In particular, GCSE requires a value numbering
   implementation, and LICM requires loop info and alias analysis.
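
To illustrate what the decomposition looks like, here is a minimal sketch in modern LLVM IR syntax (the original pass worked on the old bytecode format; the array type and names are invented). A two-index array reference is split into single-index GEPs so that the row computation becomes hoistable:

```llvm
; Before: one multi-dimensional getelementptr for A[%i][%j].
%p = getelementptr [100 x [200 x i32]], ptr %A, i64 0, i64 %i, i64 %j

; After decomposition: single-index steps. If %i is invariant in an
; inner loop over %j, LICM can now hoist the %row computation.
%row = getelementptr [100 x [200 x i32]], ptr %A, i64 0, i64 %i
%p2  = getelementptr [200 x i32], ptr %row, i64 0, i64 %j
```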

Finally, while the *code* for MultiDimRefs isn't target specific, the decisions it makes *are*.

Let's take these details offline, but I'll just address this one point here: DecomposeMultiDimRefs actually makes virtually no decisions at all. It just decomposes GEPs into single-index operations to allow LICM to work. The selector makes the rest of the decisions about how to put GEPs back together to match the target.

I don't claim such choices are the right long-term solution, but the right long-term solution is exactly what you're implying below: a new optimization framework within the back-end, working on its own representation, etc. This has taken 4 years and is still under development. This is why what you call "hacks" are sometimes the right choice in the short term.

I can appreciate that the LowerMultiDimRefs pass made a *lot* of sense 4 years ago, but I'm concerned about moving forward. Now we have more of the infrastructure that is needed to move beyond it. Also, the new instruction selection framework is only a couple months old, not 4 years. :slight_smile:

What I meant was that it took 4 years after work on LLVM (and the sparc back end) first began. Anyway, the new framework should solve these problems much more cleanly.

--Vikram

Chris Lattner wrote:

Also, some of what LSR needs to decide is architecture-dependent. For example, it may not want to strength-reduce a multiply by a small power of two, since this can be folded into addressing modes on some architectures.

You're right. However, we can choose to expose information about target parameters through the Target* interfaces that llvm->llvm passes can use as well, so at least this aspect is not a killer issue.
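
As a concrete instance of the kind of decision involved, here is a minimal sketch (modern LLVM IR syntax; the names are invented). The implicit multiply-by-4 in the address computation below is free on targets with scaled addressing modes, but costs real instructions elsewhere:

```llvm
; Address of a[i] inside a loop; the multiply-by-4 is implicit in the GEP.
%a.i = getelementptr i32, ptr %a, i64 %i
%v = load i32, ptr %a.i
; An x86 selector folds this into one load via [a + i*4] scaled
; addressing, so strength-reducing the scale buys nothing there. A RISC
; target without scaled modes emits an explicit shift and add, so LSR
; should instead advance a pointer by 4 bytes each iteration.
```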

-Chris

The only problem I have with this is that bytecode ought to be platform-independent. If I compile on an X86 with complex addressing modes, then take the bytecode and translate it to machine code on some RISC, that's not fair to the RISC, or vice versa. Then again, this might already be a problem with other optimizations, so it might not really introduce anything new (does it?). The alternative is to delay all such optimization until machine-code generation time.

That's a good point. Currently, given target-independent input, we only do target-independent transformations on the code (and this breaks that premise somewhat). I think you're right that we should eventually do this in the code generator, but as a concession to what is easiest in the short term and will have the biggest performance impact, I still think doing it at the LLVM level is the way to go.

-Chris

Actually, a cleaner model to have in mind is that LLVM-level optimizations happen in 2 stages (each can have multiple steps):
  (1) Ahead-of-time, which includes compile-time and link-time, and only performs target-independent optimizations. This ensures that code that is "shipped" is architecture-neutral.
  (2) Post-install time, which may be install-time, load-time, run-time or even between runs. This stage can perform target-dependent optimizations freely at the LLVM level because code will no longer be moved to a different machine. Code generation (and optimizations on the low-level IR) logically happen at the end of this stage.

Note that #2 can happen multiple times for the same program, e.g., once at install time, and then again after a run that gathers some profile data.

LLVM was designed to make this model possible. Using this model with LLVM is mainly a matter of setting up Makefiles, etc., so that the right passes happen at each step. (This is not what llvm-gcc or llvmc normally do, however.)
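
As a rough sketch of what such a setup might run at each stage (modern tool syntax and invented file names; the tool flags of the time differed):

```sh
# Stage 1 (compile/link time): target-independent passes only; the
# resulting bitcode is what gets shipped.
opt -passes='default<O2>' app.bc -o app.shipped.bc

# Stage 2 (post-install): target-aware LLVM-level passes such as loop
# strength reduction for the installed machine, then code generation.
opt -passes=loop-reduce app.shipped.bc -o app.native.bc
llc -mcpu=native app.native.bc -o app.s
```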

--Vikram