Removing the separation between opt and codegen?

Hello,

One important next step in turning LLVM into a first-class
autovectorizing compiler will be to incorporate target information into
the vectorization logic. To really make good decisions regarding what
is profitable to vectorize, and how that vectorization should be done,
it will be important for the vectorization pass(es) to understand the
underlying target capabilities. The same will hold true for various
kinds of loop iteration-space transformations.

As I recall, Chris suggested to me some months ago the following
work-around: allow optimization passes to access target lowering info
only when it is available. Specifically this means that only for
frontends (like clang) that link in both the optimization passes and
codegen, we would provide some mechanism for providing a TLI instance
to the optimization passes. While I think this could certainly be made
to work, it seems suboptimal. It would mean that 'opt' could no longer
perform the same level of optimization as 'clang' with equivalent
inputs. That being the case, I think that over time 'opt' would simply
fall out of use. My general question is this: What do we gain by
keeping a strict separation between the
(mostly-target-independent) optimization layer and the codegen layer?
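To make the work-around concrete, here is a minimal sketch of what it might look like, assuming the TLI pointer is handed to the pass at construction time (the way codegen-level IR passes such as LoopStrengthReduce already receive it); 'HypotheticalVectorize' is purely illustrative, not an existing pass:

  #include "llvm/Pass.h"
  #include "llvm/Target/TargetLowering.h"
  using namespace llvm;

  // Illustrative vectorization pass that degrades gracefully when no
  // TargetLowering instance was provided (e.g. when run under a bare
  // 'opt' that did not link in a backend).
  struct HypotheticalVectorize : public FunctionPass {
    static char ID;
    const TargetLowering *TLI; // null when codegen is not linked in

    explicit HypotheticalVectorize(const TargetLowering *TLI = 0)
        : FunctionPass(ID), TLI(TLI) {}

    virtual bool runOnFunction(Function &F) {
      if (TLI) {
        // Target-aware path: query operation legality and costs here.
      } else {
        // Conservative, target-independent heuristics for plain 'opt'.
      }
      return false; // this sketch makes no changes
    }
  };
  char HypotheticalVectorize::ID = 0;

This is exactly the situation I find unsatisfying: the else-branch is what 'opt' would be left with.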

To partially answer my own question, I can think of one advantage: It
keeps us from being lazy. Specifically, it forces us to keep a single
canonical expression form that is handed to the backends. This eases the
maintenance burden by forcing a certain amount of generality into the
whole system and by limiting target-specific variants of the
canonical expression forms. This makes it harder to break things in odd
ways with seemingly-innocuous changes.

I fear, however, that this leads to a system which is generally
good, but not great on any particular target. Furthermore, it is
sometimes very difficult or impossible for the backends to undo bad
decisions made by the target-independent optimization layer. I think
it is time to reconsider this separation and make optimization a truly
target-dependent process where needed. Obviously, we should not make
target-dependent decisions where they're not necessary, and we should
introduce appropriate abstraction layers to characterize target
differences. Nevertheless, the most efficient and maintainable way to
provide target information to the optimization passes will be to
provide that information directly from the backend code (and
associated tablegen files).

I would like to hear other opinions on this.

Thanks again,
Hal

Hal-

I generally agree with what you are saying here. Based on my recent experience working on a partial simdizer (not LLVM), I found that even deciding which instructions to group for good simdization requires some knowledge of the underlying target.

Let's take an instruction like haddps, which adds all the components of a vector register in a certain way. Whether such an instruction is supported by the target does impact your simdization choice. Furthermore, the cost of haddps may also decide how and where to simdize. Hence the simdization-choice phase, which should (theoretically) be fairly target-independent, needs to have some knowledge of the target. Now, whether this can be abstracted away in some form can be discussed.
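To illustrate the kind of query involved (the interface and the numbers below are hypothetical, purely for illustration, not anything that exists in LLVM today), the simdization-choice phase might consult a per-target cost table along these lines:

  #include <cstdio>

  // Hypothetical per-target costs for summing the lanes of a vector.
  // The field names and numbers are illustrative only.
  struct ReductionCosts {
    bool hasHorizontalAdd;       // e.g. SSE3 haddps on x86
    unsigned horizontalAddCost;  // one haddps-style step
    unsigned extractAddCost;     // one extract + scalar add
  };

  // Estimated cost of reducing a vector of 'Lanes' elements to a scalar.
  unsigned sumReductionCost(const ReductionCosts &C, unsigned Lanes) {
    if (C.hasHorizontalAdd) {
      unsigned Steps = 0;                      // pairwise reduction:
      for (unsigned L = Lanes; L > 1; L /= 2)
        ++Steps;                               // log2(Lanes) horizontal adds
      return Steps * C.horizontalAddCost;
    }
    return (Lanes - 1) * C.extractAddCost;     // scalar extract-and-add chain
  }

  int main() {
    ReductionCosts SSE3 = { true, 3, 2 };  // made-up costs
    ReductionCosts Base = { false, 0, 2 };
    std::printf("4-lane sum: %u vs. %u\n",
                sumReductionCost(SSE3, 4), sumReductionCost(Base, 4));
    return 0;
  }

A simdization-choice phase comparing these two estimates would group instructions differently depending on whether haddps is available and how much it costs.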

-Dibyendu

> seems suboptimal. It would mean that 'opt' could no longer perform the same level of optimization as
> 'clang' with equivalent inputs. That being the case, I think that over time 'opt' would simply fall out of
> use. My general question is this: What do we gain by keeping a strict separation between the
> (mostly-target-independent) optimization layer and the codegen layer?

I was under the impression that opt would also be able to benefit from the added capabilities. After all, opt is just a driver, and can be taught to understand the '-march' and '-mcpu' flags.
I agree that it is important to allow opt to access the TLI, for two reasons: (1) we use opt to test our code, and (2) vectorizers may want to serve domain-specific languages which may not necessarily use clang.

> I fear, however, that this leads to a system which is generally good, but not great on any particular
> target. Furthermore, it is sometimes very difficult or impossible for the backends to undo bad
> decisions made by the target-independent optimization layer.

Yes, but this is a general compiler problem. Early optimizations have no knowledge of how they affect later stages. For example, we don't consider register pressure when we inline a function.

The problem is even more severe with vectorizing compilers. One problem that I mentioned in the past is that on 64-bit systems, 32-bit scalars are promoted to 64-bit values. Later on, the vectorizer attempts to vectorize these values, but it is much more difficult to vectorize vectors of i64s. For example, array indices which were i32 values are now vectors of i64s, which can't be used for scatter/gather operations (which use i32 indices).
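A minimal sketch of the pattern (my own example, not taken from any particular benchmark):

  // On an LP64 target the i32 index loaded from Idx[i] is sign-extended
  // to i64 before it feeds the address arithmetic for B. After
  // vectorization this yields a vector of i64 indices, which cannot be
  // handed directly to gather operations that expect i32 indices.
  void gather(float *A, const float *B, const int *Idx, int N) {
    for (int i = 0; i < N; ++i)
      A[i] = B[Idx[i]];
  }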