One important next step in turning LLVM into a first-class
autovectorizing compiler will be to incorporate target information into
the vectorization logic. To really make good decisions regarding what
is profitable to vectorize, and how that vectorization should be done,
it will be important for the vectorization pass(es) to understand the
underlying target capabilities. The same will hold true for various
kinds of loop iteration-space transformations.

As I recall, Chris suggested the following work-around to me some
months ago: allow optimization passes to access target lowering info
(TLI) only when it is available. Specifically, only frontends (like
clang) that link in both the optimization passes and codegen would
have a mechanism for handing a TLI instance to the optimization
passes. While I think this could certainly be made
to work, it seems suboptimal. It would mean that 'opt' could no longer
perform the same level of optimization as 'clang' with equivalent
inputs. That being the case, I think that over time 'opt' would simply
fall out of use. My general question is this: What do we gain by
keeping a strict separation between the (mostly-target-independent)
optimization layer and the codegen layer?

To partially answer my own question, I can think of one advantage: It
keeps us from being lazy. Specifically, it forces us to keep a single
canonical expression form that is handed to the backends. This eases the
maintenance burden by forcing a certain amount of generality into the
whole system and by limiting target-specific variants of the
canonical expression forms. This makes it harder to break things in odd
ways with seemingly-innocuous changes.

I fear, however, that this leads to a system which is generally
good, but not great on any particular target. Furthermore, it is
sometimes very difficult or impossible for the backends to undo bad
decisions made by the target-independent optimization layer. I think
it is time to reconsider this separation and make optimization a truly
target-dependent process where needed. Obviously, we should not make
target-dependent decisions where they're not necessary, and we should
introduce appropriate abstraction layers to characterize target
differences. Nevertheless, the most efficient and maintainable way to
provide target information to the optimization passes will be to
provide that information directly from the backend code (and
associated tablegen files).
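
To make the kind of abstraction layer I have in mind a bit more
concrete, here is a minimal sketch. Every name in it (TargetVectorInfo,
Example128BitTarget, chooseVectorWidth) is hypothetical and is not an
existing LLVM API, and the cost model is deliberately simplistic. It
only illustrates a pass asking the target for information through a
narrow interface, and falling back to a conservative answer when no
target information was supplied (the 'opt' case in the work-around
above).

```cpp
#include <cassert>

// Hypothetical interface: none of these names are existing LLVM APIs.
// The base class returns conservative, target-independent defaults.
struct TargetVectorInfo {
  virtual ~TargetVectorInfo() = default;
  // Width of the widest vector register in bits; 0 means "none known".
  virtual unsigned vectorRegisterBits() const { return 0; }
  // Cost of one vector operation relative to one scalar operation.
  virtual unsigned vectorOpCost() const { return 1; }
};

// What a backend might provide (assumed numbers, for illustration only).
struct Example128BitTarget : TargetVectorInfo {
  unsigned vectorRegisterBits() const override { return 128; }
  unsigned vectorOpCost() const override { return 2; }
};

// Returns the number of lanes to vectorize by; 1 means "stay scalar".
unsigned chooseVectorWidth(const TargetVectorInfo *TVI, unsigned ElemBits) {
  assert(ElemBits > 0 && "element size must be nonzero");
  if (!TVI)
    return 1; // No target information available: make no target decision.
  unsigned Lanes = TVI->vectorRegisterBits() / ElemBits;
  if (Lanes < 2)
    return 1; // The target has no useful vector width for this type.
  // Vectorize only if one vector op is cheaper than Lanes scalar ops.
  return TVI->vectorOpCost() < Lanes ? Lanes : 1;
}
```

With the assumed numbers, a 32-bit element type is vectorized by 4 on
the example target, while a 64-bit element type stays scalar because
two lanes do not pay for the assumed vector-op cost.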

I would like to hear other opinions on this.