Hi all, we have been working on an experimental vectorizer infrastructure with the aim of creating a more modular vectorizer that is simpler to extend and test. We would like your thoughts on it. Our goal is to work on this project upstream and maintain it alongside the existing vectorizers, without interfering with them.
Motivation for this work is the increasing complexity of the SLP vectorizer, which started as a fairly simple pass but has now become quite complex as more transformations have been added in, while still maintaining a relatively monolithic design.
This new vectorizer has three main design features:
- Modular: The most important one is that we can split the vectorization pass into smaller sub-passes. Each such sub-pass implements a vectorization transformation, and when all are put together in a pipeline will give us a complete feature-rich vectorizer. Unlike the current SLPVectorizer, we get actual IR after each transformation. This has the benefit that we can test it in isolation with lit tests.
- WYSIWYG Design and Transactional “Sandbox” IR: The main goal is a WYSIWYG (What You See Is What You Get) design where we apply transformations directly on the IR. This helps avoid using separate data-structures for holding the transformed IR state, simplifying the design and making it less error-prone.
This design necessitates a transactional IR (which we named “Sandbox IR”) for reverting the IR state if the transformation is found not to be profitable. Having a separate IR also helps with having more vectorization-specific abstractions. The vectorizer developer works with Sandbox IR only and cannot reach through and access LLVM IR. Sandbox IR is implemented as a layer on top of LLVM IR and is by design always automatically kept in sync with LLVM IR, avoiding unnecessary duplication of state and removing a common source of bugs. - Simple Transformations with Complex Infrastructure: We try to move as much complexity as possible from the vectorization transformations to the infrastructure. This is done in two ways:
(a) By implementing common vectorization multi-instruction patterns as single SandboxIR instructions, like Packs/Unpacks. This not only simplifies the transformations, but also moves the burden of matching/lowering such patterns down to the infrastructure.
(b) By moving the burden of keeping components up-to-date from the vectorizer writer to the pass infrastructure. Components like the DAG and the Scheduler that are being used throughout the vectorizer for legality checks etc., are kept up to date by the infrastructure itself.
We believe that these design features make it easier to develop new transformations and also help with testing as we can test components in isolation.
Vectorizer Overview
- The input and output of the vectorizer is SandboxIR:
- The first step is creating SandboxIR from LLVM IR for the whole function.
- Next we transform SandboxIR with a series of vectorization sub-passes, the first being a simple Bottom-Up vectorizer, and the rest applying vectorization transformations that try to improve the vectorized IR.
- The final sub-pass checks for profitability and either accepts or reverts the IR to its original state.
- Finally, we drop SandboxIR. Please note that there is no lowering phase. SandboxIR is always automatically synced with LLVM IR.
- This pipeline runs for every vectorization region (see SBRegion discussed later) so it will run multiple times within a function.
Sandbox IR
To the pass designer, Sandbox IR is a self-sufficient IR. The input and output of the vectorization passes are SandboxIR, and you have no access to LLVM IR. Internally it is implemented as a thin layer on top of LLVM IR that is always in sync with it. Some of its features are:
- Its IR class hierarchy mirrors LLVM IR, for example:
- Instruction → SBInstruction
- BasicBlock → SBBasicBlock
- Function → SBFunction etc.
- Its API feels like LLVM IR, for example:
cast
/isa
/dyn_cast<SBInstruction>(SBVal)
SBUser::getOperand()
SBValue::replaceAllUsesWith()
etc.
- The vectorizer extends it with some vectorization specific instructions, like SBPack/SBUnpack/SBShuffle, which help simplify the pass.
- Internally Sandbox IR maps directly to LLVM IR. For most instructions there is a one to one correspondence. But Sandbox IR also supports some vectorization-specific instruction patterns that map to more than a single LLVM Instruction. An example of that is the SBPackInstruction that packs multiple values into a vector. It corresponds to a sequence of LLVM InsertElementInst in its simplest form.
Please note that even though SandboxIR is generic enough and could potentially be used by other passes, this is beyond the scope of this proposal.
Sandbox IR Design Principles
- WYSIWYG (What You See Is What You Get)
- Modifying IR modifies the underlying LLVM IR directly, so dumping LLVM IR and reading it back always gives you the exact same state of SandboxIR.
- No replication of LLVM IR state if we can avoid it:
-
Provides use-def chains without replicating the data-structures. It relies on LLVM IR use-def chains internally. So to get the operand of a
SBUser
the infrastructure accesses the corresponding LLVMUser
, gets its LLVMValue
operand through the chain, and then gets the correspondingSBValue
operand, as shown below in steps 1 to 3.
-
Reuse Type and Instruction-specific state (like Predicates, MathFlags etc.). None of it is replicated in Sandbox IR, we are using the state from LLVM IR.
-
- What is replicated:
- The instruction opcode, because one LLVM IR opcode can form more than one type of Sandbox IR Instructions. For example an LLVM ShuffleInst could be either a SBShuffle or a SBPackInstruction, depending on the mask.
- The subclass ID used for
isa<>
,cast<>
etc. - The pointer to SBContext, as we can’t access it through LLVM’s Context pointer.
- Feels-like LLVM IR:
- Similar class hierarchy:
// +- SBArgument +- SBConstant +- SBLoadInstruction
// | | |
// SBValue -+- SBUser ----+- SBInstruction -+- SBStoreInstruction
// | |
// +- SBBasicBlock +- SBPackInstruction
// ... |
- Stays close to the LLVM IR API by providing commonly used functions e.g.,
getOperand(), setOperand(), replaceAllUsesWith()
, etc. - Same
cast<>/isa<>/dyn_cast<>
type system.
Sandbox IR Features
- Provides internal APIs for tight-coupling with other components, i.e., allows tracking instruction movement, creation, deletion through callbacks.
- Even though it is not serializable in its SandboxIR form, the mapping from LLVM IR to SandboxIR (and the reverse) is straightforward so we can use LLVM IR textual form in our tests.
- It tries hard to not provide access to the underlying LLVM IR, aiming at protecting the user from breaking things unintentionally.
- Since it is always synced to LLVM IR, it makes it relatively easy to use some existing Analyses internally and provide wrappers for them.
- It is transactional, providing a save()/accept()/restore() API. In our initial implementation the state is saved at the beginning of the pass pipeline and accepted/restored by a pass at the end. But as we extend the implementation this could be useful within the transformations as well.
Sandbox IR Transactions
A major feature of SandboxIR is that it allows us to save and rollback its state (and the underlying LLVM IR’s).
This was proposed as a feature on LLVM IR ( Checkpointing RFC ), but the scope was too large, so rolling back all state was not always possible. With SandboxIR we reduce the scope, and limit the user’s access to objects that can be tracked. This makes checkpointing feasible within the context of SandboxIR.
A transactional IR has the following advantages:
- It enables simpler designs where you first transform the IR and then decide if it is worth keeping it, instead of operating on abstractions of the IR before lowering them to IR.
- Simpler designs: no need to create/maintain parallel data-structures to operate on
- May lead to more accurate cost-modeling in the future as we can check the cost of actual IR
- Since it already implements the functionality of tracking IR changes, it can be used to enable tight coupling with other components via callbacks.
However, it also has some drawbacks, mainly compile-time and memory usage:
- Compile-time overhead due to SandboxIR indirection.
- Tracking changes for transactions is not free, as for every action performed on the IR we need to create a tracking object and push it into a vector.
- Rolling back the IR state is a linear-time operation to the number of actions performed since we started tracking changes.
We have testing in progress for the overhead of the design choices described in this RFC.
Sandbox Vectorizer’s Pass Manager
The vectorizer implements a simple sub-pass manager, specifically designed with vectorization in mind.
- It operates on SandboxIR
- It supports SBFunction passes, SBBasicBlock passes and SBRegion passes
SBRegion
LLVM’s SLP Vectorization applies a series of transformations on a graph-shaped region of the code. After applying all these transformations it checks for profitability.
To achieve a similar functionality while also maintaining a modular design we introduced the SBRegion, which represents a section of the IR. SBRegions are serializable (meaning that we can describe an SBRegion in a lit test) with the help of LLVM Metadata that simply tag the instructions included in each region.
Current Implementation State
The current implementation is a proof of concept. It includes a basic SandboxIR implementation with a simple transactional scheme and a small set of vectorization-specific instructions. The vectorizer is a very simple bottom-up SLP-style vectorizer along with a couple of simple transformation sub-passes. As this is an early prototype the vectorization transformations are still work in progress and are not currently covering many common vectorization patterns. A snapshot of the code can be found here: Sandbox Vectorizer prototype · vporpo/llvm-project@f167cef · GitHub
Stability wise, it is passing the current llvm/clang lit tests and llvm-test-suite.
We would like to get feedback on the overall approach and design, and to be able to do further development in tree.