Separate Pipeline.while and Pipeline.pipeline into different dialects
- The current pipeline dialect contains two different levels of abstraction
  - Pipeline.pipeline
    - Represents a scheduled linear pipeline
    - RTL-like level of abstraction
  - Pipeline.while
    - Represents a scheduled loop pipeline
    - HLS-like level of abstraction
- Essentially no shared code between the two levels of abstraction:
  - Could potentially lower from pipeline.while to pipeline.pipeline. Sharing one dialect between two levels of abstraction that have a clear lowering direction does not match the usual structure of MLIR dialects
- Should be separated out into two dialects, one that focuses on the RTL-like retimeable pipeline and one that represents HLS-like (but not only HLS) loop scheduling
- A simplified pipeline dialect opens the door to Filament-like (paper, documentation) type checking for RTL pipelines
Combine Pipeline.while with a new representation for unpipelined loops to produce the LoopSchedule Dialect
- Enables the representation of pipelined loops nested inside unpipelined loops (common for many machine learning workloads)
- Enables the representation of a sequence of pipelined or unpipelined loops
- Also enables loops to be mixed with other basic ops (add, mul, etc.) and function calls
- Only need two more ops to achieve this: SeqWhile (each iteration happens sequentially) and Step
- Together with PipelineWhile and Stage, this gives a total of 4 main ops in the dialect
SeqWhile and Step Rationale
Both of these operations are heavily inspired by the Pipeline dialect, but tailored for unpipelined scheduling. SeqWhile represents an unpipelined loop, meaning that the initiation interval (II) of the loop is equal to the latency of the loop body (i.e., we only start a new iteration once the entire loop body has completed). Step represents a control step in the schedule: the operations in a control step run in parallel, and control steps at the same nesting level of the IR run sequentially.
To illustrate this idea further, we will look at an example scheduled vadd design:
func.func @vadd(%arg0: memref<8xi32>,
                %arg1: memref<8xi32>, %arg2: memref<8xi32>) {
  ls.step {
    %c0 = arith.constant 0 : index
    %c8 = arith.constant 8 : index
    %c1 = arith.constant 1 : index
    ls.seq_while iter_args(%arg3 = %c0) : (index) -> () {
      %0 = arith.cmpi slt, %arg3, %c8 : index
      ls.register %0 : i1
    } do {
      %0:3 = ls.step {
        %2 = arith.addi %arg3, %c1 : index
        %3 = memref.load %arg0[%arg3] : memref<8xi32>
        %4 = memref.load %arg1[%arg3] : memref<8xi32>
        ls.register %2, %3, %4 : index, i32, i32
      } : index, i32, i32
      %1 = ls.step {
        %2 = arith.addi %0#1, %0#2 : i32
        ls.register %2 : i32
      } : i32
      ls.step {
        memref.store %1, %arg2[%arg3] : memref<8xi32>
        ls.register
      }
      ls.terminator iter_args(%0#0), results() : (index) -> ()
    }
    ls.register
  }
  return
}
Step operations can be nested inside of a function or a SeqWhile op. All of the ops in a given Step start at the same time, and the step finishes once all control flow contained in it has finished. Although not shown in this example, multi-cycle ops start in their given step, but the step does not wait for their results before finishing. Instead, their values only need to be produced before any step that uses them runs. This allows other steps to run simultaneously with the multi-cycle op.
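To make this concrete, here is a small sketch in the same style as the example above; the values (%a, %b, %x, %i), the memrefs (%buf0, %buf1), and the assumption that the multiply lowers to a multi-cycle operator are all hypothetical, not part of the example:

%0 = ls.step {
  // Assume the multiply takes several cycles after lowering; this step does
  // not wait for its result to be ready before finishing.
  %p = arith.muli %a, %b : i32
  ls.register %p : i32
} : i32
ls.step {
  // Unrelated work can run while the multiply is still in flight.
  memref.store %x, %buf0[%i] : memref<8xi32>
  ls.register
}
ls.step {
  // The multiply result only needs to be ready before this step starts.
  memref.store %0, %buf1[%i] : memref<8xi32>
  ls.register
}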
The difference in semantics between Step and Stage operations is very important because in the general case we cannot know the runtime of a loop (unbounded loops). We would not nest another loop inside of a pipeline, but we absolutely would nest pipelines and other sequential loops inside of a sequential loop.
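To sketch what such nesting might look like, consider the fragment below; the ls.pipeline_while spelling, its II attribute, the ls.pipeline_stage op, and the placement of the pipelined loop directly in the seq_while body are hypothetical placeholders rather than settled syntax, and %c0, %c1, and %c8 are assumed to be defined in an enclosing ls.step as in the vadd example:

ls.seq_while iter_args(%i = %c0) : (index) -> () {
  %0 = arith.cmpi slt, %i, %c8 : index
  ls.register %0 : i1
} do {
  // Hypothetical pipelined inner loop (II = 1); the outer sequential loop
  // only continues once the inner pipeline has fully drained.
  ls.pipeline_while II = 1 iter_args(%j = %c0) : (index) -> () {
    %0 = arith.cmpi slt, %j, %c8 : index
    ls.register %0 : i1
  } do {
    %s0 = ls.pipeline_stage {
      %0 = arith.addi %j, %c1 : index
      ls.register %0 : index
    } : index
    ls.terminator iter_args(%s0), results() : (index) -> ()
  }
  %next = ls.step {
    %0 = arith.addi %i, %c1 : index
    ls.register %0 : index
  } : index
  ls.terminator iter_args(%next), results() : (index) -> ()
}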
This small set of additional operations allows us to express a wide range of scheduled programs. Step ops can also support conditional execution of the form ls.step when %0.
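For example, a guarded step might look like the following, where the guard computation, %i, %c7, %sum, and %out are made up for illustration:

%is_last = arith.cmpi eq, %i, %c7 : index
ls.step when %is_last {
  memref.store %sum, %out[%i] : memref<8xi32>
  ls.register
}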
Lowering LoopSchedule to Calyx
Although LoopSchedule could be lowered to a number of other dialects, our initial goal is lowering to Calyx. Calyx does a good job of describing the kinds of operations we want to represent in LoopSchedule, which makes lowering much easier. PipelineWhile ops can already be lowered to Calyx through existing passes, and we will extend these passes to allow joint lowering of pipelined and unpipelined loops. To lower SeqWhile and Step ops, each core operation (add, mul, etc.) is translated into a Calyx group, each Step op is translated into a par block in Calyx, and each SeqWhile is translated into a while loop. There are a number of edge cases that need to be handled, but the general concept is straightforward.
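As a rough sketch of the shape this could take (not the output of an existing pass), the control of the vadd example above might map onto CIRCT's Calyx dialect roughly as follows; the group names and the %lt.out comparator port are made up, and only the control fragment of a hypothetical component is shown:

calyx.control {
  calyx.while %lt.out with @cond {
    calyx.seq {
      // First ls.step: index increment and the two loads run in parallel.
      calyx.par {
        calyx.enable @incr_i
        calyx.enable @load_a
        calyx.enable @load_b
      }
      // Second ls.step: the add.
      calyx.enable @add
      // Third ls.step: the store.
      calyx.enable @store
    }
  }
}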
Why a LoopSchedule Dialect instead of separate PipelinedLoop and UnpipelinedLoop Dialects
- Can share a lot of ops such as register and terminator
- Simplifies the mixing of Unpipelined and Pipelined loops
- Lowering passes are more consistent: pipelined and unpipelined loops in the same file cannot be lowered independently, so lowering must happen as one combined pass
We would greatly appreciate feedback on this proposal and will be discussing it in more detail at the CIRCT ODM meeting on March 22nd.