[RFC] Representing pipelined loops


Following today’s discussion, it seems like we need to commit to a representation of pipelined loops, which can be targeted from higher-level dialects, and be used to generate an FSMD representation.

I propose we evolve staticlogic.pipeline to fit our needs. If that ultimately aligns with the needs of software pipelining as outlined in Better modelling of pipeline loops, we can talk about upstreaming this. For now, it seems prudent to incubate the representation in CIRCT.

So, what are our needs? To drill in a bit, I think there are a few things:

  • Represent the induction variable of an scf.for or the condition variable of scf.while and how it updates
  • Represent information computed by scheduling in the IR, like initiation interval and start times of operations
  • Enable the generation of an FSMD

What else?

Calyx representation

For me, it helps to think bottom up, starting from what is required to enable the generation of an FSMD. I’m going to assume we are talking in terms of Calyx, which represents the datapath with wires and groups, and the FSM with a control language that enables different groups.

It seems like there are two major needs: FSMD for the looping constructs and FSMD for the stages of the pipeline, which access the looping constructs. I’m trying to focus on what the control language should look like. In Calyx pseudo-IR, I’m picturing something like:

while %loop_construct.condition with @LoopGroup
    enable @LoopGroup
    if %input.ready with @InputGroup
      enable @Stage1Group
    if %stage1.done with @Stage1Group
      enable @Stage2Group
    if %stage2.done with @Stage2Group
      enable @OutputGroup

Does that make sense? I’m curious if there is a good idiom for representing a pipeline in Calyx.

To support such an FSM, we’d need datapath components for:

  • Computing the next values of looping constructs, done conditions, etc.
  • Computing each stage of the pipeline

Pipeline stages

From scheduling, we can get an initiation interval, and the start times of different operations. Operations with the same start time are in the same stage.
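To make that grouping concrete, here is a small software sketch (plain Python, not CIRCT code) that turns a schedule’s start times into stages; the op names and start times mirror the draft example further below:

```python
from collections import defaultdict

# Hypothetical schedule: op name -> start time, as produced by scheduling.
schedule = {
    "memref.load": 0,
    "std.addi": 1,
    "memref.store": 2,
}

def stages_from_schedule(schedule):
    """Group operations with the same start time into one stage."""
    stages = defaultdict(list)
    for op, start in schedule.items():
        stages[start].append(op)
    # Stages are ordered by start time; a start time with no ops
    # would show up here as an empty stage.
    return [stages[t] for t in range(max(stages) + 1)]

print(stages_from_schedule(schedule))
# → [['memref.load'], ['std.addi'], ['memref.store']]
```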

We have already discussed a couple of options for capturing the computation stages: either with a sequence of blocks (staticlogic.pipeline), or with a sequence of region-carrying ops (scf.pipeline_for). I think either option can make sense.

One thing I’m considering is how to pass values between stages (i.e. where to put pipeline registers). In the current staticlogic.pipeline, this is implicit in the values flowing between each block. In scf.pipeline_for, this is made explicit in the values yielded and returned by each stage. An in-between approach could be to define a new operation that is a block terminator, which passes values to the next block:

^bb0:
  %0 = std.addi %1, %2 : i32
  staticlogic.yield %1 : i32
^bb1(%arg0: i32):  // %arg0 is passed from %1

Would that help analysis and lowering to FSMD?

Loop conditions and variables

Another thing I’m interested in is the interaction between looping constructs like induction variables and the stages. I’m imagining a component that computes the loop variable updates and done conditions, and feeds the loop variables into the first stage. The value of the loop variable is carried down through the stages, so each stage can access the appropriate value of the induction variable at the right time.

This is something I liked about the scf.pipeline_for representation: it passes the induction variable to each stage as block arguments, before the input values:

%0 = scf.pipeline_stage {
  ^bb0(%iv : index, %arg1 : f32):
    ...
}

Whether or not we use a wrapping op like scf.pipeline_stage, or use a flat list of blocks inside staticlogic.pipeline directly (possibly with the proposed staticlogic.yield), it seems like having the current induction variable passed as a basic block argument makes sense.

Besides how to pass them around, I’m also thinking about how to represent the computation of loop variables. The scf.pipeline_for proposal was specific to for-loops, and captures the lower bound, upper bound, and step in the op itself. I’m also thinking about scf.while (which scf.for can be converted to, and which may be needed for certain frontend use-cases). I think capturing the loop variable computation in the op, like scf.pipeline_for, is a good thing, but I’m looking towards a little more generality, like scf.while.

Perhaps we can add this onto staticlogic.pipeline, similar to how scf.while does it. We could have a region to compute the next value of the condition, and a region to compute any other “looping variables”, like the induction variable in a for-loop, and present them to the pipeline stages. I’m hoping this would capture the information needed to generate the appropriate Calyx component(s).

Draft proposal

Given the above, I’m thinking of how we can update staticlogic.pipeline.

Here’s a draft. Let’s say we have this loop, with scheduling information in parentheses:

scf.for %i = %c0 to %c10 step %c1 { (II = 1)
  %0 = memref.load %memory1[%i]     (startTime = 0)
  %1 = std.addi %0, %c1             (startTime = 1)
  memref.store %1, %memory2[%i]     (startTime = 2)
}

Here’s one point in the design space:

staticlogic.pipeline {II = 1} {
  condition(%done = %c0) {            // %c0 is an operand, the initial value. %done is a block argument.
    %0 = std.icmp lt %done, %c10
    staticlogic.yield %0
  }
  update(%i = %c0) {                  // %c0 is an operand, the initial value. %i is a block argument.
    %0 = std.addi %i, %c1
    staticlogic.yield %0              // each value yielded here corresponds to a block argument in the stages
  }
  stages() {                          // any inputs could be accepted as operands to stages
  ^bb0(%i0):                          // %i0 is yielded by the update block, and any other block args come from the operands to stages (to keep staticlogic.pipeline IsolatedFromAbove)
    %0 = memref.load %memory1[%i0]
    staticlogic.yield %i0, %0
  ^bb1(%i1, %arg0):                   // the arguments are now fed through the previous yield
    %1 = std.addi %arg0, %c1
    staticlogic.yield %i1, %1
  ^bb2(%i2, %arg1):
    memref.store %arg1, %memory2[%i2]
    staticlogic.return                // pipeline outputs could be returned here
  }
}

Does this representation capture the ingredients for an FSMD? What’s missing?

Other thoughts

Some random other thoughts I’ve had that aren’t addressed by the above draft:

  • How to pass around loop-carried values? I think this can be captured similarly to scf.for, etc., using iter_args on staticlogic.pipeline, alongside condition, update, and stages. The values from the staticlogic.return could be made available as iter_args.
  • How to represent nested loops? I think it should be possible to cram more complex logic into the condition and update regions, to update multiple loop induction variables. Control flow could be represented at this level using std.select to implement mux-like behavior, or we could rely on CFG operations like cond_br.
  • Does it make sense to explicitly yield induction variables from one stage to the next? Is there a meaningful computation we’d want to do on them, in order to yield something to the next stage other than the value passed in? That could enable a degree of flexibility. If not, we could make the passing of the first block argument implicit, as is done in the scf.pipeline_for proposal.

Thank you for the write-up; I think it summarizes the discussions and different proposals up to now nicely.

I voiced my support for a region-based approach during the meeting, but I’ll repeat it here for reference:
While the block-based approach of staticlogic.pipeline seems possibly simpler to implement, analyse and optimize across stages, it puts quite a severe restriction on the input program, namely that it must be fully lowered to comb - this will eliminate any block-based control flow in favor of multiplexers, which is what allows staticlogic.pipeline to “hijack” block semantics as being pipeline stages.
As I see it, this representation is too specific and wouldn’t work as a (potentially upstreamable) general pipeline representation, seeing as we disallow block-based control-flow within the different stages of the loop. With a region-based approach we may place control flow within the regions, or we can have region bodies be fully lowered to comb. This makes it applicable to pipelining lowered comb logic, as well as pipelining before comb lowering. In this way we are not painting ourselves into a corner where we are forced to perform pipelining post comb lowering.

For managing the iterations of the pipeline itself, I don’t see a good argument for specializing this to for-loops; for-loops are syntactic sugar for while-loops, so a controller FSM for managing the for-loop will essentially be identical to one managing a while-loop.
After lowering a for-loop to a while-loop, the incrementation of the induction variable is no longer anything special - it is just a part of the datapath. In your example, I’d say this would equate to inlining the update region into the 1st stage of the pipeline. Therefore, if going with a while-based approach, there is no need for an update op.

If basing the pipeline representation on while-loops, the induction variable itself is also no longer anything special - it is just a loop-carried value similar to any other iter_args. This means that the loop variable will always be made a stage argument for lowered for-loops, but never for lowered while-loops (since it doesn’t have any).

Here is what I was thinking about a while-loop, region-based approach, using your provided example. There are only iterargs and no “special” induction variables, the incrementation of the for-loop induction variable is inlined into the first stage, and the staticlogic.condition op simply references the loop-carried values. Furthermore, a specific terminator operation of staticlogic.stages is required to forward loop-carried values to the next iteration.

staticlogic.pipeline(/*iterargs=*/%done = %c0) {II = 1} {
  staticlogic.condition {
    %0 = std.icmp lt %done, %c10
    staticlogic.yield %0
  }
  staticlogic.stages {
    %0:2 = staticlogic.stage {
      %i = std.addi %done, %c1
      %l = memref.load %memory1[%i]
      staticlogic.yield %i, %l
    }
    %1:2 = staticlogic.stage {
      %2 = std.addi %0#1, %c1
      staticlogic.yield %0#0, %2
    }
    staticlogic.stage {
      memref.store %1#1, %memory2[%1#0]
    }
    staticlogic.return %0#0 // Terminator op matching iterargs type
  }
}

In this case, the return value depends on a variable yielded from stage 1, meaning that an II of 1 is feasible. We could imagine that if it depended on values from later stages, this would then increase the achievable II.
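That relationship between stage depth and II is the standard recurrence-constrained lower bound from modulo scheduling. A small sketch (plain Python, illustrative only, not tied to any CIRCT API):

```python
import math

def rec_min_ii(recurrences):
    """Minimum II imposed by loop-carried recurrences.

    Each recurrence is a (delay, distance) pair: `delay` is the number of
    stages (cycles) until the carried value is produced, `distance` is the
    dependence distance in iterations. Each cycle requires
    II >= ceil(delay / distance).
    """
    return max(math.ceil(delay / distance) for delay, distance in recurrences)

# In the example above, %0#0 is produced by stage 1 and consumed one
# iteration later: delay = 1, distance = 1, so II = 1 is feasible.
print(rec_min_ii([(1, 1)]))  # → 1
# Had the carried value come out of stage 3 instead, II would rise to 3.
print(rec_min_ii([(3, 1)]))  # → 3
```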

As relevant reference and since it came up in the August 25th meeting, here is how Calyx (and by extension the Calyx dialect) can encode pipelines. We’ve also internally discussed how to represent the pipeline stages at a higher-level using a new control operator but at the end of the day, it will compile down to a similar encoding.

As mentioned in the link above, Calyx does not place any restriction on the complexity of the stage—it can be an arbitrary control graph and the Calyx lowering passes will handle it just fine. I think this is more in line with @mortbopet’s explanation/reasoning on why the iter_args stuff is not represented in the pipeline representation.

Regarding the use of basic blocks, I have two points.

If we need it, there is a std.select operation we can use to implement mux-like behavior without having to lower all the way to Comb.

More importantly, this is about structured control flow. The SCF or Affine dialect enforces each op to have a single body block, terminated by scf.yield or affine.yield. At this level of abstraction, there is no CFG of basic blocks, and branching is represented by scf.if or affine.if. In fact, for software compilation, SCF or Affine will lower into a CFG, before lowering to LLVM IR. To me, the discussion about basic blocks within stages is kind of moot: we are choosing to start at a higher level of abstraction (SCF or Affine) and generate our pipeline from there. This is really the beauty of MLIR for HLS: we get to define how “high level” the input IR really is.

All that being said, I do agree that this use of basic blocks is somewhat out of the norm, and might not be acceptable for upstream MLIR. Especially if we agree stages should yield outputs through a terminator, wrapping each stage in an op with a single block makes sense. It also seems more natural if we have any transformations we’d like to do at this level of abstraction with RewritePatterns. And, we could support a CFG within each stage, if there was a need. I propose we add an op like staticlogic.pipeline.stage for this.

This is a great insight, thanks for helping me see this. I think replacing what I called the update region above with iter_args and a terminator that can yield results from the stages makes sense.

Another alternative I was just looking into is how scf.while handles this. It combines what I called condition and update above into one region. That region’s terminator has one operand to indicate if the loop should continue, and forwards the rest of its operands to the body.

We could do something similar to scf.while, but that is only really useful for things like induction variables that we would always compute in the first stage. I think we will want general iter_args in order to support reduction loops coming from the frontend, which may not yield until the final stage. If we have general iter_args, then it seems to me like we should use them for induction variables, etc. as well, rather than doing something like scf.while. I’ll adopt what you proposed above.

One question is whether we want to separate loop-carried values from the outputs of the operation. For SCF iter_args, they are one and the same, and the values yielded by the final iteration become the results of the operation. I think in our case it may make sense to make a distinction between internal loop-carried values and outputs. Perhaps staticlogic.return could have two variadic lists of operands: one for outputs and one for iter_args.

Thanks for the pointer, that’s just what I was looking for. I’ll check it out.

I’m away from keyboard for a bit, but I’ll post again with an updated proposal (i.e. hypothetical IR) taking the above into account.

The scf.execute_region operation allows for control-flow in SCF-level MLIR, e.g.:

func @for_ex(%arg0 : index, %arg1: index, %arg2: index) -> f32 {
  %s0 = constant 0.0 : f32
  %result:2 = scf.for %i0 = %arg0 to %arg1 step %arg2 iter_args(%iarg0 = %s0, %iarg1 = %s0) -> (f32, f32) {
    %sn = scf.execute_region -> f32 {
      %0 = std.cmpf oeq, %iarg0, %iarg1 : f32
      cond_br %0, ^bb0, ^bb1
    ^bb0:
      scf.yield %iarg0 : f32
    ^bb1:
      scf.yield %iarg1 : f32
    }
    scf.yield %sn, %sn : f32, f32
  }
  return %result#1 : f32
}

or if lowered to a while loop:

module  {
  builtin.func @for_iter_args(%arg0: index, %arg1: index, %arg2: index) -> f32 {
    %cst = constant 0.000000e+00 : f32
    %0:3 = scf.while (%arg3 = %arg0, %arg4 = %cst, %arg5 = %cst) : (index, f32, f32) -> (index, f32, f32) {
      %1 = cmpi slt, %arg3, %arg1 : index
      scf.condition(%1) %arg3, %arg4, %arg5 : index, f32, f32
    } do {
    ^bb0(%arg3: index, %arg4: f32, %arg5: f32):  // no predecessors
      %1:2 = scf.execute_region -> (f32, f32) {
        %3 = scf.execute_region -> f32 {
          %4 = cmpf oeq, %arg4, %arg5 : f32
          cond_br %4, ^bb1(%arg4 : f32), ^bb1(%arg5 : f32)
        ^bb1(%5: f32):  // 2 preds: ^bb0, ^bb0
          scf.yield %5 : f32
        }
        scf.yield %3, %3 : f32, f32
      }
      %2 = addi %arg3, %arg2 : index
      scf.yield %2, %1#0, %1#1 : index, f32, f32
    }
    return %0#2 : f32
  }
}

I’m not aware of pipelining transformations that handle loops with control-flow. At least for modulo schedulers, the usual approach is to if-convert the loop’s body to a CDFG, and schedule from there. So it might not be necessary to allow control-flow in the pipeline stages, as it would be lowered to predicated data-flow before the stages are formed.

One aspect currently missing in the proposal is how to handle multi-cycle operations, as we touched upon in the ODM. It’s obvious that an operation being inside a pipeline stage means that it is started in that stage, but we need some notion of when a result is available, e.g. for transformations or verifiers. This could be done by encoding the op’s latency, or using a future-like operation in the stage in which the result is available. Also, would we allow empty stages, e.g. if no op starts in a certain time step?

In general, if such a representation of a pipelined datapath should be transformed/verified after scheduling/formation of the stages, I believe we need to encode the same information as currently modeled in the circt::scheduling::Problems, such as the latency (cf. previous paragraph), operator blocking times, resource constraints, etc.

I’m not aware of pipelining transformations that handle loops with control-flow. At least for modulo schedulers, the usual approach is to if-convert the loop’s body to a CDFG, and schedule from there.

Just trying to state the obvious here—this restriction only applies in the static scheduling case, right? One reasonable question to ask here is if we want to commit to tying the pipeline representation to only be implementable using a statically-timed pipeline. The other option is to have a generic representation of pipelines that can be either scheduled statically or dynamically (latency-insensitive).

The Calyx philosophy from day one has been that static scheduling is just an optimization on dynamic scheduling and everything should always have a compilation path through dynamic compilation. The example in Calyx I linked above already works with both dynamic and static scheduling (albeit doesn’t take full advantage of all static timing information yet).

In the same way, the utility of a pipeline operator that can encode arbitrary control flow is that it’s more homogeneous with the rest of the language—there are no restrictions on how control operators (in Calyx parlance) may be composed with the pipeline operator. As long as they can be lowered, the pipeline can be constructed. Arbitrary restrictions on language constructs are a famous problem in HLS tools today, so I would recommend considering the effect on upstream dialects when thinking about a restricted pipeline operator.

This is an important point. If we want to do any transformations on this IR, we will need to know all the same information that is used by scheduling, in order to write robust verifiers that ensure the transformations are valid. Another way of putting this is we are talking about capturing the relevant data structures from circt::scheduling::Problem in the IR itself.

To that end, I’ve been thinking about the issues like multi-cycle operations and stages with no operations. If we stick to the representation that a sequence of “stage” operations mark the start times, I think there are decent answers. To encode the latency of operations, we could specify per-result attributes on the stage terminator operation, which label the stages at which each result will be available. The verifier for the pipeline operation could check that values are accessed in the appropriate stage. For stages that don’t start any operations, we could have a “stage” operation with an empty body block.

The idea around future-like operations is interesting. I’m picturing something like the HIR delay operation, which takes an SSA value and encodes a latency after which it may be accessed. This is an interesting alternative, but I wonder if it will be amenable to the kind of transformations we want to do. It seems like answering questions like “is this access valid?” or “at which stage is this value available?” would require more IR traversal than the representation described in the previous paragraph.

My gut feeling is to start with the “sequence of stage” representation, with attributes to encode latency. Does that sound reasonable? Is there a use case that would make the future-like representation preferable?
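To make the verifier idea a bit more concrete, here is a rough sketch in Python; the `defs`/`uses` encoding and all names are hypothetical, not CIRCT code — just the check that values are only accessed once their producing stage plus latency has been reached:

```python
def verify_accesses(defs, uses):
    """Check stage-relative availability of values.

    defs: value -> (def_stage, latency), from per-result latency attributes.
    uses: value -> list of stages in which the value is accessed.
    A value may only be read once stage def_stage + latency is reached.
    """
    errors = []
    for value, use_stages in uses.items():
        def_stage, latency = defs[value]
        available = def_stage + latency
        for s in use_stages:
            if s < available:
                errors.append(
                    f"{value} used in stage {s} but only available in stage {available}")
    return errors

# All accesses valid: each value is read exactly when it becomes available.
print(verify_accesses({"%0": (0, 1), "%1": (1, 1)}, {"%0": [1], "%1": [2]}))
# → []
# A two-cycle op whose result is read one stage too early:
print(verify_accesses({"%0": (0, 2)}, {"%0": [1]}))  # → one error message
```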

Taking the above discussions into account, here is my revised proposal. Note that the staticlogic.pipeline.stage operation could support arbitrary control-flow a-la scf.execute_region, if so desired:

staticlogic.pipeline {II = 1} {
  iter_args(%i = %c0)
  condition(%done = %c0) {
    %0 = std.icmp lt %done, %c10
    staticlogic.yield %0
  }
  stages() { // any inputs to the pipeline could be passed to stages()
    %s0:3 = staticlogic.pipeline.stage {
      %0 = memref.load %memory1[%i]
      %1 = std.addi %i, %c1
      staticlogic.yield %i {latency=1}, %0 {latency=1}, %1 {latency=1}
    }
    %s1:2 = staticlogic.pipeline.stage {
      %1 = std.addi %s0#1, %c1
      staticlogic.yield %s0#0 {latency=1}, %1 {latency=1}
    }
    staticlogic.pipeline.stage {
      memref.store %s1#1, %memory2[%s1#0]
    }
    staticlogic.return results(), iter_args(%s0#2) // any outputs from the pipeline could be passed to results()
  }
}

It’s unclear to me how this is fundamentally different from using blocks for feedforward designs. We could represent this as something like (simplified slightly):

      %0 = memref.load %memory1[%i]
      %1 = std.addi %i, %c1
      staticlogic.reg %i, %0, %1
^bb0(%a, %b, %c):
      %1 = std.addi %b, %c1
      staticlogic.reg %a, %1

Which would avoid replicating a lot of machinery that already exists for basic blocks.
It’s also unclear to me what you’re trying to represent with “latency”. Perhaps you could provide some more examples? How do values get passed forward from one pipeline stage to another? How are recurrences described?

Here are a few C-like examples that might be interesting to discuss concretely:

// value in x needs to get passed back to a previous stage.  What happens if the add requires multiple pipeline stages?
x = 0; for(i) {  x += a[i] }   

// reuse values read from a.  store for multiple cycles.
for(i) {  x = a[i]; if(i >=3) {b[i-3] = x + a[i-3];} }

// a is 3 different registers implementing as rotating register file..  equivalent to a[i/3] += b[i];
for(i) { a0=a1; a1= a2; a2 = a3; a3 = a0+b[i] }

// histogram.  raw and war dependencies through memory a[i].
for(i) { a[b[i]] ++;}

The only difference is a region-holding op could support control flow with branches within a single stage. I’m still doubting if anything will actually want to target that at this level of abstraction, but it sounded like there was some interest in at least keeping the door open for such a thing. Wrapping each pipeline stage in an operation would at least support that kind of generality without too much overhead. Hopefully the useful block machinery in MLIR can still be made to work with some helper methods (splicing operations, etc.)

The intent was that staticlogic.yield would represent pipeline registers to pass values forward to the next stage.

This was what the iter_args are intended to capture. The staticlogic.return can indicate that some values flow back and are accessible through iter_args.

In general, I need to add a better prose exposition (which should ultimately go into a rationale doc). But I also think fleshing out more examples is the best way to iterate on the design at this point. Let me compile those actual examples to SCF and I’ll then propose how I’d convert the SCF to a pipeline.

After this morning’s meeting, I think it’s worth restating the goal of this proposal, since what I’ve proposed so far is sort of missing the mark. We all seemed to agree that a compilation path from loops could use a “high level pipeline” representation, which captures data dependence but abstracts away the details of RTL control signals. After this level, we could do scheduling and lower into a representation that makes cycle-by-cycle control explicit. This proposal is really about the former, and we are hoping to target Calyx for the latter.

First off, is that a fair summary?

If so, I’ll take a look at what could go into this representation, and how we can get useful information from MLIR tools. I’ve been looking into what MLIR can support, and in the Affine dialect we already have tools for analyzing memory access dependences. I can try running the above examples through those tools, see what information we can get, and start proposing how to capture that in the IR with a “high level pipeline” operation.


I tested the above examples using Polygeist to compile C into Affine dialect ops. Then, I wrote a little debug analysis using checkMemrefAccessDependence. Below are the actual C code, Affine dialect IR, results from the check, and a little commentary on each example.

This is just one of many tools available upstream, but I think this is getting at what we want. Feedback welcome!

Test 1


int test1(int a[10]) {
  int x = 0;
  for (int i = 0; i < 10; ++i) {
    x += a[i];
  }
  return x;
}


builtin.func @test1(%arg0: memref<?xi32>) -> i32 {
  %c0_i32 = constant 0 : i32
  %0:2 = affine.for %arg1 = 0 to 10 iter_args(%arg2 = %c0_i32, %arg3 = %c0_i32) -> (i32, i32) {
    %1 = affine.load %arg0[%arg1] : memref<?xi32>
    %2 = addi %arg2, %1 : i32
    affine.yield %2, %2 : i32, i32
  }
  return %0#1 : i32
}

checkMemrefAccessDependence Result

No memref access dependence.


Rather than creating a single element memref for the x variable, the loop carried dependence is expressed with iter_args. Note that Polygeist appears to create two iter_args, one for the return value and one for the loop carried dependence, even though they are actually the same in this case.

Test 2


void test2(int a[10], int b[10]) {
  for (int i = 0; i < 10; ++i) {
    int x = a[i];
    if (i >= 3) {
      b[i - 3] = x + a[i - 3];
    }
  }
}


builtin.func @test2(%arg0: memref<?xi32>, %arg1: memref<?xi32>) {
  affine.for %arg2 = 0 to 10 {
    %0 = affine.load %arg0[%arg2] : memref<?xi32>
    affine.if #set(%arg2) {
      %1 = affine.load %arg0[%arg2 - 3] : memref<?xi32>
      %2 = addi %0, %1 : i32
      affine.store %2, %arg1[%arg2 - 3] : memref<?xi32>
    }
  }
}

checkMemrefAccessDependence Result

RAR dependence at loop depth 1:
 loop: 0
 dependence: [3,3]
 src: %0 = affine.load %arg0[%arg2] : memref<?xi32>
 dst: %1 = affine.load %arg0[%arg2 - 3] : memref<?xi32>


The tool detects a read-after-read dependence on the a array with a distance of 3.

Test 3


void test3(int b[10]) {
  int a0, a1, a2, a3;
  for (int i = 0; i < 10; ++i) {
    a0 = a1;
    a1 = a2;
    a2 = a3;
    a3 = a0 + b[i];
  }
}


builtin.func @test3(%arg0: memref<?xi32>) {
  %0 = memref.alloca() : memref<1xi32>
  %1 = memref.alloca() : memref<1xi32>
  %2 = memref.alloca() : memref<1xi32>
  affine.for %arg1 = 0 to 10 {
    %3 = affine.load %2[0] : memref<1xi32>
    %4 = affine.load %1[0] : memref<1xi32>
    affine.store %4, %2[0] : memref<1xi32>
    %5 = affine.load %0[0] : memref<1xi32>
    affine.store %5, %1[0] : memref<1xi32>
    %6 = affine.load %arg0[%arg1] : memref<?xi32>
    %7 = addi %3, %6 : i32
    affine.store %7, %0[0] : memref<1xi32>
  }
}

checkMemrefAccessDependence Result

RAW dependence at loop depth 1:
 loop : 0
 dependence: [1,9]
 src: %3 = affine.load %2[0] : memref<1xi32>
 dst: affine.store %4, %2[0] : memref<1xi32>
RAW dependence at loop depth 1:
 loop : 0
 dependence: [1,9]
 src: %4 = affine.load %1[0] : memref<1xi32>
 dst: affine.store %5, %1[0] : memref<1xi32>
RAW dependence at loop depth 1:
 loop : 0
 dependence: [1,9]
 src: %5 = affine.load %0[0] : memref<1xi32>
 dst: affine.store %7, %0[0] : memref<1xi32>


The tool detects read-after-write dependences on the single element memrefs representing the rotating register file. The dependence distance ranges from 1 to 9.

Test 4


void test4(int a[10], int b[10]) {
  for (int i = 0; i < 10; ++i) {
    a[b[i]]++;
  }
}


builtin.func @test4(%arg0: memref<?xi32>, %arg1: memref<?xi32>) {
  %c1_i32 = constant 1 : i32
  affine.for %arg2 = 0 to 10 {
    %0 = affine.load %arg1[%arg2] : memref<?xi32>
    %1 = index_cast %0 : i32 to index
    %2 = memref.load %arg0[%1] : memref<?xi32>
    %3 = addi %2, %c1_i32 : i32
    memref.store %3, %arg0[%1] : memref<?xi32>
  }
}

checkMemrefAccessDependence Result

No memref access dependence.


It looks like reading the index out of the b array causes Polygeist to insert an index_cast operation, which leads to memref.load and memref.store being used, rather than affine.load and affine.store. This stumps the tool, since it only works with Affine loads and stores. I couldn’t figure out how to convince Polygeist that this C code wanted to store index types in the b array, but I will try handwriting some Affine dialect code to see if this example can be understood by the tool.


Is this last case a lack of being conservative in checkMemrefAccessDependence? (@bondhugula)

Cool! So if we scheduled these into our various pipeline for proposals, how would they look?

This seems like an interesting canonicalization I hadn’t thought about… maybe a pass could generically optimize any ReturnLikeOp with redundant args?

I’m not sure you can do anything generically in a simple way: you’d need to know about the enclosing op as well. For example for a func operation, you’d need to update the signature and find all the call-site that needs to be updated as well.

That’s the question! I wanted to share the results first, but I’ll come back with some ideas.

Indeed. This code is actually irregular (non-affine). The subscript for %arg0 is being loaded from (effectively %arg0[%arg1[%arg2]]). The affine dependence analysis pass won’t define/represent/report any dependences between affine ops and non-affine ones that might be reading/writing memref operands.

For this “high level” pipeline operation, I’ve been wondering if we want to capture the exact dependence distances shown above, or if it is enough to know whether or not there is a dependence. If there is a dependence, the operations can’t be scheduled into the same coarse grained stage, but I’m not sure if we need the exact distances at this level of abstraction (if it is useful, it could be captured in an attribute).

During our meeting @jopperm mentioned perhaps we could give every operation a dummy latency of 1 and run the scheduling. I have been toying with this, and I think it makes sense at this level. This would allow the dataflow through SSA values to capture which coarse grained stage operations should be scheduled into. We could insert a dependence into the problem for a couple scenarios:

  • memory access dependences, to ensure RAW or WAW happens in an order such that writes are visible
  • dependences from every operation enclosed within an if to the condition of the if, so the condition is computed before the operations that are predicated by it

I think this makes sense, but I’m curious what others think. I’m trying this approach, and I’ll post what the four running examples might look like using that coarse grained scheduling approach and the pipeline operation I previously proposed.
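As a sketch of that coarse grained approach (plain Python, illustrative only; a real implementation would go through circt::scheduling), an ASAP pass over SSA def-use edges plus the extra dependences above, with every op given a dummy latency of 1:

```python
def asap_schedule(ops, deps):
    """ASAP start times with unit latency for every operation.

    ops:  operation names in topological order.
    deps: op -> list of ops it depends on (SSA operands, memory access
          dependences, and if-condition edges, per the list above).
    """
    start = {}
    for op in ops:
        start[op] = max((start[d] + 1 for d in deps.get(op, [])), default=0)
    return start

# The test1 reduction: load -> addi -> yield, one coarse stage each.
ops = ["affine.load", "addi", "affine.yield"]
deps = {"addi": ["affine.load"], "affine.yield": ["addi"]}
print(asap_schedule(ops, deps))
# → {'affine.load': 0, 'addi': 1, 'affine.yield': 2}
```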