Hello,
A few weeks ago I presented our work on adding dataflow programming to MLIR at the MLIR Open Meeting. We are currently improving our (open-source) prototype to make it easier to use, before trying to convince you that it should be upstreamed in some form, because it facilitates the representation and compilation of embedded ML and signal processing algorithms.
Recall that our work allows data processing operations (classical MLIR) to be mixed seamlessly with dataflow operators. Part of this improvement work is stabilizing a syntax for dataflow-specific constructs that is both:
- not very cumbersome
- not very difficult to maintain
Recall that in addition to classical (MLIR) functions, our dialect allows the definition of stateful dataflow nodes with a cyclic execution model. For instance:
lus.node @lstm(%x:tensor<1x40xf32>,%out: i1) -> (%y:tensor<1x4xf32>)
clock { lus.on_clock ((base,base) -> (base on %out)) } {...}
Here, the first line looks roughly like a classical function signature. The main difference is that outputs are named, which means that the Operation-derived class stores them as inputs. This is the most natural way we found to name the outputs so that they can be referred to by name later.
Why do we need to refer to them by name? To allow checking the correctness of the dataflow. In a cyclic execution model, one needs to determine in which cycles the value on a dataflow arc is produced (and consumed). We use a declarative mechanism for this, on line 2 of my code fragment. There, I specify for each input and output when it is consumed or produced by the node. Both inputs are consumed at every execution cycle of the node, which is specified by the keyword base. The output is produced in every cycle where %out is true, which is specified by the expression base on %out.
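To make the intended reading of base and base on %out concrete, here is a minimal Python sketch of the clock semantics described above. It is only an illustration and not part of our prototype: the function names (base_clock, on, active_cycles) and the sample values of %out are hypothetical, and a clock is simply modeled as one boolean per execution cycle.

```python
from typing import List, Sequence


def base_clock(n_cycles: int) -> List[bool]:
    """Model of `base`: active at every execution cycle of the node."""
    return [True] * n_cycles


def on(clock: Sequence[bool], condition: Sequence[bool]) -> List[bool]:
    """Model of `clock on condition`: active in the cycles where `clock`
    is active and the boolean `condition` is true."""
    if len(clock) != len(condition):
        raise ValueError("clock and condition must cover the same cycles")
    return [c and v for c, v in zip(clock, condition)]


def active_cycles(clock: Sequence[bool]) -> List[int]:
    """Indices of the cycles in which a value on this clock is present."""
    return [i for i, active in enumerate(clock) if active]


# Hypothetical values taken by %out over five execution cycles.
out_values = [True, False, False, True, True]

base = base_clock(len(out_values))   # clock of %x and %out
y_clock = on(base, out_values)       # clock of %y: base on %out

print(active_cycles(base))     # cycles where %x and %out are consumed
print(active_cycles(y_clock))  # cycles where %y is produced
```

With these sample values, both inputs are present in all five cycles, while %y is present only in the cycles where %out is true.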
This whole specification is part of the operation lus.on_clock, which sits in the region that follows the keyword clock. If the keyword is not present, then either the body of the node allows the needed information to be inferred through a typing process or, if the body is absent, we assume that all inputs and outputs are consumed or produced at every cycle. Note that the region introduced by clock does not have a terminator in the textual syntax. We achieved this through the parser and printer: the terminator exists only internally, and it is the printer of the lus.on_clock operation that prints the body of the region (on only one level).
My first question here: are these choices reasonable, or are there better methods to achieve the same (or a similar) result? By better, I mean easier to maintain or with a less cumbersome syntax.
Best,
Dumitru