Future-proof syntax for dataflow operators?

Hello,

A few weeks ago I presented our work on adding dataflow programming to MLIR at the MLIR Open Meeting. We are currently improving our (open-source) prototype to make it easier to use, before trying to convince you that it should be upstreamed in some form, because it facilitates the representation and compilation of embedded ML and signal processing algorithms.

Recall that our work allows data processing operations (classical MLIR) to be mixed seamlessly with dataflow operators. One part of our improvement work is stabilizing a syntax for dataflow-specific constructs that is both:

  • not very cumbersome
  • not very difficult to maintain

Recall that in addition to classical (MLIR) functions, our dialect allows the definition of stateful dataflow nodes with a cyclic execution model. For instance:

lus.node @lstm(%x:tensor<1x40xf32>,%out: i1) -> (%y:tensor<1x4xf32>) 
     clock { lus.on_clock ((base,base) -> (base on %out)) } {...}

Here, the first line looks roughly like a classical function signature. The main difference is that outputs are named, meaning that the Operation-derived class stores them as inputs. This is the most natural way we found to name the outputs so that we can refer to them by name later.

Why do we need to refer to them by name? To allow checking the correctness of the dataflow. In a cyclic execution model, one needs to determine in which cycles a dataflow arc is produced (and consumed). We use a declarative mechanism to do this, in line 2 of my code fragment. There, I specify for each input and output when it is consumed or produced by the node. Both inputs are consumed at every execution cycle of the node, which is specified by the keyword base. The output is produced in every cycle where %out is true, which is specified by the expression base on %out.
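
To make the clock annotation concrete, here is a minimal Python sketch of the cyclic execution model described above. It is only an illustration and not part of the prototype: the function name `run_node`, the trace format, and the running-sum body are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def run_node(trace: List[Tuple[float, bool]]) -> List[Optional[float]]:
    """Simulate a stateful node with inputs on `base` and output on `base on out`.

    Each element of `trace` is one execution cycle: (x, out).
    Both inputs are consumed at every cycle. The output y is produced only
    in cycles where `out` is true; None marks a cycle where y is absent.
    """
    state = 0.0  # hypothetical node state: a running sum of x
    ys: List[Optional[float]] = []
    for x, out in trace:
        state += x  # the state is updated at every cycle
        ys.append(state if out else None)
    return ys


trace = [(1.0, True), (2.0, False), (3.0, True)]
result = run_node(trace)
```

A consumer of y must then run on the same clock (base on %out), which is the kind of property the clock annotations let one check.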

This whole specification is carried by the operation lus.on_clock, which sits in the region following the keyword clock. If the keyword is absent, either the body of the node allows the needed information to be inferred through a typing process or, if the body is also absent, we assume that all inputs and outputs are consumed or produced at each cycle. Note that the region introduced by clock does not have a visible terminator. We achieved this through the parser and printer: the terminator only exists internally, and it is the printer of the lus.on_clock operation that prints the body of the region (on only one level).

My first question here: are these choices reasonable, or are there better methods to achieve the same (or a similar) result? By better, I mean easier to maintain or less cumbersome as a syntax.

Best,
Dumitru

My master's dissertation was an MLIR dialect for dataflow. There, I handled this by using custom attributes instead of function signatures.
Here’s the PDF

It seems like you want to encode a relationship between operands and results. Couldn’t it be stored as an indexed attribute array instead of a region?

Hello Mehdi,

Thanks for your reply.

The problem with attributes, as far as we understand them, is that you cannot refer to variables inside them. We would have preferred to use attributes, of course, but we also needed to be able, in the type of one variable, to refer to other variables of the signature, in order to say that the current variable is consumed or produced only when the other is true.

Another solution to this problem would have been to use some positional encoding (e.g. refer to a variable by its position in the input or output list). We experimented a bit with this syntax and concluded that it’s not natural, so we exploited the only mechanism we know of that allows us to refer to variables by name.
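
For readers who want to picture the positional alternative, here is a minimal Python sketch. It does not reflect our implementation: the `Clock` class, the `render` helper, and the convention of numbering inputs first and outputs after them are all assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Clock:
    """Clock of one signature entry: `base`, or `base on <entry index>`."""
    on_index: Optional[int] = None  # None means `base`


def render(clocks: List[Clock], names: List[str]) -> List[str]:
    """Resolve positional references back to names, for display only."""
    rendered = []
    for clock in clocks:
        if clock.on_index is None:
            rendered.append("base")
        else:
            rendered.append(f"base on %{names[clock.on_index]}")
    return rendered


# Signature of @lstm from the first post: inputs %x and %out, output %y.
names = ["x", "out", "y"]
# Positional encoding: entry 2 (%y) is on the clock defined by entry 1 (%out).
clocks = [Clock(), Clock(), Clock(on_index=1)]
rendered = render(clocks, names)
```

With the indices alone (`on_index=1`), the reader has to count signature positions to see which variable gates which. That is the part we found unnatural when written by hand.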

As far as our tooling goes, this approach seems to work well.
Does this pose any problems that we did not see?

@jabcross : You seem to represent SDF graphs. Even when considering more complex SDF-like formalisms (e.g. CSDF) you don’t need to refer to other variables, just to numerical rates or activation words. This is why attributes were enough for you (but are not for us). You can find our paper here: https://dl.acm.org/doi/10.1145/3506706

If the question is why one would need more general control than (C)SDF, the answer is simple: because data-dependent control is needed at higher levels in ML. For instance, in RL, training cycles are triggered when a label is present (which is data-dependent). Gated experts also need data-dependent behavior. And once you start using it, you see that it’s easier to represent lots of behaviors in a modular way. One example is stopping NN layers (or pre-/post-processing code) that feed outputs you don’t need at a given moment.

There is also a difference in objective with our DF modeling. It does not mainly aim at allowing the scheduling of one model. Instead, it aims at providing a natural specification framework for ML algorithmic aspects ranging from layer description to training and RL.

Dumitru

@qaco @albertcohen


I ended up creating a custom attribute that associates a name with a type, and I just refer to those in my SDF operations. To create a variable you would need to create a specific type and operation to refer to that value, which I agree would be a lot of work.

It would definitely be handy to have support for named runtime parameters in operations. (Maybe have MLIR automatically sort them alphabetically to keep compatibility with ordered parameters?)

@jabcross So you made the association by name? Or by ad-hoc types that in fact represent names? We thought about the name-based approach, but (if I recall correctly) MLIR allows variable names to change. So legal MLIR transformations would not preserve the semantics of our code. For the type-based approach, there are some aspects I don’t quite grasp. Can you explain it in more detail? I also did not understand the remark on named runtime parameters.
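
To illustrate the concern about name changes, here is a small Python sketch. It is a toy model rather than MLIR code: the `rename` helper and the two dictionaries are hypothetical, and they only mimic what happens when value names are rewritten while string-based references are left untouched.

```python
from typing import Dict, List


def rename(names: List[str], mapping: Dict[str, str]) -> List[str]:
    """Rename signature entries, as a printer or a transformation might."""
    return [mapping.get(name, name) for name in names]


names = ["x", "out", "y"]

# Name-based reference: the clock of "y" mentions the string "out".
clock_by_name = {"y": "out"}
# Structural reference: the clock of entry 2 points at entry 1.
clock_by_index = {2: 1}

renamed = rename(names, {"out": "arg1", "y": "res0"})

# The string reference no longer matches any entry of the signature.
dangling = [ref for ref in clock_by_name.values() if ref not in renamed]

# The structural reference still resolves, whatever the names are.
resolved = {renamed[src]: renamed[dst] for src, dst in clock_by_index.items()}
```

In this toy model the string "out" ends up dangling after the rename, while the structural reference still links the output to the right input. Operands inside a region behave like the structural case, which is why we went that way.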

I use an edge operation to make the connections, using only attributes (SymbolName attributes to refer to nodes, and my custom PortAttr to refer to the ports). There is no sequential encoding and there are no parameters.

If you’re referring to changing of SymbolNames (prefixed with @), I did not know about this name change behavior. It wasn’t an issue I ran into, as the first pass I do is my own scheduler. The names for values (prefixed with %) I didn’t use.

The type-based approach is something I considered but ended up giving up on. I wanted to use MLIR’s built-in type checking but I found it easier to do it myself. It would be something similar to my “edge” operation, but using typed parameters as well as attributes. A node’s ports would be represented as values and would be returned by the node declaration operation. If you wanted not to rely on the order of the outputs, you could create an outputof operation that returns the port value of a given node, given the port name as an attribute.
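
A rough Python sketch of the outputof idea, to show what I mean. This is not code from my dissertation: `Port`, `Node`, and `output_of` are hypothetical names, and the port type is kept as a plain string.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Port:
    """A port value, identified by its owning node and its port name."""
    node: str
    name: str
    ty: str


@dataclass
class Node:
    """A node declaration that owns named output ports."""
    symbol: str
    outputs: Dict[str, Port] = field(default_factory=dict)

    def add_output(self, name: str, ty: str) -> Port:
        port = Port(self.symbol, name, ty)
        self.outputs[name] = port
        return port


def output_of(node: Node, port_name: str) -> Port:
    """Look up a port by name, so callers do not depend on output order."""
    if port_name not in node.outputs:
        raise KeyError(f"node @{node.symbol} has no output port '{port_name}'")
    return node.outputs[port_name]


lstm = Node("lstm")
lstm.add_output("y", "tensor<1x4xf32>")
y_port = output_of(lstm, "y")
```

The lookup goes through the port name, so reordering the outputs of the node declaration would not affect users of `output_of`.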

What I’m proposing is an extension of MLIR syntax to allow for named parameters in signatures or named fields in values, which would be useful to avoid these workarounds. Something like a built-in struct abstraction.

I suppose you could do something similar to my Edge operation, but take a parameter instead of the input attribute if you need to refer to something at runtime.

Yes, this is what I was suggesting actually. Note that I don’t see it as “syntax”, as the textual format could still use names somehow, but this changes the internal representation.
What did you find difficult with this?