I have written a utility that I call clustering out of existing MLIR API calls. I admit the naming may not be the best. The API is as follows:
unsigned Cluster(std::set<Operation > &ops,
cluster_op instruction, / Enum indicating what type the clustering operation is */
The aim of the above function is to take a set of operations (usually a subgraph) present within a block, create a new OP (cluster op) within the block, that has a region and blocks within it, and then move the operations into the new OP.
While moving the ops into the new OP, we also need to re-connect ops that used to connect to the moving ops, correctly as to maintain the connectivity of the IR.
I have the above implemented and working in a number of places in our compiler. I feel I may be missing some terminology here, but is there a inbuilt mechanism within the MLIR core that can perform the above, without me having to write the above implementation ? It feels to me like this could be generic functionality. If not do we see this as useful functionality we could generalise and add to MLIR ?
I think creating a new region op that encloses a range of existing ops is a common pattern. In addition to what Nicolas mentioned, inlineRegionBefore is another utility that can be used to move ops across regions but it falls a bit short for more generic cases and it’s part of the pattern-rewrite infra. Having a more generic and flexible utility would be super useful!
I haven’t thought too much about the implementation of such a utility yet but in my use case I’m trying to go from something like:
What you describe in the example above, is exactly the kind of functionality I have implemented.
I am more than happy to share the implementation, but I need a couple of days to try and clean out some dependencies I have that would prevent from making it generic, and able to compile outside our out of tree project.
I will try to post a patch around the middle of next week.
Related to this, I was fighting the rewriter today as there is no RAUW equivalent.
In my particular case, I was moving a region into another but I was left with some RAUW needs: a value defined outside both ops needed to be replaced by a BBArg of the new op.
As a consequence I had to do a post-processing that explicitly clones with bvm and had fun tracking the self-use that I needed to filter out.
If one of you also has a non-cloning, non BBArgs-replacing RAUW for RewritePattern, I’d be happy to review to plug this hole.
Are the ops contiguous in the block? (This goes towards Nicolas’s suggestion, but I wasn’t sure if that is the case). And is it required that all ops are top most in block? (could one, instead of passing in, use the parent block of head op?) [I’d probably use a predicate function rather than std::set to allow different sets to be used and different clustering approaches, such as clustering based on attribute without populating a set]
@aardeb, just for your reference, I implemented something quickly off-tree to unblock my work here. You can take a look at moveOperationsBefore, getDefsWithExternalUses and the code around their invocations to better understand my use case. Hopefully that helps!
The std::vector<Operation *> &ops do not need to be a linear range but need to be topologically sorted and form a connected subgraph
llvm::iplist::iterator insertClusterOp, is the position in the block where the new op (where the ops will be moved) is inserted into the block.
The code I have is a bit convoluted but works. Happy to receive feedback on how to improve it.
@dcabelle/@jschuhmacher looking through your code it seems that a lot of the complexity in my code may be because I do not assume a linear range of ops and take subgraph, since you code seems to get away with trying to find the external op connections and such. I may also be re-implementing some MLIR internal code leading to a more complex solution.
Thanks for sharing your use cases and implementations! I’m trying to build a bit more expertise on my use case before providing any feedback but my first impression is that I don’t think we can have a single utility that creates the new region op and move all the target operations into the new region, all in one shot. I think that decomposing the transformation into 2 or 3 utilities would make the implementation simpler and more composable.
I’m also wondering why your implementations clone the set of operations to be moved into the new region. Would it be possible to just move them without cloning them? It was possible at least in my use case and I think that would be more efficient since the number of operations to move could be pretty large.
The reason I am cloning is that I can take a arbitrary subgraph of operations to move from the IR and not necessarily a linear range of ops. As we move the operations we need to remap the operands for the ops to respect the connections that come from ops that also move and any connections that after clustering will come in through the block arguments. At the time of writing that code I could not come up with a way to move the operations while reconnecting the operands (in one single step), so cloning seemed the only option. I agree that it may be expensive, so maybe a move and reconfigure operands is needed here. Maybe we can do that as a two stage process, or add the functionality to do it in a single step.
We’re cloning for similar reasons. Also, sometimes an operation from the source region might need to be replicated into multiple destination regions; this we could also achieve with some pre-processing though.