How to extend linalg ops without modifying MLIR code?

I want to extend linalg ops without modifying the MLIR code, so that I can reuse the transform capabilities in linalg. For example, I want to define a custom linalg.globalAvgpool (linalg.pooling_sum + linalg.div) and customize its getTiledImplementation() and generateResultTileValue(). That is, I want to tile and fuse the op at a higher level, treating the combination of linalg ops as a whole and ignoring its internal computation, rather than first converting it to linalg.pooling_sum + linalg.div and then tiling and fusing. To give another example with linalg.softmax: I want to tile and fuse softmax together with its producers and consumers directly, treating the softmax as a whole and ignoring its actual computation.
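To illustrate the goal, here is a rough sketch, assuming a hypothetical composite op named `linalg_ext.softmax` (the op name, syntax, and types here are illustrative, not upstream MLIR):

```mlir
// Hypothetical composite op, treated as one tileable unit rather than as
// its constituent linalg ops (the name linalg_ext.softmax is assumed).
%0 = linalg_ext.softmax dimension(1)
       ins(%arg0 : tensor<16x128xf32>)
       outs(%init : tensor<16x128xf32>) -> tensor<16x128xf32>

// After tiling the composite as a whole, the op recurs on a slice without
// ever being decomposed into pooling/div or exp/sum/div ops:
%1 = scf.for %i = %c0 to %c16 step %c4 iter_args(%acc = %init)
       -> (tensor<16x128xf32>) {
  %in = tensor.extract_slice %arg0[%i, 0] [4, 128] [1, 1]
          : tensor<16x128xf32> to tensor<4x128xf32>
  // ... softmax on the tile, then tensor.insert_slice into %acc ...
}
```

Producers and consumers can then be fused into the same loop nest, which is exactly what customizing getTiledImplementation() and generateResultTileValue() enables.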

This is under discussion, and it is very much open how we’ll do it. If you have an implementation of softmax tiling, then I suggest you share it upstream.

We’re also looking into bundles of named ops as one meta named op. As long as all the ops inside are tileable, the meta op can be tileable too (this is possibly valid for other interfaces, but that will depend on what they do).

This is all still speculative. If you have a concrete implementation, sharing it would be a good first step in the design.

Okay, thank you very much for your answer. I am still in the initial trial stage and will share my implementation later. I would be more than happy to participate in discussions and sharing on this topic.

Hi, so far these have been points made in random discussions, on the forum, and at conferences. We’ve been dancing around the idea for over a year, and most people feel sympathetic to it, but not quite enough to create one.

So, we started by adding named operations to linalg, which will make the need for a grouping clearer. Next we’ll add a group op.

The main controversies around the topic are:

  • Is it isolated from above or not? If it is, then why not just outline into a function?
  • Why not just use a generic? For non-perfectly-nested ops, can we even safely define the semantics?
  • Does it propagate interfaces through? Should it require that all ops have the same interfaces?
  • If all ops implement an interface, does that mean the group does, too? Are all interfaces composable that way? If not, how do we describe composability?
  • Can I merge any two groups together? Split them apart?
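To make the questions above concrete, a hypothetical group op (name, syntax, and semantics all assumed; nothing like this exists upstream at the time of writing) might look like:

```mlir
// Hypothetical region-carrying "group" op bundling named ops. The open
// questions above map directly onto it: is the body isolated from above,
// and does the group implement TilingInterface iff every op inside does?
%out = linalg_ext.group
         ins(%x : tensor<?x?xf32>) outs(%init : tensor<?x?xf32>) {
^bb0(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>):
  // e.g. a layernorm-style composite: reduce, then elementwise normalize
  // %mean = linalg.reduce ...
  // %norm = linalg.generic ...
  // linalg_ext.yield %norm
} -> tensor<?x?xf32>
```

Whether two such groups can be merged or a group split apart would then reduce to whether the combined or separated bodies still satisfy whatever interface contract the group propagates.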

Our plan (Intel) is to finish the named ops to the point where we can lower models to them (including softmax) and try to fuse with producers and consumers. Then we’ll try models with things like GeLU and LayerNorm as composites, and then we’ll try to group them.

We wanted softmax to be such a group in the first place, so perhaps we can make that case.

We’re also interested in grouping for sharding (distribution) and device placement (compute follows data) approaches, but those are secondary goals at the moment.


Got it, thank you!

We need to extend linalg ops outside the MLIR code too. After reading the posts, I’m still confused about how to implement it.

Although adding an extension dialect may be a bit inelegant, if there is a real need it can be a workable solution. For example, you can refer to the LinalgExt dialect in IREE, which was created to meet this requirement.
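As a minimal sketch of what such an out-of-tree extension op can look like in ODS (the dialect and op names here are illustrative; IREE's LinalgExt ops follow this general pattern of declaring the interfaces they implement in TableGen):

```tablegen
// Hypothetical ODS definition in an out-of-tree "linalg_ext" dialect.
// Declaring TilingInterface here lets the upstream tiling and fusion
// drivers operate on the op without any changes to MLIR itself; the
// listed methods are implemented in the dialect's C++ sources.
def LinalgExt_SoftmaxOp : Op<LinalgExt_Dialect, "softmax", [
    DeclareOpInterfaceMethods<TilingInterface,
      ["getIterationDomain", "getLoopIteratorTypes",
       "getTiledImplementation", "generateResultTileValue"]>,
    DestinationStyleOpInterface]> {
  let arguments = (ins AnyShaped:$input,
                       AnyShaped:$output,
                       I64Attr:$dimension);
  let results = (outs Variadic<AnyRankedTensor>:$result);
}
```

Because the transform-dialect tiling ops dispatch through TilingInterface rather than through a fixed op list, an op defined this way can be tiled and fused alongside ordinary linalg ops.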

Thank you, I have seen a similar solution in ByteIR. It seems we can add new ops for the transform dialect outside of MLIR, as shown there. What makes the Linalg dialect different?

In my opinion, Linalg in MLIR does not currently support combining multiple perfect loop nests (needed for operators such as layernorm and global_average_pooling), because this is still under discussion. However, a lot of practical work needs to move forward, so we are adding a LinalgExt dialect to keep making progress. This is only a stopgap at this stage: LinalgExt may disappear in the future, and arguably should not need to exist at all.

If we just use Linalg named ops for bufferization and lowering from TOSA, and then lower the Linalg named ops with a self-defined dialect the way Triton does, can we add new named Linalg ops outside of MLIR without defining a new LinalgExt dialect?

If it is just a new named op, why not propose an RFC and merge it directly into LLVM?

Thank you, we will try. As mentioned above, what we want is just the bufferization functionality of the Linalg dialect ops, which is a little tricky, so it is not very suitable for an RFC.