[RFC] Proposal for a high-level ML dialect in MLIR

One critical point, IMHO, is that it is easy to work on competing components/dialects out of tree, but it is hard when we need to mainline a shared asset across competing and more or less overlapping proposals.

So, now I’m really confused about what folks are trying to build here. And maybe we need more clarity on the scope and goals.

Based on the description in the OP, and the draft that the Cruise folks put together, it feels like the Cruise folks basically want an “LLVM-first, modernized MHLO” (please correct me if I’m wrong – not intending to put words in people’s mouths). And I think there are a lot of people in the community that would benefit from that.

I think something more Linalg-aligned is another possibility, and it feels like what you are alluding to, Renato, as being “lower than MHLO but higher than Linalg”. This could look like tiled HLO + linalg ‘primitives’. This is another community need, and it covers ground that the Torch-MLIR TMTensor dialect and the IREE LinalgExt dialect have needed for putting “stuff at a higher level than raw linalg structured ops, but still mainly aligned with destination passing style, tiling interfaces, etc.”

I think those are two valid use cases (and there may be others). But we need to know what we are building and whose problems are solved by what.

Hi @theadactyl,

Much of this epic thread has been discussing what it takes to get something into the LLVM monorepo, and this is well specified by the LLVM Developer Policy. I am not sure what the OpenXLA plans are, but I assume the thoughts are not to use LLVM processes for the definition of the IR itself - which is what would be required for it to be in the monorepo.

I think that Mehdi’s points above are structurally correct. If you forget about label/branding, MHLO has plenty of good ideas - as does TOSA and ONNX and Glow and Caffe2 and many other IRs and dialects. You don’t have to reinvent everything, you can pick and choose which ideas you like to bootstrap an effort and reduce needless search over a vast design space.

That said, everyone wants a universal IR but I still think you have structurally the same set of problems that led to LLVM IR’s design in the first place, quoting myself:

A number of attempts have been made to make a unified, generic, intermediate representation. The goal of these projects has been to reduce the amount of effort required to create a new language or microprocessor. These projects have largely failed, ranging from the original UNiversal Computer Oriented Language [41] (UNCOL), which was discussed but never implemented, to the more recent Architecture and language Neutral Distribution Format [12] (ANDF), which was implemented but ultimately failed.

LLVM succeeded as well as it did for CPUs because it took a principled approach to solving the problem and said “no” to many things that other projects before it tried to solve.

Fast forward to today, it is clear people want a unified IR, but it isn’t clear to me that you all have a common design in mind. Such a thing is REALLY REALLY HARD to define and is loaded with tradeoffs that will be difficult to balance given a wide range of community concerns. What are the principles that guide its design, and what are you saying “no” to?

As one example, I suspect that many folks in this community care about inference accelerators, DSPs with weird constraints, etc. This makes the specific details of how quantized operations are modelled quite important. Similarly for numerics: many people presumably care about emerging formats like float8, whose variants are all a bit different. How will that be handled? What is (for example) the MHLO approach to handling this? Is it good enough for all the folks on this thread who are interested in it?

Instead of throwing together some dialects, I’d suggest starting by writing a design doc and seeing if you can get consensus on the core design tradeoffs.

-Chris

Exactly. I think the problem is that we’re all looking at the problem from our own points of view, which are all very different. The problem space is so vast that we each get an incomplete picture. The discussions are still high-level enough that we might be agreeing on different things.

When we spoke to @raghavanr et al., my impression was that they’re trying to build an optimisation dialect that is not an ingress dialect (i.e. neither MHLO nor Torch). After speaking with @nicolasvasilache, I realised that StableHLO is closer than MHLO to what I had in mind, so to me, that’s what our group was aiming for.

If people want to build a higher level dialect, aiming at “modernising MHLO”, then we’ll probably look elsewhere for our transformations that are currently done at linalg+affine+scf level with some extra ops.

(nothing wrong with that, just isn’t what I was hoping, so please, carry on).

Renato, can you talk more about your use case? I see the words “linalg+affine+scf” in the equation, but also things like “StableHLO is more like what I had in mind”. Those seem contradictory (StableHLO lowers to MHLO, and MHLO lowers towards linalg – you can’t have something that is both on the StableHLO and linalg side of MHLO).

I see, I seem to have gotten that the wrong way around, and that’s probably the source of my confusion. Now I can see why we’re not talking about the same things.

Linalg is great for tiling and fusing (if the ops match ranges), but affine/scf is required for blocking (reshape for locality), reordering loops and finding the right level for a library call.
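To make “blocking (reshape for locality)” a bit more concrete, here is a minimal sketch in plain Python. The function names `pack`/`unpack` are hypothetical, and nested lists stand in for tensors, so this only shows the index mapping of a blocked layout (`blocks[ib][jb][r][c] == mat[ib*bm + r][jb*bn + c]`), not an actual contiguous memory layout or any specific MLIR op.

```python
def pack(mat, bm, bn):
    """Relayout a row-major 2-D matrix (list of lists) into bm x bn blocks.

    Returns blocks[ib][jb], each a bm x bn sub-matrix, so that one block can
    be addressed (and, in a real layout, stored) as a single unit.
    """
    rows, cols = len(mat), len(mat[0])
    # This sketch assumes the block sizes divide the matrix dimensions exactly.
    assert rows % bm == 0 and cols % bn == 0
    return [
        [
            [row[jb * bn:(jb + 1) * bn] for row in mat[ib * bm:(ib + 1) * bm]]
            for jb in range(cols // bn)
        ]
        for ib in range(rows // bm)
    ]


def unpack(blocks):
    """Inverse of pack(): stitch the blocks back into a row-major matrix."""
    out = []
    for block_row in blocks:
        bm = len(block_row[0])
        for r in range(bm):
            row = []
            for block in block_row:
                row.extend(block[r])
            out.append(row)
    return out
```

Once the data is blocked, loops can iterate over whole blocks and be reordered freely, which is the kind of transformation the paragraph above says is currently done at the affine/scf level.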

Even though linalg has a library name attribute, we don’t have functions for whole ops (full conv, fully-connected); instead we have low-level “instructions” (aka smaller composable library calls) that we know are efficient when called in a certain order. This is what we call Tensor Processing Primitives (TPP). Think of a TPP as a CISC instruction in a sea of (MLIR) RISC instructions.

Our top-shelf op, the batch-reduce GEMM, is super efficient for a bunch of ops bundled together (batching, GEMM, reduction, even an activation afterwards), so we want to group the MLIR ops, as tcp.group or scf.execute_region do, into a single region that then becomes one library call. But we need to know which sub-tensor we’re applying the function to (to guarantee that the tile shape is consistent with the rest of the iteration space, and to be able to do later fusion after reordering), and scf.execute_region doesn’t make that easy.
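For readers unfamiliar with the batch-reduce GEMM, the following is a reference-semantics sketch in plain Python: `C += sum_b A[b] @ B[b]`, optionally followed by an activation. The name `brgemm` and its parameters are hypothetical; this is not the TPP/libxsmm API, only the arithmetic the bundled op performs.

```python
def brgemm(a_blocks, b_blocks, c, relu=False):
    """Batch-reduce GEMM sketch: c += sum over b of a_blocks[b] @ b_blocks[b].

    a_blocks[b] is m x k, b_blocks[b] is k x n, and c is m x n (updated in
    place). If relu is True, max(x, 0) is applied to the accumulated result,
    mimicking a GEMM + reduction + activation bundle.
    """
    assert len(a_blocks) == len(b_blocks)
    m, n = len(c), len(c[0])
    for a, b in zip(a_blocks, b_blocks):
        k = len(b)
        for i in range(m):
            for j in range(n):
                acc = 0
                for p in range(k):
                    acc += a[i][p] * b[p][j]
                # The reduction over the batch index happens directly into c,
                # so no per-batch intermediate result is materialized.
                c[i][j] += acc
    if relu:
        for i in range(m):
            for j in range(n):
                c[i][j] = max(c[i][j], 0)
    return c
```

The point of treating this as one “instruction” is that the whole reduction over the batch index is performed inside a single call; splitting it back into separate MLIR-level matmul, add and relu ops would lose that.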

For now, we’re massaging the IR to get it into the shape we want, because our goal is the optimisation passes, not IR design. But that’s not a realistic long-term approach, so we’re also interested in upstream dialects, even if they’re not in LLVM proper, for two reasons: first, to see how we can reduce the IR massaging we have to do; second, to understand what front-ends generate, so that we can consume it directly.

Having a common intermediate dialect that other front-end ingress dialects lower to would be awesome as a starting point. Having an upstream (read LLVM/MLIR) dialect that lets us work at a slightly higher level than Linalg (for example, keeping relu instead of lowering it to its constituent ops), while still having linalg, affine and scf, would allow us to use the right level for each transformation before lowering to calls+codegen.

Those two “meta-dialects” (common-from-ingress and transformation) don’t have to be the same, nor do there have to be only two. We can work with lower-level IR (even raising it a bit again when needed) for the time being (or even forever) if the community needs something completely different.

I still read:

The longer term roadmap for MLIR is to provide a Tensor Compute Primitive (TCP) dialect, which should hopefully be general enough to model what HLO represents today

I don’t know if in 2022 this is still a valid goal (whatever name we want to give to TCP) or if this claim was part of the original MLIR proposal when it migrated to the LLVM foundation/repo.

Also, looking at the main MHLO consumers/bridges, we currently have:

  • Onnx-MHLO bridge under LF governance (the same umbrella org as Pytorch now)
  • Pytorch-MLIR bridge under LLVM governance (incubator).
  • Tensorflow MHLO in TF core Google repository (Google governance)

All these projects will have a dependency on the OpenXLA repo (currently TF/mlir-hlo).

Then we have MHLO itself:

  • It will move under the new OpenXLA governance. At the same time, we still have an (old?) claim in the repo README about upstreaming it into MLIR as TCP.

  • StableHLO, under the same OpenXLA governance, which will support bidirectional conversion with MHLO and eventually TCP:

Create a bidirectional conversion between StableHLO and MHLO (openxla/stablehlo issue #11)

Create a bidirectional conversion between StableHLO and TCP (openxla/stablehlo issue #17)

And then we have this big open topic of building a common vision of the nature of TCP and of how and where it needs to be contributed, eventually clarifying its affinities with (roots in?) MHLO.

I don’t know if all this is linear, but I hope I have captured the current situation correctly.

Also some other transitional notes:

  • Jax directly emits MHLO as its native opset (will transition to StableHLO)
  • Torch-XLA (seeks to) interoperate via it with Cloud TPU et al.
  • Various other parties directly integrate it as part of their compiler flows (IREE is one and I know of others)
  • Tensorflow will transition to emitting StableHLO (vs direct use of MHLO)

To some extent, this is mutable: depending on what happens here and the adjacent projects, we can adapt this. I’m not going to speculate on how right now but will just state that we want MHLO to be coherent with what happens broadly (incl. here) and are watching/open to feedback as things evolve.

This is the current snapshot of the POR. Just noting, as above, that the actual/eventual alignment and governance will be set up to be congruent with what happens here and with community demand. This is in the “gotta start somewhere” category, and so long as some of Google’s requirements are met in the end, we are quite open to adapting the plan as a consensus emerges.

Thanks for summarizing. Seems about right. My notes above are primarily to make sure that the community understands that Google is quite open to adapting any of the current state as a consensus emerges – we view this as an opportunity for evolution vs a fixed point that the community should optimize or tip-toe around.

Thank you for the additions.
I suppose we could also list Torch-XLA: as it currently lives in the pytorch org namespace, it will fall under the future LF umbrella governance, with an eventual dependency on assets under OpenXLA governance.

The other bridges, which will target StableHLO (under OpenXLA governance), could use it as a routing pivot to MHLO or TCP (wherever TCP eventually lands, or if these dialects end up merging or largely overlapping).

That is not claimed there. As you quoted, there is a claim of a long-term goal to have an upstream dialect that will be able to represent the same things, or subsume it. And as Mehdi mentioned, the goal was a community effort with multiple interests and ingresses/egresses, to avoid just hard-coding the same assumptions for the same platforms again: collective ownership and active collaboration rather than just another high-level dialect.

That’s what this conversation is about.

I think we almost need to work on a glossary to create a common language first :slight_smile: I noticed way up the thread that we are talking about many different things, with folks responding in “one conversation” while assuming the context of another, etc.

A goal statement along the lines of “focus on the codegen side, serve as a target for input dialects, with an emphasis on transformations” would help.

Sean also pointed to other good proposals at this level, which is appealing and a good collaboration opportunity.

(I was OOO for the last couple of weeks, so I haven’t looked at the proposal doc yet.)

The longer term roadmap for MLIR is to provide (TCP) dialect which should hopefully be general enough to model what HLO represents today

Quoting it again: I still think the claim was quite direct, but I didn’t mean that it is set in stone.
It still matters, though, because “start TCP from MHLO” or “ground it in MHLO” is a position that has been expressed in this thread.

It’s always hard to argue the intent of a claim with one of the authors of it :slight_smile: We should treat most of that readme as a point in time aspiration. It has some good ideas but I expect this conversation and its successors will end up subsuming it in the end (and hopefully fulfilling the vision in some form, if not the exact path described).

Yes, but the “aspiration” materializes when positions emerge in this thread on the possible relationship, overlap, etc., between TCP and MHLO.

In the end we have a concrete initial spec on Gdocs that has already collected/absorbed many comments.

If all the people directly involved in MHLO contribute concretely to the spec, in the spirit of the aspirational MHLO README, I think it would be a good way to go ahead.

No one today can guarantee a sure convergence on a common vision.

But with a collaborative spirit and allocated resources (and attention) I don’t see it as a lost bet regardless.

No one seems to have had problems working on the Google Doc, so it would be interesting to understand which repository all the people involved in it are willing to work in.

E.g. “I will not start following the Torch-MLIR repo’s activities just for TCP; I will not review PRs there” or “I will follow/subscribe to the activities in a new LLVM incubator”, etc.

Just my 2¢

One way to express this fusion would be to have a fused operation (or a custom call/ffi style abstraction) inserted fairly early and then have this fused operation travel through the system. Would that fit your requirements?
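The idea of inserting a fused operation early and letting it travel through the pipeline can be illustrated with a toy rewrite over a flat op list. This is a sketch only: the `Op` class, the op names and `fuse_matmul_relu` are hypothetical, and it does not use MLIR’s pattern-rewrite API. It only fuses when the intermediate value has no other users.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    operands: tuple
    result: str


def fuse_matmul_relu(ops):
    """Replace relu(matmul(x, w)) with one hypothetical fused op.

    Fusion only happens when the matmul result feeds the relu directly and
    is not used by any later op; otherwise the ops are left untouched.
    """
    out, i = [], 0
    while i < len(ops):
        op = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        if (op.name == "matmul" and nxt is not None and nxt.name == "relu"
                and nxt.operands == (op.result,)
                and not any(op.result in later.operands for later in ops[i + 2:])):
            # The fused op keeps the matmul inputs and the relu result name,
            # so every later user of the relu result is unaffected.
            out.append(Op("fused_matmul_relu", op.operands, nxt.result))
            i += 2
        else:
            out.append(op)
            i += 1
    return out
```

After such a rewrite, every later pass sees a single op and has to handle it through the same interfaces (tiling, bufferization, etc.) as any other op, which is what makes the approach composable.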

I am looking at this from the perspective of decomposing code-generation steps (like tiling, lowering to buffers, …) via interfaces, so that one can add new operations to the system while staying within the general optimization framework. The work on thlo (and gml_st) follows this idea (including decomposing fusion via a special operation). In essence, it has the notion of a tile as a first-order concept, some loop constructs that produce tensors out of tiles (potentially nested) and a way to express where tiles of operands are needed.

The hope is that, composed with the various upstream (and eventually upstreamed) interfaces for destination passing style, tiling, bufferization, etc., this approach will allow us to define the code-generation mechanics without nailing down the specific opset.
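The notions above (a loop that builds a result tensor out of tiles, plus a way to say which operand tile each result tile needs) can be sketched in plain Python on a 1-D stencil, where the needed operand tile is larger than the result tile because of the halo. All names here are hypothetical; Python lists and slicing stand in for tensors and extract/insert-slice style ops, and the preallocated `out` plays the role of the destination.

```python
def operand_tile_for(out_start, out_stop, window):
    """Map an output tile [start, stop) to the input tile it needs (with halo)."""
    return out_start, out_stop + window - 1


def stencil_untiled(xs, window=3):
    """Reference: out[i] = xs[i] + ... + xs[i + window - 1]."""
    return [sum(xs[i:i + window]) for i in range(len(xs) - window + 1)]


def stencil_tiled(xs, tile, window=3):
    n_out = len(xs) - window + 1
    out = [0] * n_out  # destination, filled tile by tile
    for start in range(0, n_out, tile):
        stop = min(start + tile, n_out)
        in_lo, in_hi = operand_tile_for(start, stop, window)
        x_tile = xs[in_lo:in_hi]  # extract the operand tile, including halo
        # Compute one output tile purely from the extracted operand tile.
        out_tile = [sum(x_tile[i:i + window]) for i in range(stop - start)]
        out[start:stop] = out_tile  # insert the tile into the destination
    return out
```

Because the computation of each tile only depends on `operand_tile_for`, the same loop skeleton works for any op that can describe its operand tiles, which is the sense in which the mechanics are independent of the specific opset.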

This is heavily based on, inspired by, and composes with the work on linalg, but it allows retaining some higher-level properties at a level below mhlo (e.g. after tiling).

To bring this back into the context of this discussion: I think one needs (at least) three levels of dialects and the discussion on TCP is mixing the three. Something akin to TOSA or StableHLO which is suitable to serve as an entry into compilers/execution systems, something like mhlo that then can model details of execution (like partitioning, mapping to libraries, exploiting numerical properties, reasoning about fusions, etc.) and a third level, like thlo + linalg + scf or gml_st for transformations like tiling, bufferization, etc…

Everything below is not ML specific anymore.

Have you identified additional constructs that you would need beyond scf.for, scf.foreach_thread and the various flavors of tensor.extract/insert ops?
One aspect that seems a bit contradictory to me is that you seem to both want to group untiled ops and have them tiled in IR.
So … why not just tile them and have them grouped within the looping constructs?
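A plain-Python sketch of the “tile them and group them within the looping construct” alternative: instead of grouping an untiled matmul and relu, the row loop is tiled and both ops are applied to the same tile inside the loop body, so the loop itself delimits the group and makes the sub-tensor explicit. Function names are hypothetical; list slicing stands in for tensor.extract_slice/insert_slice and the Python `for` for scf.for.

```python
def matmul(a, b):
    """Dense matmul on lists of lists: (len(a) x k) @ (k x len(b[0]))."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m):
    return [[max(v, 0) for v in row] for row in m]


def unfused(a, b):
    # Two whole-tensor ops: the full matmul result is materialized first.
    return relu(matmul(a, b))


def tiled_and_fused(a, b, tile_rows):
    out = [None] * len(a)
    for r0 in range(0, len(a), tile_rows):
        a_tile = a[r0:r0 + tile_rows]  # row tile of the LHS operand
        # Both ops run on the same tile inside the loop body; the tile
        # bounds (r0, len(a_tile)) tell us exactly which sub-tensor is used.
        out[r0:r0 + len(a_tile)] = relu(matmul(a_tile, b))
    return out
```

The loop body here is the natural place to substitute a single library call per tile, while the tile bounds remain visible for later fusion or reordering.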

Yes, structured codegen composes nicely and is retargetable to many scenarios :slight_smile:

Historically, in the very very early days of structured codegen a single dialect would contain all the ops.
While this was useful for bootstrapping, as work progressed, orthogonal abstractions emerged and have been sliced out diligently. This includes properly defining op semantics and setting up all the canonicalizations and foldings, without which nothing composes.

I recommend slicing out the abstractions that turn out to be useful as their own thing and propose them independently as they become mature enough. However, I also well understand the value of fast experimentation and prototyping.

It seems that TCP is finally bootstrapping in the Torch-MLIR LLVM incubator repo:

I have two questions:

  • The list of reviewers/commenters on the PR is quite restricted compared with the list of people who joined the Google Docs review. Is that just a coincidence?

  • Will TCP topics be discussed in the Torch-MLIR ODM or the MLIR ODM?

That PR is only to add the boilerplate code needed to get TCP started in TorchMLIR. It only contains a dummy op. An earlier draft version of the PR had 2 ops, but those have been removed now, just to get the boilerplate code in.

We will post all TCP related technical discussions in the TCP-WG category. If there is any specific discussion that warrants a f2f, we’d prefer to do that in the MLIR ODM, so that it is available to everyone in MLIR (not just TorchMLIR folks).

For PRs, we will post them in the mlir-tcp discord channel, so that anyone interested can review them.