[RFC] Proposal for a high-level ML dialect in MLIR

I suppose that the point is:

So I’m specifically not advocating actually having TCP be “HLO 2.0” in any branding, project governance, or any other sense.

As Chris said, I think the scope creeps extremely fast if we don’t anchor to a really clear empirical community need. There are many design considerations where weeks of discussion can be bypassed by “we have N projects using MHLO and this decision seems to work well”. If we don’t make a conscious effort to allow that sort of reasoning, the timelines are going to drag out a lot.


+1 from me on this – this is what I meant by “the long term implications of many of the core design points are well understood”.

Channeling /just my personal opinion/ as someone who cares a lot about LLVM: “we’re willing to wait”. LLVM is >20 years old and I fully intend it to survive another 20+ years: we’re not in a hurry to “claim success” here. It is better to get something later-but-good than something sooner-but-wrong.

ML is a rapidly evolving field and if you need a year or two to get to something that the broader community as a whole can adopt, then that is far better than prematurely trying to standardize on one particular vendor’s prior investment that they are trying to ensconce as a standard. Trying to use LLVM as the torchbearer is something that I’ve seen attempted several times before across the vast story arc of the project, and it has always been best to anchor on what serves a wide variety of stakeholders (or decline to participate until things settle out more).

-Chris

Governance.

Doesn’t fix the issue. OpenXLA/StableHLO governance is still outside LLVM and under the control of a single company, Google. It makes little difference if this is an “open” community or not.

The only equivalent to creating TCP is adopting StableHLO into MLIR proper and letting the LLVM community decide its fate.

Putting it another way, creating TCP in a separate repo is just like using an existing dialect instead, but with the problem that it will be incomplete and its community fragmented.

I (personally) don’t see value in having another high-level dialect that isn’t aimed at common optimisations. I don’t think all front-ends can (or want to) agree on a common high-level dialect either.

The original proposal was for an optimisation-friendly dialect, lowered FROM the likes of MHLO and Torch (not as a replacement), to sit between those and (coexist with) Linalg. It seems people are now saying MHLO is a replacement for TCP, and I don’t understand the reasoning.


Something to clarify if it isn’t clear: StableHLO and MHLO aren’t intended to serve the same purpose. The former is more like TOSA in that it intends to provide a spec, a reference implementation in basic C++, etc. It positions itself as an interface for frontends, and nothing else.
It also exists so that MHLO can stay a compiler IR: it is unstable and its evolution is driven by the needs of the compiler internals. We also have a vested interest in having it cohabit well with linalg, since XLA CPU/GPU is adopting linalg for its code generation right now.
So in the world of XLA, MHLO is serving the same purpose as TCP from my point of view: it is enabling optimizations at a slightly higher level than Linalg.

Now, MHLO’s evolution is driven by XLA, which is its own project. Everything is possible, but I still wouldn’t bet on Google simply migrating MHLO and proposing it as a dialect in LLVM. For example, XLA cares a lot about “bounded shapes” and large-scale training features.
That said, taking MHLO as a baseline to bootstrap TCP initially does not seem like a bad approach to me: it’s fairly pragmatic and allows us to get started quickly.

Two reasons why we didn’t propose MHLO with the initial TCP proposal (>2y ago, right before the pandemic):

  1. At the time, MHLO didn’t yet have dynamic shape support, so some redesign was necessary anyway.
  2. We felt that it would be better for a community-driven project to avoid ambiguity by picking a new name and having each new op reviewed one by one. Our intent, though, was to propose to TCP the MHLO ops that aren’t there “just because of history”, while also discussing dynamic shape support for each of them.

I feel the current situation isn’t that different, except that dynamic shapes are more mature in MHLO (I didn’t say perfect…) and XLA is accelerating its MLIR adoption and making it a separate project from TensorFlow. That changes the dynamics on the XLA team’s side, but it does not make us more likely to propose MHLO for upstream. I suspect we will have a common vested interest in OpInterfaces and in general transformations and infrastructure that span well across TCP and MHLO.


Isn’t this enough, in a true spirit of creating a common asset, to bootstrap TCP?

My only doubt was whether all parties are really interested in concretely working on GitHub in a repository like Torch-MLIR, whose formal governance, however, doesn’t seem any better defined than e.g. OpenXLA’s.

Right, exactly why I wasn’t in favor of doing that in torch-mlir (or mlir-hlo). I personally prefer to have a new incubator instead, but I’m not the one doing the hard work, so my opinion has very limited effect.

That’s why I proposed that the HLO and Torch teams each propose an intersection from their point of view; then we (optimizers) intersect that with ours and find common ground. If implementing it in torch-mlir is enough for folks, I can live with that, too.

Looking at TOSA and the current TCP proposal (esp. tcp.group), I think a combination of those would be a good start. If StableHLO is similar (in scope and/or level at least), then it also is a good start.

But neither StableHLO nor TOSA solves the governance problem. I hate to propose yet another dialect, but I think the governance issue is really important.

For example, when StableHLO becomes a mature dialect, is it going to be included in MLIR upstream like TOSA was? If so, and if they have so much in common, aren’t we duplicating dialects, which will confuse users and increase maintenance?

The fact that they have their own specs is a good thing (it’s not just a compiler IR), but the fact that those specs are fragmented isn’t.

Also, my understanding is that TOSA is more about representing the computation semantics (which is great for code generation) than about being a transform dialect itself (even though it is one). So the spec really cares about the semantics of each op. If StableHLO follows suit, then we might end up with two dialects with slightly different semantics, probably in edge cases, and the situation won’t have improved.


Doesn’t fix the issue. OpenXLA/StableHLO governance is still outside LLVM and under the control of a single company, Google. It makes little difference if this is an “open” community or not.

We are actively building out a governance proposal that would provide a concrete pathway for OpenXLA/StableHLO to evolve to shared technical leadership and code ownership. When you say “control” here, what specifically are you referring to? At a minimum, anyone is free to fork the codebase at any time, regardless of governance.

This is a common fallacy when describing open source governance. Forking doesn’t solve the problem either.

The most important point I made is that both StableHLO and TOSA are separate communities, outside of the LLVM umbrella, with similar goals but different underlying pressures, and if they both want to be accepted as standard dialects in MLIR, we’ll have redundancy and a high maintenance burden.

And if we’re also creating TCP, then we now have three different approaches with similar (or at least intersecting) goals, two of them in-tree and one out-of-tree, driven by three different (but overlapping) communities.

Maybe it’s just me, but that does not look stable long term…


One critical point, IMHO, is that it is easy to work on concurrent components/dialects out of tree, but it becomes hard when we need to mainline a shared asset across competing and more or less partially overlapping proposals.

So, now I’m really confused about what folks are trying to build here. And maybe we need more clarity on the scope and goals.

Based on the description in the OP, and the draft that the Cruise folks put together, it feels like the Cruise folks basically want an “LLVM-first, modernized MHLO” (please correct me if I’m wrong – not intending to put words in people’s mouths). And I think there are a lot of people in the community that would benefit from that.

I think something more Linalg-aligned is another possibility, and it feels like what you are alluding to, Renato, as being “lower than MHLO but higher than Linalg”. This could look like tiled HLO + linalg ‘primitives’. This is another community need, and it covers ground that the Torch-MLIR TMTensor dialect and the IREE LinalgExt dialect have needed for putting “stuff at a higher level than raw linalg structured ops, but still mainly aligned with destination-passing style, tiling interfaces, etc.”

I think those are two valid use cases (and there may be others). But we need to know what we are building and whose problems are solved by what.

Hi @theadactyl,

Much of this epic thread has been discussing what it takes to get something into the LLVM monorepo, and this is well specified by the LLVM Developer Policy. I am not sure what the OpenXLA plans are, but I assume the thoughts are not to use LLVM processes for the definition of the IR itself - which is what would be required for it to be in the monorepo.

I think that Mehdi’s points above are structurally correct. If you forget about label/branding, MHLO has plenty of good ideas - as does TOSA and ONNX and Glow and Caffe2 and many other IRs and dialects. You don’t have to reinvent everything, you can pick and choose which ideas you like to bootstrap an effort and reduce needless search over a vast design space.

That said, everyone wants a universal IR but I still think you have structurally the same set of problems that led to LLVM IR’s design in the first place, quoting myself:

A number of attempts have been made to make a unified, generic, intermediate representation. The goal of these projects has been to reduce the amount of effort required to create a new language or microprocessor. These projects have largely failed, ranging from the original UNiversal Computer Oriented Language [41] (UNCOL), which was discussed but never implemented, to the more recent Architecture and language Neutral Distribution Format [12] (ANDF), which was implemented but ultimately failed.

LLVM succeeded as well as it did for CPUs because it took a principled approach to solving the problem and said “no” to many things that other projects before it tried to solve.

Fast forward to today, it is clear people want a unified IR, but it isn’t clear to me that you all have a common design in mind. Such a thing is REALLY REALLY HARD to define and is loaded with tradeoffs that will be difficult to balance given a wide range of community concerns. What are the principles that guide its design, and what are you saying “no” to?

As one example, I suspect that many folks in this community care about inference accelerators, DSPs with weird constraints, etc. All of this makes the specific details of how quantized operations are modelled quite important. Similarly for numerics: many people presumably care about emerging formats like float8, which are all a bit different - how will that be handled? What is (for example) the MHLO approach to handling this? Is it good enough for all the folks on this thread who are interested in this?

Instead of throwing together some dialects, I’d suggest starting by writing a design doc and see if you can get consensus on the core design tradeoffs.

-Chris


Exactly. I think the problem is that we’re all looking at it from our own points of view, which are all very different. The problem space is so vast that we get incomplete pictures. The discussions are still high-level enough that we might be agreeing on different things.

When we spoke to @raghavanr et al., my impression was that they’re trying to build an optimisation dialect that is not an ingress dialect (i.e. not MHLO or Torch). After speaking with @nicolasvasilache I realised that StableHLO is more like what I had in mind than MHLO, so to me, that’s what our group was aiming for.

If people want to build a higher level dialect, aiming at “modernising MHLO”, then we’ll probably look elsewhere for our transformations that are currently done at linalg+affine+scf level with some extra ops.

(nothing wrong with that, just isn’t what I was hoping, so please, carry on).

Renato, can you talk more about your use case? I see the words “linalg+affine+scf” in the equation, but also things like “StableHLO is more like what I had in mind”. Those seem contradictory (StableHLO lowers to MHLO, and MHLO lowers towards linalg – you can’t have something that is both on the StableHLO and linalg side of MHLO).

I see, I seem to have gotten that the wrong way around, and that’s probably the source of my confusion. Now I can see why we’re not talking about the same things.

Linalg is great for tiling and fusing (if the ops match ranges), but affine/scf is required for blocking (reshape for locality), reordering loops and finding the right level for a library call.

Even though linalg has a library-name attribute, we don’t have functions for the whole op (full conv, fully-connected), but we do have low-level “instructions” (aka smaller composable library calls) that we know are efficient when called in a certain order. This is what we call Tensor Processing Primitives. Think of a TPP as a CISC instruction in a sea of (MLIR) RISC instructions.

Our top-shelf op, the batch-reduced GEMM, is super efficient for a bunch of ops bundled together (batching, GEMM, reduction, even an activation afterwards), so we want to fuse the MLIR ops the way tcp.group or scf.execute_region do, and the fused group then becomes a library call. But we need to know which sub-tensor we’re applying the function to (to guarantee the tile shape is consistent with the rest of the iteration space, and to be able to do later fusion after re-ordering), and scf.execute_region doesn’t make that easy.
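To make that pain point concrete, here is a minimal sketch of the current grouping approach (shapes, SSA names, and the choice of ops are illustrative assumptions on my part, using roughly 2022-era upstream op spellings; this is not proposed TCP syntax): a matmul plus a ReLU wrapped in scf.execute_region so that a later pass could outline the region into a single TPP-style library call.

```mlir
// Hypothetical sketch: group a GEMM and its activation so a later pass can
// outline the whole region into one TPP-style library call.
func.func @mlp_layer(%a: tensor<128x512xf32>, %b: tensor<512x256xf32>,
                     %init: tensor<128x256xf32>) -> tensor<128x256xf32> {
  %res = scf.execute_region -> tensor<128x256xf32> {
    // The GEMM part of the group.
    %mm = linalg.matmul
        ins(%a, %b : tensor<128x512xf32>, tensor<512x256xf32>)
        outs(%init : tensor<128x256xf32>) -> tensor<128x256xf32>
    // ReLU lowered to an elementwise max-with-zero (arith.maxf in the
    // 2022-era arith dialect); a higher-level dialect could keep this as a
    // single named op instead.
    %zero = arith.constant 0.0 : f32
    %relu = linalg.generic
        {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                          affine_map<(d0, d1) -> (d0, d1)>],
         iterator_types = ["parallel", "parallel"]}
        ins(%mm : tensor<128x256xf32>)
        outs(%init : tensor<128x256xf32>) {
      ^bb0(%in: f32, %out: f32):
        %max = arith.maxf %in, %zero : f32
        linalg.yield %max : f32
    } -> tensor<128x256xf32>
    scf.yield %relu : tensor<128x256xf32>
  }
  return %res : tensor<128x256xf32>
}
```

The limitation is visible here: the region groups the ops, but nothing on scf.execute_region records which sub-tensor (tile) the group operates on, which is exactly the information needed to keep the tile shape consistent with the surrounding iteration space and to fuse again after re-ordering.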

For now, we’re massaging the IR to get it into the shape we want, because our goal is the optimisation passes, not IR design. But that’s not a realistic long-term approach, so we’re also interested in upstream dialects, even if they’re not LLVM proper, for two things: first, to see how we can reduce the IR massaging we have to do; second, to understand what front-ends generate, so we can consume that directly.

Having a common intermediate dialect that other front-end ingress dialects lower to would be awesome as a starting point. Having an upstream (read: LLVM/MLIR) dialect that allows us to work at a slightly higher level than Linalg (for example, keeping relu instead of lowering it to smaller ops), while we still have linalg, affine, and scf, would allow us to use the right level for the right transformations before lowering to calls + codegen.
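As an illustration of that “keeping relu” point (using TOSA’s existing clamp op purely as an example of that level of abstraction, not as a proposal; the attribute values are illustrative), the activation stays a single, matchable op instead of the lowered linalg.generic in the earlier sketch:

```mlir
// ReLU kept as one named op that transformations can match directly,
// instead of an elementwise max buried inside a linalg.generic region.
func.func @relu(%x: tensor<128x256xf32>) -> tensor<128x256xf32> {
  %0 = "tosa.clamp"(%x) {min_fp = 0.0 : f32, max_fp = 3.4028235e+38 : f32,
                         min_int = 0 : i64, max_int = 2147483647 : i64}
       : (tensor<128x256xf32>) -> tensor<128x256xf32>
  return %0 : tensor<128x256xf32>
}
```

Matching (and later lowering or fusing) one such op is much cheaper for a transformation than pattern-matching the equivalent generic form, which is the practical argument for working slightly above Linalg.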

Those two “meta-dialects” (common-from-ingress and transformation) don’t have to be the same, or even be only two. We can work with lower-level IR (even raising it a bit again when needed) for the time being (or even forever) if the community needs something completely different.

I still read:

The longer term roadmap for MLIR is to provide a Tensor Compute Primitive (TCP) dialect, which should hopefully be general enough to model what HLO represents today

I don’t know if in 2022 this is still a valid goal (whatever name we want to give to TCP) or whether this claim was just part of the original MLIR proposal when it migrated to the LLVM Foundation/repo.

Also, just looking at the main MHLO consumers/bridges, we currently have:

  • ONNX-MHLO bridge under LF governance (the same umbrella org as PyTorch now)
  • Torch-MLIR bridge under LLVM governance (incubator)
  • TensorFlow MHLO in the TF core Google repository (Google governance)

All these projects will have a dependency on the OpenXLA repo (currently on TF/mlir-hlo).

Then we have MHLO itself:

  • It will land under the new OpenXLA governance. At the same time, we still have an (old?) claim in the repo README about upstreaming it into MLIR as TCP.

  • StableHLO, under the same OpenXLA governance, will support bidirectional conversion with MHLO and eventually TCP:

Create a bidirectional conversion between StableHLO and MHLO · Issue #11 · openxla/stablehlo · GitHub

Create a bidirectional conversion between StableHLO and TCP · Issue #17 · openxla/stablehlo · GitHub

And then we have this big open topic of building a common vision on the nature of TCP, how and where it needs to be contributed, and eventually clarifying its MHLO affinities (roots?).

I don’t know if all this is linear, but I hope I have correctly captured the current situation.

Also some other transitional notes:

  • JAX directly emits MHLO as its native opset (and will transition to StableHLO)
  • Torch-XLA (seeks to) interop via it for CloudTPU et al.
  • Various other parties directly integrate it as part of their compiler flows (IREE is one, and I know of others)
  • TensorFlow will transition to emitting StableHLO (vs direct use of MHLO)

To some extent, this is mutable: depending on what happens here and the adjacent projects, we can adapt this. I’m not going to speculate on how right now but will just state that we want MHLO to be coherent with what happens broadly (incl. here) and are watching/open to feedback as things evolve.

This is the current snapshot of the POR. Just noting, as above, that the actual/eventual alignment and governance will be set up to be congruent with what happens here and with community demand. This is in the “gotta start somewhere” category, and so long as some of Google’s requirements are met in the end, we are quite open to adapting the plan as a consensus emerges.

Thanks for summarizing. Seems about right. My notes above are primarily to make sure that the community understands that Google is quite open to adapting any of the current state as a consensus emerges – we view this as an opportunity for evolution vs a fixed point that the community should optimize or tip-toe around.

Thank you for the additional information.
I suppose we could consider that Torch-XLA, as it is currently in the PyTorch org namespace, will be under the future LF umbrella governance, eventually with a dependency on assets under OpenXLA governance.

As for the other bridges that will target StableHLO (under OpenXLA governance), they could use it as a routing pivot towards MHLO or TCP (wherever that eventually lands, or whether there ends up being a fusion/large overlap of these dialects).