(This was why I tongue-in-cheek chose the “1000” upper limit. I agree. It is also why I was drawing in requirements for delegation models to existing kernel libraries UT. I think that roughly orthogonalized dialects can be a great starting point since they cover a lot of area, but they are poor for fitting the corners.)
Thanks to everyone for contributing to the discussion here. Given the interest shown by several people, we will be discussing this further in the ODM this week on 8/18.
Here is the document proposing the design principles for this new dialect: [Public] Tensor Compute Primitives (TCP) Dialect - Google Docs.
Please feel free to add your feedback in the document.
(Adding the contents of the doc inline here for a quick read)
This document proposes the design principles for a new high-level dialect in MLIR.
We propose to use the name Tensor Compute Primitives (TCP) for this dialect, which is the name that was proposed for such a high-level dialect in this discussion a while back.
Design Principles
Meta-level Principles
- TCP will be a compiler IR and not a stable input format. It will not be backward compatible.
- This would help the dialect iterate and make progress much faster.
- We have TOSA and the proposed StableHLO to cover the space of stable input format dialects. So, this provides a nice alternative to those dialects.
- We could follow some deprecation policy to ease changes that are backward incompatible.
- TCP will be completely owned by the MLIR community, and any changes to it would only need community approval.
- TCP will begin as an LLVM incubator project with the eventual goal of merging into the MLIR monorepo.
Feature-level Principles
- TCP will be able to handle both inference and training.
- TCP will be agnostic to frontends and backends.
- In practice, the main implication is that TCP should mostly not include very obscure features or ops that are specific to some frontend or backend. But if the MLIR community feels that such a feature or op is worth having in this dialect, it can still be supported.
- TCP will be amenable to codegen for GPU, CPU, and accelerators.
- TCP will be a dialect that all frontend dialects could translate to.
- TCP would strive to include a canonical set of ops that covers a vast majority of the various frontend ops. But there is no guarantee that every op in every frontend dialect could be lowered to this dialect.
- TCP will include a canonical set of ops at a higher-level of abstraction than Linalg.
- TCP will have first-class support for:
- Shape dynamism (but not rank dynamism).
- It has been noted in the discourse thread that rank dynamism leads to several complications, and is not necessary for the majority of models.
- Quantization
- Sparsity
- TCP will have explicit broadcasting semantics for all ops.
- This would simplify some of the transformations and make it easier for codegen.
- This would provide a clean way to handle shape mismatches and broadcasting when a dynamic dimension turns out to have size 1.
- One drawback is that this complicates op-level fusion: fusion has to ensure that the broadcast ops are fused appropriately, so that they are not materialized and do not cause sub-optimal performance. However, it simplifies algebraic simplifications, so this seems like a worthy tradeoff.
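To illustrate what explicit broadcasting could look like, here is a hypothetical sketch in MLIR generic syntax. None of the op names or attributes below are part of any committed TCP spec; they are purely illustrative. A frontend's implicit broadcast of a bias vector becomes a dedicated op, so the elementwise op itself only ever sees identically-shaped operands:

```mlir
// Hypothetical TCP syntax; op names and attributes are illustrative only.
func.func @add_bias(%arg0: tensor<2x4xf32>, %bias: tensor<4xf32>) -> tensor<2x4xf32> {
  // The frontend's implicit broadcast is materialized as an explicit op,
  // expanding %bias along dimension 0 to match %arg0's shape...
  %0 = "tcp.broadcast"(%bias) {axes = [0]} : (tensor<4xf32>) -> tensor<2x4xf32>
  // ...so the elementwise op only takes operands of identical shape.
  %1 = "tcp.add"(%arg0, %0) : (tensor<2x4xf32>, tensor<2x4xf32>) -> tensor<2x4xf32>
  return %1 : tensor<2x4xf32>
}
```

With this structure, an algebraic simplification pass never has to reason about implicit shape promotion, at the cost of fusion having to pull the broadcast op into the consumer to avoid materializing the expanded tensor.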
Implementation Choices
The following implementation choices are what we felt would work best for TCP. This takes into account all the suggestions made in the discourse thread. We are pretty flexible on these choices as long as they conform to the goals and requirements of TCP.
- Signedness information will be part of types.
- The alternative is to have the signedness information as part of the ops, as in Linalg, LLVM, etc. Since we are targeting a higher-level of abstraction in this dialect, it seems more prudent to have signedness as part of the types.
- It is clear that we need to go from signedness on the types to signedness on the ops at some point during compilation before codegen, but this can happen later in the flow.
- Combinator approach to ops
- There will be one unary, binary, and ternary op, each with an appropriate set of payloads to represent the various individual ops at the frontend level.
- This simplifies code a lot, avoids duplication, and also enables efficient implementation of transformations (as pointed out in the discussion here).
- The alternative of having a different op in the dialect for every unary op, for example, does not seem to be necessary.
- Destination passing style
- As pointed out by several people in the discourse thread, DPS has several advantages like ops accepting both tensors and buffers, working well with bufferization in MLIR, etc.
- However, we may not want to complicate the op semantics solely to simplify one-shot buffer assignment. So we prefer doing this only for ops that have a natural destination operand, like scatter.
- Define new types for this dialect.
- This provides a clean way to restrict the kind of types that we want to support.
- Reuse ops from existing dialects wherever possible.
- We should be able to reuse ops from existing dialects as long as they are based on builtin types. For example, we should be able to use scf since it uses builtin types. But we will not be able to use ops from dialects that use their own types.
- Allow scalars
- Dynamic shape calculations would involve scalars and it is best to support them as first-class types and not promote everything to tensors.
- Structured design
- Have a transformation-driven IR with interfaces and have ops as implementations of these interfaces (“DestinationPassingStyleOpInterface + TilingInterface + StructuredOpInterface”). See here for details.
- Is it possible to use this approach for some subset of ops only?
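To ground a few of these choices, here is a hypothetical sketch in MLIR generic syntax covering signedness-on-types, the combinator approach, and selective destination-passing style. Every op name, payload string, and type choice below is an assumption for illustration only; the actual TCP op set is still to be designed:

```mlir
// Hypothetical TCP syntax; nothing here is a committed spec.

// Signedness carried on the types (si32 vs ui32), not baked into the op name
// (contrast arith.divsi / arith.divui, where the op encodes signedness):
%q = "tcp.binary"(%a, %b) {payload = "div"}
    : (tensor<8xsi32>, tensor<8xsi32>) -> tensor<8xsi32>

// Combinator style: one generic unary op parameterized by a payload
// attribute, instead of a separate named op per frontend operation:
%t = "tcp.unary"(%x) {payload = "tanh"}
    : (tensor<?x4xf32>) -> tensor<?x4xf32>

// Destination-passing style only where an op has a natural destination,
// e.g. a scatter writing %updates into %dest at %indices:
%s = "tcp.scatter"(%indices, %updates, %dest)
    : (tensor<3xindex>, tensor<3xf32>, tensor<8xf32>) -> tensor<8xf32>
```

The combinator ops would presumably still need verifier-enforced payload sets, and the lowering from signedness-carrying types to sign-encoding ops would happen later in the pipeline, as noted above.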
Target Transformations
One of the major reasons for proposing this new dialect is to be able to do some transformations which are not possible (or very complicated) with the existing dialects. In that regard, here are the transformations this dialect should support.
- Op-level fusion
- Algebraic simplification
- Graph partitioning
- Layout assignment
- Buffer Assignment
End to End Compilation
TCP should be designed such that it enables a full e2e compilation flow that targets various backends using MLIR. The complete details of this e2e compilation flow are out of scope of this document.
One such e2e compilation flow could include: Torch => TCP => Linalg => …
Ops
The exact list of ops is something we could iterate on. Here are the classes of ops we foresee in this dialect:
- Unary, binary, and ternary elementwise ops
- Reduction ops
- Data movement ops
- Scalar ops
- Control flow ops
- Fusion / Region op
- Misc ops
Very excited with this proposal! Thank you for taking the time to gather all this information and drive this change. Looking forward to this week’s meeting.
Ditto! Very exciting!
++!
Looking forward to it
This is super exciting. One design question I’d like to surface here in preparation for the ODM is the following: Does TCP 1) actually save Torch-MLIR work by avoiding Torch-MLIR needing to maintain as many lowerings, or 2) is TCP “just another backend” for Torch-MLIR to support. And you can s/Torch-MLIR/any frontend/ there.
I think we can investigate this concretely by considering the layering of
- Torch-MLIR
- TOSA
- StableHLO
- MHLO
- Linalg
- TCP
One possibility is to have Torch-MLIR → TCP → {all the rest}. This is a huge win for us in Torch-MLIR and we would adopt TCP as fast as possible.
Another possibility is for TCP to always be “hidden” under StableHLO/TOSA, which provide stable interfaces for frontends. As I described upthread, Torch-MLIR would like to target a stable “union dialect” which avoids needing to prematurely expose target details.
I’m not taking a position on what is “right”, but I really think that there are two different design points based on whether the goal is to be “one good thing for frontends to lower to (and then fans out to everything else)” or “a transformation dialect sitting lower in the stack hidden from frontends”. We should be really careful if we try to solve both those problems simultaneously to make sure we don’t make contradictory design decisions.
It’s totally fine for there to be “two different things” that need to be built here. We just need to be clear which one we are building and adjust expectations appropriately.
Layering is a very important part of the design. Thanks for bringing this up @_sean_silva.
Here is our proposed layering with TCP.
MLIR e2e compilation flow.pdf (41.9 KB)
(I will present this in the slide deck tomorrow as well)
The summary is that the frontends support two lowerings (at least in the near term), one to StableHLO for the stability guarantees, and the other to TCP for a path to codegen in MLIR. The choice between the two can be based on the needs of the use-case.
Also, as discussed in the proposal document, TCP will eventually be “stable” (similar to LLVMIR).
Just to note for the Google frontends (tf and Jax), our plan of record is to lower them both to StableHLO, and we have requirements that likely make that the preferred path for the future. Totally open to take a requirements oriented look at this decision as things progress, but as production products, we’ll likely bias towards a conservative stance on this.
But we’re also very excited about TCP, and are looking to see when/how it makes sense to lower StableHLO to that.
(This seems aligned - I just wanted to be explicit)
From a selfish and purely transformational point of view, to me, it doesn’t matter much what is the path of dialects, as long as there’s a point in time where most of it is in TCP+linalg+affine/scf.
It’s equally fine if front-ends want to lower directly to TCP, or to their dialects and then TCP.
It’s also fine if they leave left-overs with TCP, as long as they clean that up before full lowering.
What concerns me is how we manage the pipeline to full lowering / codegen.
If the front-end wants/needs to keep left-overs or convert TCP into something else later, then my transformation passes need to be a plugin to an existing pipeline. This is more brittle, as I have to assume my pass results can be further transformed and broken, but again, this is a contract between my passes and each framework, and it’s my problem, not TCP’s.
If the front-end emits a clean TCP+friends IR, then I can take over and complete the process on my own, which is much easier for me (and others like me).
In conclusion, the latter would be much easier for projects like mine (and would improve MLIR’s usage on new back-ends and optimizers), but the former is tractable and should not stop front-ends doing what they need to do.
Since StableHLO is Jax’s primary IR, that is more likely to be a comprehensive conversion (since StableHLO seems pretty well aligned to lower to TCP, as it is getting described). Also, TF has a lot more surface area, but Google has strong motivations to reduce most of it to StableHLO – at least for the core compute.
Even in Jax, there are custom ops that get used in practice (i.e., to target libraries, etc.), so there is always going to be some leftover where we either can’t lower to TCP or, in our subjective judgment, such a lowering may be possible but inadvisable. For these things, we need to preserve source-dialect ops in some way. While MLIR does support arbitrary mixing, I’ve generally not found that an “anything goes” approach there actually produces good results (more so if there are type changes). In my mind, we end up with some way, possibly frontend-specific, of encasing non-conforming ops from the layers above when converting to TCP, versus just letting them mix at random. But that would need to be designed and iterated on with concrete examples.
Thanks to everyone who participated in the discussion today in the open MLIR meeting. Here is a summary: (please let me know if I missed anything)
- Layering TCP and its connection to other dialects
- TCP should be a transformation dialect (not a frontend-ingress one)
- TCP could start with an intersection of ops in MHLO and Torch, and expand from there.
- Torch will have a lowering to TCP, assuming Torch-TOSA and Torch-Linalg lowerings can be achieved transitively through TCP (and hence these direct lowerings can be deleted in TorchMLIR).
- Transformations in TCP
- TCP will have transformations like op-level fusion, algebraic simplification, layout assignment and buffer assignment.
- Is it necessary to do Tiling in TCP so that we can use it to map to libraries that require fixed size tiled inputs? This warrants an extended discussion.
- Who is ready to contribute?
- Mehdi is ready for code reviews
- Closely collaborate with TorchMLIR and StableHLO.
- Use “TCP-WG” topic on discourse for discussions.
- Incubator vs in-tree:
- A separate incubator repo
- This would be useful to do big changes without affecting anyone else.
- But adds an extra repo to depend on.
- Hosting TCP inside TorchMLIR repo, with a clear exit strategy
- This would make it complicated for StableHLO to work with TCP.
- In MLIR tree
- This is preferable due to the benefits of everyone depending on MLIR being able to use TCP.
- We should figure out a way to reduce the impact on others working on MLIR (while TCP is being bootstrapped).
- Summary: In MLIR tree seems preferable
- A separate incubator repo
As a next step, unless there are objections, we can propose an initial sketch of the op set for TCP and use that to drive discussions with concrete examples. Let us know if anyone is interested in working with us on this initial spec.
Thank you for the notes!
While I recognize that it is “easier” for this to be developed in tree, the same is true for all new efforts. It seems important to follow the documented llvm policies:
https://llvm.org/docs/DeveloperPolicy.html#introducing-new-components-into-llvm
-Chris
I read through the RFC above and attended the meeting yesterday as well. One thing that wasn’t clear to me was whether the ops in this dialect are meant to be exhaustive, to aid transformation after lowering out the various programming-model-level dialects, or are just meant to cover what isn’t already addressed by TOSA, Linalg, and other existing dialects. I think I heard some statements supporting the latter – that it’s meant to augment – but it’s hard to gauge without a starting list of ops.
Is this going to largely be a “named op” dialect, i.e., where operation names spell out a lot of the semantics, without a broad classification based on other features such as element-wise/point-wise, matmul-like, stencil-like, convolution-like, and misc. affine nests (which we’ve used with good results downstream)? However, such a classification isn’t enough since it often precludes certain novel “near-algorithmic” expansions and lowerings, and one does need named ops (e.g., the way mhlo and lmhlo have it) to have the best possible way to lower things in general. As an example, we’ve had custom lmhlo → affine lowerings for certain kinds of convolutions that can’t practically be realized by any of the existing lowering paths in MLIR in-tree (or with other repos); such custom lowerings (which are still a gradual application of transform passes) provided 2x-5x flat improvement over TF/default and TF/XLA (with the latter using CuBLAS/CuDNN/CutLASS kernels) on NVIDIA Ampere GPUs.
The other point that was brought up was the significant amount of new code that’d be added to the new lowering paths. Wouldn’t there be a great deal of duplication across TOSA to lower-level MLIR dialects, MHLO to lower-level MLIR dialects, and this TCP to the same target? This again ties into what ops you’d want to add, what the representation will be, and whether you are augmenting other dialects (and perhaps pulling in some of the existing ops from there) or just providing the entire spread.
It is the former, except given the breadth & variety of ML frameworks we would never have 100% coverage of all ML ops. So we’ll have to design TCP with interop with other dialects (e.g. Torch dialect) in mind so that these rare ops can stay as Torch ops.
As @raghavanr mentioned above, over the next 1-2 weeks we’ll put together an initial sketch of the operation set. Hopefully that will ground further discussions in specifics.
There’s probably going to be some duplication between TOSA->Linalg and TCP->Linalg in the short term, but longer term we should figure out some way of unifying these. I’m not sure the duplication with MHLO is relevant for TCP, since that’s not part of LLVM/MLIR.
I don’t think the proposal to develop in tree is just because it’s easier, nor do I think that the people proposing it are ignoring the policy. The boundary isn’t that clear in this case.
First, the policy doesn’t really cover this kind of change
This is not about adding an existing project (Torch-MLIR, IREE …) into the LLVM codebase, nor is it about creating a whole new project under the LLVM umbrella. It is just about creating a new dialect that already has multiple heavy users in and out of tree working on it and actively planning to use it directly as a core dialect.
We can’t require all new passes, transformations, and dialect changes to be incubated before joining the monorepo; that does not scale for this kind of change. Because of that, the policy focuses on new projects (including new targets) and not on changes to existing projects.
Nor is this a dialect proposal from an external project or a new group into LLVM, wanting to convince the core developers that it’s worth it. This is a very large number of cross-industry core developers designing a new dialect to make the lives of all MLIR users easier.
Second, it’s not clear to me what is the cost
Initially, the cost of having this dialect in-tree is mainly compilation time of existing builds and, for those that blindly import all dialects, a little bloat on libraries and a probably indiscernible increase in load times.
The commits done to and around this dialect won’t (initially) touch other dialects. By the time we may want to change dialects like linalg and affine to suit TCP, it would already have been time to be in the monorepo anyway.
People that are not using the dialect will not see any change in their code (unless they import all dialects) and various users of LLVM won’t notice the difference.
Finally, the benefits are clear
In the call there was a proposal to have it in torch-mlir, and then people were wondering how they would get it as a dependency from other projects. Others reminded us that, if this is in a separate repo, there’s a completely separate review process. There’s also the problem of proposing changes in core MLIR while TCP isn’t in it, which is also harder to keep up with.
But more importantly, developing a new dialect in-tree, where so many core people in the community are already working on it (design phase now) means we can collaborate in the place we’ve been always collaborating.
To emphasise again: this isn’t a dialect that a side project is proposing (like torch, mhlo, cil, cir, etc.), it is a cross-project cross-industry proposal to significantly improve MLIR’s ability to unite all these efforts together. That level of collaboration would be severely hampered by happening elsewhere.
Yeah absolutely: we followed the same process as with every new addition to MLIR: an RFC, a discussion, and we ensure alignment with the rest of the project (for a dialect: we mostly have to define how the dialect will fit and interact with the rest of the ecosystem).
A dialect added recently this way was the “transform” dialect, for example.
IMO “projects” are incubated, not “components”, and TCP is meant as a reusable MLIR component.
Another non-negligible aspect of incubators, to me, is that they somehow create more of a “community split”: torch-mlir operates really as a different community (overlapping, of course), and that is fine: it is a different project!
I believe there is a relevant set of clear guidelines for the specific purpose of proposing a new MLIR dialect in-tree.
Pointing out specific concerns that are not yet addressed by the RFC / ODM / technical discussions would be helpful!
Self-edit: I think a section such as this one is missing but I am confident it can be achieved.
To me, without intimate knowledge of the dialects, I don’t think it is right for both TCP and TOSA to live in MLIR core at the same time given the overlap. I think that the community should agree to migrating the TOSA implementation elsewhere and take over that role with TCP if TCP goes in tree.
Note that MLIR tries to be pretty abstract and general while TCP will be heavily impacted by the frontend frameworks it wants to support which definitely do not meet that standard. I think the eventual TCP specification is very important for deciding if this dialect meets MLIR core’s goals.
People that want to use the dialect later will almost certainly be pressured into following certain design decisions that were directed by only PyTorch and HLO which both made many design decisions for convenience of users and for the benefit of specific hardware backends. It is not easy to change this direction in the future, and MLIR wouldn’t allow yet another TCP with different views to be added at a later time. This being another case that the eventual specification of TCP should be very important for the decision of in-core vs incubator or elsewhere.
This is very likely to be an almost permanent decision and is being placed for the convenience of the current core developers. It might be practically worth it, and it might end up being a great dialect, but it should also be clear that this gives the designers of TCP a large advantage over any ideas that come later whether the new ideas are good or not.
TOSA is a spec, it is stable, and it isn’t designed as a compiler IR. My take is that StableHLO and TOSA should solve the “input” problem (we call them “edge dialects” in the doc) while TCP (like MHLO in XLA) is a compiler internal.
I don’t think it is fair to TOSA to have to kick it out so that TCP can exist. Such a condition also makes me uneasy because it creates a sudden strong incentive for the people who have a stake in TOSA to object to TCP just because of this: it would make it impossible to have a reasonable discussion on TCP itself.
First, we have to begin somewhere; this line of reasoning makes it impossible to do anything if pushed to the extreme. Then, I think this is a bit out of proportion: does linalg have an insurmountable advantage over TCP because it was there first? What kind of new ideas wouldn’t we incorporate in the ecosystem (TCP, Linalg, or new components) in the future if we’re convinced they are worth trying out?
Finally, even if it is “the convenience” of the current developers (I disagree with your choice of word), so what? This is how projects are managed by design: the people involved and developing it are pushing directions collectively (with guidelines, as Nicolas reminded us). Nothing is “permanent” as you seem to think: there is a reason we don’t have a stable API and that is exactly this! Doing TCP does not commit us to anything and we can delete it entirely from the tree in 1, 2, 3 years if the upstream community loses interest in driving it further.
I’m still reserving a bit of judgement on where the earliest work on this lands (i.e., the next N weeks / first sketches), but one point that was discussed in person that I thought was relevant: by providing this missing piece upstream, it relieves some pressure on current dialects and layering to cover more area than they possibly should – and that should be a positive design force. We should definitely have this upstream with its peers when it is far enough along that it is prompting those kinds of changes. I suspect that will be relatively early once this effort gets some traction.
Another point that was not mentioned: this is on a path to be the heaviest user of dialect conversion upstream, and I think that would be a very positive force. I’m aware of some of the issues and edges that get encountered in downstream vacuums in this area, and while people try to do the right thing, report issues and fix root causes, there is always a hurdle in downstreams that biases towards just “I’ll make this one local tweak and fix it for real upstream later.” I think the core infra will be ultimately improved by having more serious upstream use.
(Side note for another time: I still wish we had a better project organization or taxonomy which allowed sub-groupings of components/features to have an identity that let them reason about themselves more at the level of a distinct thing vs just being drops in the ocean.)