Development of high-level Tensor Compute Primitives dialect(s) and transformations

Yes, here I was speaking more in the context of “user intent” outside the compiler perimeter/point of view, and more about the kind of objects that the literature builds up in the model space.
To get a small sense of this, you can take a look at the ONNX new operator/function proposal “protocol”.

But as I understand it, you are probably not interested in defining this level in this thread; still, you might be interested in verifying whether you can efficiently express these human artifacts :slight_smile: down the stack.

We’ve now carried variations of this conversation across multiple threads and mailing lists :slight_smile: I’d be up for an f2f meeting at some point to put faces with names and discuss. I’m not sure if that is a kickoff meeting for this topic or not, but we could make it one. It just seems like there’s a lot of energy and we’re in long-thread territory, and talking it through directly may be helpful.

Nicolas and I are on vacation next week, but I know I have availability after that.

For ML, you have scikit-learn, LightGBM, XGBoost and others. The biggest challenge for ML is that most ML algorithms aren’t SGD-based, and they mostly use non-Tensor types. They deal a lot with strings and tables, and need support for missing data, etc.

To finish on this point, and before a potential F2F meeting:
To paraphrase @bhack, I don’t think there is a fixed roof on this house.
Reducing the abstraction gap between expert knowledge (all the way to physicist or mathematician) and representation in the compiler is what is interesting IMO.

Still, it doesn’t have to all be in the MLIR repository, and certainly not now: I see “keeping things focused around transformations” as the value proposition. In other words, this particular point in space/time is not just about expressiveness but really about “transformations and implications on expressiveness”.

Note that this is not a recent concern: Affine and Linalg for instance both put different stakes in the ground in the expressiveness/transformations/simplification tradeoff. I think the next thing we propose collectively should put a stake in the ground too: there is no shortage of NP-complete and undecidable problems in this space.

Yes, you are right. But you can expand the operation set.
G-API has a good API for implementing custom kernels.

Thanks for the link, it is very instructive to see how other frameworks operate!

Note though that MLIR is a compiler infrastructure, which isn’t the focus of ONNX. To me the important distinction is that if an operation does not map to a compiler concept (we can’t lower it to Linalg, we can’t reason about it to perform graph transformations, etc.) then it is likely out of scope. I expect a framework like ONNX (or TensorFlow) to include “ops” that just map directly to a hand-written kernel, simply because it is needed for completeness in terms of serialization/distribution of machine learning models.
From the point of view of the compiler it isn’t much different from std.call to an opaque function.
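For illustration, a minimal sketch of what that looks like from the compiler’s perspective (the kernel symbol name here is purely hypothetical):

    // The compiler only sees a call to an external, hand-written kernel that
    // it cannot look into; the symbol name is purely hypothetical.
    func @vendor_fused_kernel(tensor<?x?xf32>) -> tensor<?xf32>

    func @user(%arg0: tensor<?x?xf32>) -> tensor<?xf32> {
      %0 = call @vendor_fused_kernel(%arg0) : (tensor<?x?xf32>) -> tensor<?xf32>
      return %0 : tensor<?xf32>
    }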

@nicolasvasilache summarizes it quite well above: “transformations first, everything else second.”

I also agree with Stella that MLIR core is first about providing a toolkit of reusable components that fit nicely together and that can be leveraged to bring great compiler technologies in third-party tools and frameworks.

I hope that with such an approach, and proper use of interfaces, we’ll make MLIR core a good reusable toolkit!

https://mlir.llvm.org/docs/RationaleLinalgDialect/#lessons-from-onnxa-namelessonsonnxa

it is predominantly driven by the expressiveness requirements of ML

Do you think that the final goal is to cover the expressiveness requirements of ML, or do you think that we need to have an impact on ML expressiveness so that it can be mapped to compiler concepts?

Because currently I think that ML expressiveness wasn’t naturally built up from a compiler point of view (but I could be wrong from a historical point of view).

Also, I suppose that some of the ONNX optimizations need to be placed in some gray box of the stack, and so remain external to the MLIR perimeter, right?

Just for the other subscribers to this topic: some “off-topic/noise” posts in this thread were partially discussed in, or are more pertinent to, Any front-end framework support such as TF, Caffe?

Actually, I think that even for a black-box op with a hand-written kernel, there is a lot more to it than just an opaque std.call. As a simple example, say you have a specialized super fast kernel for some op Foo. At a high level, you might have an op %1 = "Foo"(%0) : (tensor<?x?xf32>) -> tensor<?xf32>. However, you also probably want a corresponding shape transfer function so you can separate and hoist the allocation of memory for the execution of that op. You probably also want to have the information needed to transform it to a buffer-based calling convention that connects with the underlying kernel. You probably also want to know what set of target devices it can potentially run on. For example, your super-duper fast implementation of that op might only run on GPU, so clustering/device-assignment logic should be aware of that constraint.
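As a rough illustration (purely hypothetical symbols, not an existing dialect), those extra pieces might be spelled something like this alongside the op:

    // Hypothetical companions to the opaque "Foo" op above.
    // Shape transfer function: computes the result extent from the operand
    // extents (assuming Foo reduces away the second dimension), so the result
    // allocation can be hoisted independently of running the kernel.
    func @foo_shape(%d0: index, %d1: index) -> index {
      return %d0 : index
    }

    // Buffer-based calling convention that the hand-written kernel actually
    // implements (out-parameter style, on memrefs rather than tensors).
    func @foo_kernel(memref<?x?xf32>, memref<?xf32>)

The “runs only on GPU” constraint would similarly have to live somewhere queryable by clustering/device-assignment passes, rather than inside the opaque kernel itself.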

I think this is kind of what Stella was saying. To a large extent it is possible to ignore the actual math or computation that is performed inside an op and still come up with lots of interesting design problems, which I suspect are a majority of the real design problems that have little prior art (but lots of “prior pain” :slight_smile: ).

Now, to throw away everything I just said…

I think that while so far in this thread we have been talking about “tensor compute primitives”, perhaps we want to refocus the discussion around a “common ML framework frontend dialect” which can serve as a common representation point for TF/TorchScript/NGraph/ONNX/etc. before handing off to later passes/representations. I think that focusing on normalizing across those frontends (with appropriate escape hatches) is itself a hard enough problem and is already a huge ecosystem benefit.

It’s not obvious to me that “suitable as an optimization IR” is really a “must-have”. For example, I can imagine a world where a pure Linalg-style region-based combinator approach is “all you need” from an optimization perspective, and an HLO-style “ops on tensors” is mostly irrelevant. We want to allow all those approaches to bloom, and to do that we need a common frontend that they can assume as a starting point.

To approach that, we could follow a procedure like this:

  1. Create and agree on a list of frontends
  2. Create and agree on a list of features/behaviors we would like to normalize across them (the actual math op set is one of them, but also potentially things like broadcasting, control flow, side effects, “modules” with multiple entry points/subgraphs and shared state, differentiation)
  3. Create and agree on a matrix of {frontends} x {features} describing each behavior, so we can see what is the same or different.
  4. For each feature: brainstorm on a common representation suitable for all frontends to lower into.
  5. Implement a common representation that covers all the frontends.

It isn’t clear to me that we’re on the same page here. Why would we do this instead of using, let’s say, ONNX directly? What you describe seems to be exactly their goal?

So far, I am aligned with Nicolas on “transformation first” when designing this, and I am not sure how what you’re describing fits.

Perhaps ONNX has solved the frontend representation issues (I don’t know much about it). In that case we don’t need to look at the requirements of other frontends and can focus only on making a transformation IR for optimizing ONNX programs, on the assumption that all other frontends can be legalized into ONNX. However, I suspect that ONNX is not complete in this sense and there is substantial analysis/design/synthesis work for us to do to really understand what programs we are trying to represent in the first place.

Maybe I’m just being paranoid here, but too many times have I seen the mistake of not thinking about important aspects of a frontend language causing a really bad user experience down the line. So perhaps my point is something like: being able to articulate which frontends and which parts of their program representation we intend to represent is just as important for scoping the project as being able to articulate which set of program transformations we intend to do.

For example, unranked tensors are really bad for transformations. If we purely “design for transformations” we might decide to not support them. However, if it turns out that we really need to support unranked tensors well (not saying we necessarily have to), then there’s a new set of transformations like rank multiversioning which we now need to design for.
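To make “rank multiversioning” concrete, here is a rough sketch (the function and the specialization strategy are made up; standard-dialect constructs are only used as stand-ins):

    // Dispatch on the runtime rank of an unranked input so the common case
    // can flow into a ranked, transformation-friendly specialization.
    func @apply(%arg0: tensor<*xf32>) -> tensor<*xf32> {
      %c1 = constant 1 : index
      %r = rank %arg0 : tensor<*xf32>
      %is_rank1 = cmpi "eq", %r, %c1 : index
      cond_br %is_rank1, ^rank1, ^fallback
    ^rank1:
      // Specialized rank-1 version that downstream transformations can
      // actually reason about would go here.
      %ranked = tensor_cast %arg0 : tensor<*xf32> to tensor<?xf32>
      %back = tensor_cast %ranked : tensor<?xf32> to tensor<*xf32>
      return %back : tensor<*xf32>
    ^fallback:
      // Generic (or further multiversioned) path.
      return %arg0 : tensor<*xf32>
    }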

What transformations we decide we want to do is very intimately linked with which aspects of the frontend language we want to represent.

– Sean Silva


I had the same impression as Sean.
I also don’t know if ONNX has solved the frontend representation issue (at least we don’t know formally from the TensorFlow point of view, as it is one of the most popular frontends on the scene yet not “officially” involved in any ONNX WG). All the ONNX work for TensorFlow has historically been an IBM initiative/maintainership (as are the new ONNX and Kernel dialects for MLIR in the new ONNF MLIR frontend).

If we just take a quick look at the new op proposal policy, we will clearly see that ONNX is mainly frontend and model oriented (I’ve extracted just three points of the new op policy):

  • Based on a model.
    This will help us understand the usage and that it solves an actual problem. For the case of the model being private or IP and can’t be shared, the operator doesn’t belong to the standard and should be implemented as custom OP.
  • The operator needs to be implemented by at-least one (well-known) framework. This help us to understand the actual behavior of the operator and its usage.
  • If the operator is available in more than one frameworks, make sure that your design is general and cover those frameworks.

So, at least for ONNX ops, it is a “follower” of models (produced by the scientific literature) and frameworks.

When the literature proposes a new model, in the case where something new cannot be composed from existing ONNX ops/primitives, a new op can be proposed, but it needs to be implemented in at least one “well-known/popular” framework (this is partially because ONNX had an inference scope, though it is recently also entering the training stage with new proposals).

So I think that MLIR instead wants to be flexible and general enough to eventually cover new “custom” ops at “research” time, where people are still running experiments and no official model has been released/validated and no framework has implemented the requested ops.

It is about generalization, but I think we also need some real experience/sedimentation to be reasonably sure that we are not surprised by too many compiler-impacting “novelties” coming from research/model-design (or automated model design/AutoML) activities.

I don’t know if directly receiving downstream requests from the ONNX, nGraph, TensorFlow, etc. dialect teams independently is the best methodology, or whether we are missing some opportunity here.

^- this!

I think there are a few different things to consider here:

  1. What are the core abstractions and building blocks that are used and can support “all the things”.
  2. What foreign standards do we need to interoperate with?
  3. What goes into the MLIR core project, other parts of LLVM, or stays in separate OSS or proprietary codebases.
  4. How do all of these things change over time?

To me, #4 is currently the most important thing. MLIR is still pretty young and we’re figuring things out, both in terms of technology and processes. It makes sense to start out a bit conservative, build and iterate, and nail things down when they are well understood. We need to do “research” to explore and do the learning, but a rush to standardize complex things seems unwise at this point.

IMO, there are a few things that should be prioritized in the short term:

  1. Getting the domain independent parts of MLIR really solid is important. This has made really great progress, but this should continue to be reinvested in as problems are found and new things are learned.

  2. We need to figure out the project’s policy for taking dialects in tree that correspond to foreign standards. ONNX is one example of that - it is governed by a mature standards body, is properly versioned, and the world would benefit from having “one true MLIR representation of it”. Standardizing the dialect itself seems to have a lot of value, and I would like to see this happen as part of LLVM somewhere for the good of the ecosystem. That does not mean that code working on ONNX needs to be pulled in, though. The same thing applies to other foreign standards of similar maturity (e.g. the LLVM dialect).

  3. There is a strong need to provide a standardized framework for tensor code generation, where the framework can and should be op-independent, and ideally codegen-algorithm independent. This was the thing I was pushing for in the talk I gave a few weeks ago at the MLIR ODM. Such work will open new collaborations and allow pulling technology together into a single platform that allows better A/B comparison. This would also allow people who care about specific op sets to implement codegen for them (out of tree).

In contrast, I think we should actively avoid talk of “the one true MLIR dialect” for tensor ops. Such a thing can happen elsewhere, and if/when the discussions converge, then we could consider taking it in. It seems pretty clear to me that no existing op set meets all the needs here, so a lot of design work needs to be done. This design work would benefit from doing some of the things above.

-Chris

I disagree with the “expressiveness first” flavor of this: MLIR already has the facilities for this, just use CFG and loops + richer data types.

The whole point of the exercise is to avoid lowering too quickly and to build high level representations that keep the information and are useful for transformations.

Despite repeating myself, I think this effort should be about “transformations first, everything else second” :slight_smile:, we already have / will have all the expressiveness at the lowest levels of IR.

Thanks, Chris - I find the way you formulate the parts of this discussion to be clarifying. (There are also a lot of good points up thread)

Personally, I would like to see us figure out a path forward for the ONNX dialect and supporting conversions (i.e. “the one true place to define ONNX in terms of MLIR”). I think this can be neatly disconnected from the design/technical discussion about what goes in “core” (and when). Aside from the intrinsic benefits of having an additional, high quality front door, those of us who have primarily experienced MLIR frontends in terms of TensorFlow lowerings would do really well to have another reference point in the space (especially one that is less ad hoc: versioned, multi-party, etc.). While there are governance issues around where such a thing ultimately lands, could this be as simple as creating an appropriate GitHub project and having some volunteer authors/reviewers to help get it established? That’s how MLIR itself was bootstrapped…

I still want to be very cautious about what we migrate “core-ward” for many reasons that have already been hashed out.

(I care about this topic but will be offline for the next week – just so my silence is not misinterpreted)

I think this is a good point, because TensorFlow (XLA/HLO) was a little bit “implicit” in MLIR’s Google origin.

It’s a bit more than that even: while there was a lot of principled work that went into HLO, there were also a lot of last-mile, pragmatic “do what needs to be done to lower TensorFlow programs” compromises baked into it. We’d like to learn from the former and firewall the latter. I think you’ll find that many of us on the Google-MLIR side are highly critical of the TensorFlow opset “design” (and have a somewhat higher opinion of HLO as a high-level representation, but don’t want to see it enshrined en masse either) and want to see a conservative approach to efforts that may get us back into the same positions if they are rushed.

Through this and other discussions, my opinion has shifted from “sure ONNX-MLIR should exist” to “giving ONNX-MLIR some oxygen and developing it in line with eventual adoption by LLVM would provide a really good design counterbalance to what we have now.”

Separately, I would still like to debate a small set of intermediate level tensor/SSA-value domain ops for common representations of:

  • Explicit broadcast ops
  • Indexing, slicing and combining ops
  • Shape manipulation ops

I work heavily in the realm of tensor-domain optimizations/transforms and find it impossible to get beyond trivial things without needing to reference one of those. Currently, because HLO is the only mid-level opset with such things, we end up introducing needless dependencies on it, and that is unfortunate.

There are parallel constructs to each of the above in the “codegen” domain but without the intermediate level bridges, we end up coupling a lot together that does not need to be. There are a lot of transformations that are best done in the tensor domain, and I’d ultimately like the richness of tooling there that is becoming well established in the codegen/memref side.
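To make the kind of ops meant here concrete, a purely hypothetical spelling (no such dialect exists; “tcp” is just a placeholder name, written in generic op form) might look like:

    // Placeholder tensor-domain ops illustrating explicit broadcast, slicing,
    // and shape manipulation without depending on HLO. Nothing here is a real
    // dialect; it only sketches the level of abstraction being asked for.
    func @example(%t: tensor<?xf32>, %shape: tensor<2xindex>,
                  %offsets: tensor<2xindex>, %sizes: tensor<2xindex>,
                  %newshape: tensor<1xindex>) -> tensor<?xf32> {
      %b = "tcp.broadcast"(%t, %shape)
             : (tensor<?xf32>, tensor<2xindex>) -> tensor<?x?xf32>
      %s = "tcp.slice"(%b, %offsets, %sizes)
             : (tensor<?x?xf32>, tensor<2xindex>, tensor<2xindex>) -> tensor<?x?xf32>
      %r = "tcp.reshape"(%s, %newshape)
             : (tensor<?x?xf32>, tensor<1xindex>) -> tensor<?xf32>
      return %r : tensor<?xf32>
    }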


Anything else in this thread?

I don’t quite get what you mean here?

I think @stellaraccident already expanded on this in her last reply.