Development of high-level Tensor Compute Primitives dialect(s) and transformations

With regard to the question of a small set of primitive math ops mentioned in @stellaraccident's note above: it seems to me that it is useful to decouple “tensor ops” from “scalar ops”. As the title of this thread indicates, what is more interesting (and likely more controversial) are the tensor ops, since they provide many different optimization possibilities. There is some hope, however, that the number of core tensor ops/primitives can be kept small, especially if we have a separate scalar-op dialect. The tensor ops will be higher-order ops that take scalar functions as parameters.

As an example, consider the expansion of SIGMOID in the above message. It seems to me that it would be simpler to express SIGMOID as the invocation of a single tensor op (say UnaryElementWiseOp) that takes as a parameter a scalar function (expressed as a region in MLIR) that performs the scalar computation of sigmoid (using scalar ops mul/tanh/mul/add instead of tensor versions of the same ops).
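
To make this concrete, here is a rough sketch of what such an op could look like, written with today's linalg.generic on tensors and with the arith/math scalar dialects standing in for whatever scalar-op set ends up being chosen (the UnaryElementWiseOp name above is hypothetical, and so is this particular encoding):

```mlir
#id = affine_map<(d0) -> (d0)>

// Sigmoid as a single tensor-level element-wise op whose region carries the
// scalar computation sigmoid(x) = 0.5 * tanh(0.5 * x) + 0.5.
func.func @sigmoid(%x: tensor<?xf32>) -> tensor<?xf32> {
  %c0 = arith.constant 0 : index
  %n = tensor.dim %x, %c0 : tensor<?xf32>
  %init = tensor.empty(%n) : tensor<?xf32>
  %r = linalg.generic {indexing_maps = [#id, #id], iterator_types = ["parallel"]}
         ins(%x : tensor<?xf32>) outs(%init : tensor<?xf32>) {
  ^bb0(%in: f32, %out: f32):
    %half = arith.constant 0.5 : f32
    %m0 = arith.mulf %in, %half : f32
    %t  = math.tanh %m0 : f32
    %m1 = arith.mulf %t, %half : f32
    %s  = arith.addf %m1, %half : f32
    linalg.yield %s : f32
  } -> tensor<?xf32>
  return %r : tensor<?xf32>
}
```

The tensor-level op only says “apply this scalar function element-wise”; everything specific to sigmoid lives in the region.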

For example, it looks like this is the direction that the StructuredOp in the Linalg dialect is taking (except that it is at the lower level of buffers). Wouldn’t the same kind of approach make sense at the tensor level too?

Absolutely, and this is precisely the direction in which the semantics of linalg.generic and linalg.indexed_generic have evolved in the past few weeks. We realized that having custom ops that can work on both buffers and tensors simplifies a lot of issues and could help with the phase-ordering problem of buffer allocation + layout and other transformations.

Thanks, @g.ramalingam. Interesting idea, and it raises a few questions:

Is the intention here that frameworks/front-ends can define an op’s semantics by attaching the scalar-op function to it? And if so, will the ops still have “default” semantics if the scalar function is missing?

If the ops always come with a scalar-function that defines what they do, it is not clear to me how tensor-level optimizations/analyses will understand such ops.

Thanks!

Very interested in this topic, which is also related to quantization. A minimal yet complete set of ops, whose semantics and properties are well known and modeled, would go a long way towards building quantization at “this level”.

I am eager to help with the discussions on this forum wrt quantization.

Absolutely. I wasn’t expecting to get too detailed in this thread, I just wanted to gauge the interest! :slight_smile:

Creating a sub-section was part of the proposal; however, Discourse does not support it yet (they said “early this year”): it supports two levels of nesting right now, and MLIR is under LLVM. I’ll look into what we can do for now.

Hi Mehdi,

Thanks a lot for taking the lead on this. I think this is a very important problem in frontend / IR / compiler codesign that should be tackled by the community.

We hope that existing work on the Linalg dialect (see Design Document for the Linalg Dialect) can help move the discussion forward and help exhibit some of the tradeoffs involved.

Looking forward to live discussions!

MLIR-TCP-WG may be too much of a mouthful?

This is certainly an awesome effort; thanks Mehdi for seeding the discussion!

I agree with Stella’s point that we should decompose this large problem space into smaller ones to make each one more focused and tractable. They interact with each other, though: dynamic shape support will certainly affect how we choose high-level NN/math ops. So it seems to me that apart from defining the principles and dividing the space, we might also want to solve more fundamental and far-reaching aspects like dynamic shapes first, before discussing detailed ops; otherwise we might continuously go back to those fundamental points. For op sets, I can see multiple levels of abstraction that we can create, and some of them already exist in MLIR core (albeit perhaps incomplete or not holistically thought out). It would be nice to repurpose or build upon them.
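
To illustrate the dynamic-shape point with a minimal, hand-written sketch (not from the thread, using today's MLIR syntax and arith ops on tensors): the same element-wise add written over a fully static shape and over a partially dynamic one. Once a dimension becomes unknown, the op definition has to spell out how result dimensions relate to operand dimensions and what happens on mismatch, which is exactly the kind of constraint that feeds back into how the high-level ops should be defined.

```mlir
// Element-wise add with fully static shapes: the result type is known up front.
func.func @static_add(%x: tensor<4x8xf32>, %y: tensor<4x8xf32>) -> tensor<4x8xf32> {
  %r = arith.addf %x, %y : tensor<4x8xf32>
  return %r : tensor<4x8xf32>
}

// The same op with a dynamic leading dimension: the op definition now has to
// state how the result's unknown dimension relates to the operands' (here: they match).
func.func @dynamic_add(%x: tensor<?x8xf32>, %y: tensor<?x8xf32>) -> tensor<?x8xf32> {
  %r = arith.addf %x, %y : tensor<?x8xf32>
  return %r : tensor<?x8xf32>
}
```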

@nmostafa : good question. Let me clarify what I meant.

I am talking about higher-order ops that always take a scalar-op-function parameter. Without this parameter, they are incomplete.

The idea you mention as point 1 in your other message, namely instructions that contain a call to an op along with a region that defines its semantics (and can be used by an implementation that does not understand the specific op), is orthogonal to this.

I believe that the key optimizations and transformations can often be done without depending on the “scalar computation” component: for example, all unary element-wise ops can share the same optimizations, all binary element-wise ops can share the same optimizations, etc. (Well, I mean most of the optimizations we use in practice; there could be some rare optimization that exploits a property of the scalar computation.)
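
As a hand-written illustration (again borrowing today's linalg.generic/arith/math syntax; the scalar bodies are just placeholders): fusing two element-wise ops only requires knowing that both are element-wise over the same iteration space. The rewrite composes their regions without ever interpreting them.

```mlir
#id = affine_map<(d0) -> (d0)>

// Before fusion: two element-wise ops, each carrying its own scalar region.
func.func @before(%x: tensor<?xf32>) -> tensor<?xf32> {
  %c0 = arith.constant 0 : index
  %n = tensor.dim %x, %c0 : tensor<?xf32>
  %init = tensor.empty(%n) : tensor<?xf32>
  %t = linalg.generic {indexing_maps = [#id, #id], iterator_types = ["parallel"]}
         ins(%x : tensor<?xf32>) outs(%init : tensor<?xf32>) {
  ^bb0(%a: f32, %out: f32):
    %e = math.exp %a : f32          // scalar body #1 (placeholder)
    linalg.yield %e : f32
  } -> tensor<?xf32>
  %u = linalg.generic {indexing_maps = [#id, #id], iterator_types = ["parallel"]}
         ins(%t : tensor<?xf32>) outs(%init : tensor<?xf32>) {
  ^bb0(%a: f32, %out: f32):
    %c1 = arith.constant 1.0 : f32
    %r = arith.addf %a, %c1 : f32   // scalar body #2 (placeholder)
    linalg.yield %r : f32
  } -> tensor<?xf32>
  return %u : tensor<?xf32>
}

// After fusion: one element-wise op whose region composes the two scalar
// bodies. The rewrite only relies on both ops being element-wise over the
// same iteration space; it never needs to understand the bodies themselves.
func.func @after(%x: tensor<?xf32>) -> tensor<?xf32> {
  %c0 = arith.constant 0 : index
  %n = tensor.dim %x, %c0 : tensor<?xf32>
  %init = tensor.empty(%n) : tensor<?xf32>
  %u = linalg.generic {indexing_maps = [#id, #id], iterator_types = ["parallel"]}
         ins(%x : tensor<?xf32>) outs(%init : tensor<?xf32>) {
  ^bb0(%a: f32, %out: f32):
    %e = math.exp %a : f32
    %c1 = arith.constant 1.0 : f32
    %r = arith.addf %e, %c1 : f32
    linalg.yield %r : f32
  } -> tensor<?xf32>
  return %u : tensor<?xf32>
}
```

The same kind of reasoning applies to tiling and similar structural transformations: they act on the tensor-level structure and can treat the region as an opaque scalar function.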

One of the missing features (footnote: we have placed “ML dialect” as a teaser/lure on our open design meetings a few times in the past ;-)) is a set of common ops that multiple different ML frameworks could target, as a layer designed from first principles and suitable as an optimization IR.

Is this really restricted to a “set of common ops” for ML frameworks, or could this be extended to other ops like those in OpenVX/OpenCV?

In an ONNX thread on the MLIR Google group, @stephenneuendorffer briefly mentioned something related to the OpenVX/OpenCV vs. ML landscape.

P.S. Just to give an OpenCV G-API reference: OpenCV: Graph API

Will this cover both DNNs and traditional ML, or will it start from DNNs only?

Thank you, Mehdi, for the initiative. This has been pending in the TensorFlow compiler for a while.
A few ideas to share: I am interested in enabling this in an iterative way, as a series of enhance-and-refine steps. It would be good to have an umbrella initiative with a bunch of smaller components underneath. I am interested in starting this as an extension of the current XLA compiler, since it is fairly mature for certain cases.
We are interested in bringing dynamic shapes, control flow, and a few non-algebra ops into the TensorFlow compiler to handle more of the deep learning parts: not only the compute part, but also data processing, etc.
From an engineering and design point of view, redesigning a completely new IR now may not be the best choice. Why not take a concrete scenario, such as supporting embedding ops or dynamic shapes in the TensorFlow compiler, and implement it as a subset, bringing an end-to-end best practice into the code base? Meanwhile, since many folks have their own problems to solve, we can start to enhance the IR in a systematic way. We also already have something that works as a general compiler, not only for static information.
For now, XLA has a good base implementation for CPU, GPU, and TPU, with potential TF and partial PyTorch support in a static way with limited control flow support. We can take this as the initial implementation, and I’d prefer to reuse the XLA infrastructure as much as possible. Since MLIR is also good for interfacing and bridging, we may be able to propose a hybrid approach: XLA stays as it is, and MLIR’s TF compiler flow starts to implement the extra logic such as tf.unique, the tf.slice op, control flow, etc., with a complete end-to-end compile flow. MLIR may not have the same op coverage as a starting point, but it can focus on correctness and functionality. For hpcg, we may leave it to the hardware vendors’ APIs at the beginning.

We proposed dynamic shape support in the TensorFlow MLIR group, and I’d like to bring all of our experience to this initiative in a constructive way.
This is very high level. Let’s come up with a flow chart, then discuss each piece in detail and finalize a good starting point.

Looking forward to the discussion.

I’d like to leave it open to the people interested in actually building this to define the actual scope.
My take is that we really want to have a compiler IR here, designed with transformations/optimizations in mind. I am not confident enough to say how orthogonal (or how similar) the optimizations on the usual set of ops in the tensor domain (like in nGraph/HLO) are compared to what you would want to achieve with image-processing kinds of primitives from OpenVX/OpenCV.

I work on TensorFlow and I am more familiar with DNNs; can you help clarify what this would encompass? Any pointers to existing frameworks? It may just be better addressed by a different IR with different optimization techniques; we should look into it!

Hi ff7250,

Thanks for chiming in!
It seems that you have short-term needs to build an end-to-end flow, but I am not sure what you’re proposing concretely as an action for the MLIR/LLVM codebase: we can’t directly depend on the XLA codebase from MLIR/LLVM.
XLA has many effective optimizations, and we’d like to leverage the XLA experience to drive the work in MLIR. I think the nGraph folks also have significant experience in building this layer, and we’re looking forward to collaborating on this.

There is a large amount of work to migrate TensorFlow from the current XLA paths to include more and more MLIR-based components (see for instance the relevant Google Groups thread); the TensorFlow-specific and XLA-specific aspects are not directly relevant to this forum though: MLIR/LLVM is entirely independent.

G-API also has PlaidML as a new backend. So I think some CV-related operations could go through MLIR with the PlaidML MLIR refactoring. /cc @flaub, is that correct?

Just to stir the thread up a bit… :cat:
In OpenCV we have 4 groups of ops:

https://docs.opencv.org/master/da/dd3/group__gapi__math.html

  • Math operations
  • Pixelwise operations
  • Operations on matrices
  • Image and channel composition functions

Currently, in the new PlaidML G-API backend (/cc @flaub), we have just 5 basic ops covered so far.

Just to complete the ops overview, we also have non-core ops:

  • Image filters (sobel, dilate, erode, etc…)
  • Color space conversions (BGR2Gray, YUV2BGR, etc…)

https://docs.opencv.org/master/d2/d00/group__gapi__imgproc.html

@bhack how about control flow ops? Are there any? :slight_smile:

Not in the current master version.

The math, bit, and control flow ops may be a start for the common set? I kind of echo @g.ramalingam’s idea of having “scalar” ops (plus control flow ops) and then building the others against them.
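
As a small illustration of building “the others” out of scalar plus control-flow ops (a hand-written sketch assuming today's arith/scf/tensor dialects, not a proposal for the actual op set): a 1-D element-wise ReLU written with only scalar ops and a loop, with no dedicated tensor math op involved.

```mlir
// ReLU over a 1-D tensor using only scalar arith ops plus scf control flow.
func.func @relu_1d(%x: tensor<?xf32>) -> tensor<?xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %zero = arith.constant 0.0 : f32
  %n = tensor.dim %x, %c0 : tensor<?xf32>
  %init = tensor.empty(%n) : tensor<?xf32>
  // Loop over the elements, computing max(x[i], 0) with scalar ops only.
  %res = scf.for %i = %c0 to %n step %c1 iter_args(%acc = %init) -> (tensor<?xf32>) {
    %v = tensor.extract %x[%i] : tensor<?xf32>
    %gt = arith.cmpf ogt, %v, %zero : f32
    %m = arith.select %gt, %v, %zero : f32
    %acc2 = tensor.insert %m into %acc[%i] : tensor<?xf32>
    scf.yield %acc2 : tensor<?xf32>
  }
  return %res : tensor<?xf32>
}
```

Whether such loop-level forms or higher-level element-wise ops with regions (as discussed earlier in the thread) are the right common denominator is exactly the kind of trade-off the common op set would have to settle.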