[RFC] Proposal for a high-level ML dialect in MLIR

I don’t have complete standing to actually make the following offer, but I think the feedback would provide some timely facts to the situation:

What if Google were to detach MHLO, CHLO and the lowerings (to Linalg/et-al) from mlir-hlo, clean them up, port existing framework connections to them and place them under unambiguous community governance, licensing and contribution models (i.e. possibly up to the extent that we previously sponsored for the investments to make torch-mlir a community project as an LLVM Incubator repository)? Would that satisfy the technical need? And what elements of community governance are deemed as important for potential collaborators on such a project (i.e. anything from “in the LLVM Foundation” to “aligned with an independent, open-source friendly other governance model”)?

That should include updating the spec before updating the dialect.

Thanks for clarifying that @eric-k

Also, thanks for the explanations of why some of the decisions were made. If TOSA is to be used as a dialect for “all” ML models (which is what we are proposing here), we need to have a way to represent these, irrespective of their implications on implementation / performance.

Could you expand more on what you want for reductions with generic accumulation?

I meant a reduce op with a lambda as input (as opposed to fixed reduction operators that are currently present).
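To make the distinction concrete, here is a minimal Python sketch. It is not TOSA or MHLO code, and every function name in it is made up for illustration. It contrasts a closed set of fixed reduction operators with a single reduce that takes the accumulation function as an input.

```python
from functools import reduce as fold
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


# Fixed-operator style: one op per reduction kind, chosen from a closed set.
def reduce_sum(values: Sequence[float]) -> float:
    return sum(values, 0.0)


def reduce_max(values: Sequence[float]) -> float:
    return max(values)


# Generic style: a single op that takes the accumulation function (the
# "lambda") and an initial value as inputs.
def reduce_generic(values: Sequence[T], init: T, body: Callable[[T, T], T]) -> T:
    return fold(body, values, init)


data = [3.0, -1.0, 4.0, 1.5]

# The fixed operators can be expressed through the generic form.
total = reduce_generic(data, 0.0, lambda acc, x: acc + x)
largest = reduce_generic(data, float("-inf"), lambda acc, x: max(acc, x))

# An accumulation with no counterpart in the fixed set above: sum of squares.
sum_sq = reduce_generic(data, 0.0, lambda acc, x: acc + x * x)
```

The generic form covers the fixed operators and more, which is the appeal. The cost is that a consumer can no longer enumerate the reductions it has to support.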

This would be a great starting point for us. In fact, we (Cruise) would be happy to help with this as well.

We would prefer it to be under “the LLVM Foundation” since that is something we are used to and does not come with any organizational risks.

This is a nice coincidence! Later today, we were planning to create a GitHub repository for StableHLO - a stable version of MHLO (and CHLO).

At Google, we have a team staffed to contribute to e.g. a spec, a reference implementation and a test suite, as well as new feature development (dynamism, quantization and sparsity are the big ones that come to mind, and I see that you mentioned them as well above).

An open question is the governance model - let’s figure it out together. At the moment, MHLO is pretty much Google-driven, but this is something that we want to change with StableHLO.

@burmako Is that more of a stable input format into the compiler (like TOSA) or is it going to be a compiler IR? E.g. will removing an op be a breaking change?

The current thinking is that StableHLO would be a stable input format, with backward compatibility guarantees, based on something like [RFC] IR Versioning. Removing an op would be a breaking change, and that would need to respect the agreed upon compatibility window.
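As a rough illustration of what a compatibility window implies for op removal, here is a small Python sketch. It is not StableHLO's actual versioning scheme: the integer version numbers, the window size and both function names are assumptions made purely for the example.

```python
# Assumed for illustration: a consumer must be able to read payloads produced
# by versions up to COMPAT_WINDOW releases older than itself.
COMPAT_WINDOW = 2


def can_read(producer_version: int, consumer_version: int,
             window: int = COMPAT_WINDOW) -> bool:
    """Backward compatibility only: a newer consumer reads an older payload
    as long as the payload is not older than the window allows."""
    return 0 <= consumer_version - producer_version <= window


def removal_is_safe(last_emitted_in: int, removed_in: int,
                    window: int = COMPAT_WINDOW) -> bool:
    """An op can be dropped in version `removed_in` only if no producer that
    the consumer must still read can have emitted it. The oldest readable
    producer is `removed_in - window`, so that version has to be newer than
    the last version whose producers emitted the op."""
    return removed_in - window > last_emitted_in


# A consumer at version 5 reads payloads from versions 3, 4 and 5.
readable = [v for v in range(1, 7) if can_read(v, 5)]
```

Under these assumptions, an op last emitted by producers at version 2 can be removed at version 5, while one still emitted at version 3 cannot, because a version-5 consumer is required to read version-3 payloads.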

@burmako Just to clarify, are you suggesting community governance for StableHLO? More specifically, is StableHLO going to be an “LLVM Incubator repository” as @stellaraccident had suggested?

We are listening/evaluating and making that decision ~now. For pragmatism, it will start as a Google repo (as torch-mlir did) under a Google-administered organization. But we would like it to be a community asset and are trying to figure out the governance model/final location/etc. If it were to become an “LLVM Incubator repository”, that would be because a) there is demand for that (and we debate it internally and conclude that is a good direction to go), and b) the LLVM community accepts it. Getting feedback on this thread informs both of those aspects.

Also, thanks for the explanations of why some of the decisions were made. If TOSA is to be used as a dialect for “all” ML models (which is what we are proposing here), we need to have a way to represent these, irrespective of their implications on implementation / performance.

Yes, TOSA takes an opinionated stand on operators, with the assumption that implementation / performance are important characteristics for models.

I meant a reduce op with a lambda as input (as opposed to fixed reduction operators that are currently present).

Hmm. Yes, that would be a tough one to fit under TOSA’s current principles. It would be interesting to see how that would fit under MHLO / StableHLO.

Yes, TOSA attempts to balance the hardware view of the spec operators with the compiler IR view, within reason. But since it defines the functional implementation within the spec, there’s an assumption here that the operators translate from their TOSA forms to hardware implementations without a substantial gulf in abstraction down to the hardware / microcoded level.

Viewed from such an abstraction level, an operator like a reduction with a lambda is somewhat higher up in abstraction, but could be potentially reduced to TOSA primitives. This doesn’t mean TOSA cannot accommodate new operators - as @eric-k, who maintains the spec, says - there are defined processes through which contributions are indeed welcome. @stellaraccident was the contributor of the tosa.fft op recently.

What do folks think about developing this incrementally in tree (i.e. not as an incubator project), after first presenting & reviewing the high level design? I feel like at this point the design space is well-characterized and there aren’t major unknowns.

Personally, I’ll defer to community consensus on this, but I’m also skeptical of our ability at this juncture to arrive at that consensus for in-tree development of something of this scale and category. If anything, I have a slight preference for seeing the “ML bits” come out of the main tree and into a more domain specific repository (or set of repositories) where they can grow/mature and interop with each other more directly (and carry the dependencies common of this domain). I know that we need to improve the infra for managing the detached “ML repos”, but I’m interested in seeing that happen without special privilege being paid to those parts that happened to have existed at the right point in time to have reserved a spot in the monorepo. I’ve argued for more inclusion into the monorepo based on policies before, but I believe the community has been pretty clear on holding a higher standard there. In general, ML compilers are still young, varied and fast moving. I’d like us to have a repository positioning that reflects that vs continuing to add bulk to the monorepo – whose primary purpose continues to be the long term, high stability core APIs and toolchains.

I feel like this should exist at the same “privilege level” as torch-mlir and onnx-mlir.

Makes sense, I too don’t want TCP to get a “free pass” because of timing. Maybe let’s discuss this next Thursday as @mehdi_amini suggested. By that time hopefully we’ll have a decision on StableHLO’s location as well.

These are more complex organizationally though since they’re tied to external projects & specs. I’d expect TCP to be fully controlled by the community.

One of the things we’ve found is that beyond the dialects, there are tooling and integrations that are useful in converting in/out, testing, code generating, CI, deployment artifacts, etc. There really isn’t a place in the monorepo for such things to exist with any fidelity. Torch-mlir and onnx-mlir are also fully controlled by the community but they are free to handle these other parts a bit better, and I think that makes them stronger projects that we can put more weight on. Every time the upstream dialects need to grow a new layer of integration/testing/etc, it is a tax that everyone pays – we end up doing the bare minimum because of that, which still adds up but never quite gets us to where we would be quality wise vs if there was a more dedicated project structure for things that are “crossroad” components.

(The answer could be “start a new top level project in the monorepo” but that is an even higher bar – and easier to approach by way of incubator)

My 2 cents.

This could be an interesting discussion wrt profiles and the like. E.g., the TFLite dialect allows for types that the TFLite flatbuffer and runtime don’t support. It allows for using the ops with different types, but of course that creates a gap with respect to numerics: one won’t have the same guarantees or conformance, but could use the same computational description. This has been the case in the TFL dialect for a couple of years without much issue. It would fall outside the spec and its guarantees, though.

I think that is a key component no matter where this goes with all these potential candidates (or combination of candidates).

This is an interesting one, as this is a case where the corresponding HLO op has gotten active negative feedback from stakeholders and even JAX doesn’t use this functionality in general. I sometimes feel like folks want multiple dialects and abstractions just concatenated into one dialect for some reason (“we have a single input, yes it has bitshift and nD einsum and inter-device communication primitives”). That is to say, if an SCF op fits the goal, why not use it? What are the constraints here? (Speaking from current experience on reduce, the number of actual uses of the lambda is not something that would motivate me to add it; I like it from a generality point of view only, e.g., it’s cute.)

HLO scatter is probably the most disliked op in XLA :) (well, that’s an emotion, but quantitatively it is the op with the largest number of bugs by some margin). What functionality are you after with repeated indices?
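For readers wondering why repeated indices are the delicate part, here is a deliberately simplified 1-D sketch in Python. It does not model HLO scatter's real semantics (which involve dimension numbers and windows); the `scatter` helper and the two combiner lambdas are hypothetical. It only shows that the result depends on update order unless the combining function is order-insensitive.

```python
from typing import Callable, List, Sequence


def scatter(operand: Sequence[float],
            indices: Sequence[int],
            updates: Sequence[float],
            combine: Callable[[float, float], float]) -> List[float]:
    """Apply updates[k] to operand[indices[k]] in order, merging each update
    with the current value through `combine`."""
    result = list(operand)
    for i, u in zip(indices, updates):
        result[i] = combine(result[i], u)
    return result


overwrite = lambda old, new: new      # last writer wins
accumulate = lambda old, new: old + new

operand = [0.0, 0.0, 0.0]
indices = [1, 1, 2]                   # index 1 appears twice
updates = [10.0, 20.0, 5.0]

# With overwrite, applying the same updates in reverse order changes the result.
fwd = scatter(operand, indices, updates, overwrite)
rev = scatter(operand, indices[::-1], updates[::-1], overwrite)

# With addition (associative and commutative), the order does not matter.
add_fwd = scatter(operand, indices, updates, accumulate)
add_rev = scatter(operand, indices[::-1], updates[::-1], accumulate)
```

In this sketch `fwd` and `rev` disagree at index 1, while `add_fwd` and `add_rev` agree. Any parallel implementation has to either fix an order or restrict the combiner, which is one plausible source of the bug count mentioned above.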

That is very interesting. I know you can have that without external project constraints. But without spec constraints I’m not sure how much you should expect any stability or versioning. I think it’s important to define the goals: TCP was not stable, it was an IR, and it had no guarantees except being useful for more codegen-oriented optimization and being a target for multiple frameworks. TCF was a different story. Perhaps that’s all part of the ODM (I think there was a different one scheduled for next week though, but I could be misremembering).

You’re right; @burmako booked the 18 Aug one for this topic.

We don’t have a use case where we write out IR at commit A and read it back at commit B. Is that what you mean by stability / versioning?

Speaking for myself, I don’t think I want to invest in another frontend “reduction” opset which doesn’t have some guarantees around this. The cost for community projects is just too high: outside of corporate codebases, it becomes prohibitively hard for projects to interop at the exact same commit. Even if these are soft requirements, they become constraints for testing infra, frontend integrations, and resulting CIs – resulting in poor quality software (since nothing can ever be tested together). Everything below that, sure, let it drift. But if it is serving a similar integration role as LLVM IR for the domain, then it needs to be designed for some level of compatibility.

(True stories from the CI pit :) )

(But we may now be talking about two different things)

Sigh… Well, at the risk of sounding like the wacko in the corner, I’ll throw this out there… There’s a somewhat more radical solution to these issues, which would be to just check it all into the same repo. If the Clang/LLVM community is not accepting of end-to-end ML flows, then we simply create a new monorepo where the stuff can live together. I feel sure that this is the place where everyone is going to throw up their hands and explain why that possibly couldn’t work, but if we get to the point where these projects are regularly building together, will it seem so radical?

Steve

Yes, that’s been our experience trying to fit those in with TOSA. Ultimately we have the support infrastructure placed sidecar on Developer Resources - ML Platform, and that has been reasonably tractable. For real e2e production level development (a combination of compiler and hardware dev, as we’ve done around TOSA), all of these are load-bearing and interdependent early.

Further, to gratuitously summarize a very long history of productive conversation with @stellaraccident in particular, but also with @jpienaar and others, we benefited a lot from taking time to figure out TOSA and quantitatively consider things that fit and didn’t quite fit. It wasn’t so much the input in the form of suggestions for more ops, as much as advice suggesting what had been previously tried and didn’t quite work, that really benefited our decision-making on op constructs here.

For example, we considered and then decided against an explicit broadcast op. Tl;dr - there were multiple potential ways to do it, all with tradeoffs, and fundamentally a poorly designed construct becomes a lifetime of technical debt when backward compatibility requirements are applied in future.

I don’t have a strong signal on the ability to incrementally build out an IR at this abstraction. For TOSA, the broad contours of ‘what we want’ were generally clear early. It took conscious effort to resist the temptation to add more than scoped. It worked out better to stay focused on the op/spec use case while keeping an active line of comms on compiler IR level input that was reasonable - e.g. some very productive ongoing conversations with @jpienaar on eliminating shape resolution constraints that were fundamentally unnecessary.

I agree with this on both points. On the first point, the list of desired attributes for such a project is not new, and many folks have embarked on this. I don’t think that starting this in tree with a bunch of different folks pushing and pulling in different directions is likely to lead to a different result from previous projects. As with the discussion of TOSA upthread, I think you’ll find that many stakeholders will only care about the “acceleratable by their HW” subset of ML, and that subset is different across various HW architectures. It will be difficult to get consensus here.

I also agree with the second point. MLIR already has a lot of “advanced research quality” code in the main MLIR tree, and it is difficult to know the capabilities and limitations of that code. Much of this is ML focused (e.g. linalg) but other parts are more generic (affine etc). Splitting these out from MLIR core seems like it would help differentiate the pretty-battle-tested stuff like SCF from the more researchy and evolving pieces.

I think such a move would also help assuage the viewpoint/fear that “MLIR” is moving and breaking all the time. It is true that MLIR core does change, but most of the thrash is in more derived dialects. Splitting these into conceptually different things would help folks better understand how much stability and what sorts of breakages are to be expected over time.

+1. The LLVM Incubator process is specifically designed to support “exciting and in-development” projects that want collaboration across many organizations but want to be within the LLVM umbrella. It seems like a perfect fit for this sort of project.

-Chris