As discussed in the ODM from 2/10 (and corresponding discussion thread), I am proposing the introduction of an ml_program dialect in order to begin the incremental work to upstream common structural components needed to represent compiled ML programs from popular Machine Learning frameworks.
I have created a draft here (containing just a module op, which should not be too controversial, and in itself highlights several areas where upstream infra may not be generalized enough): D120203 [mlir] Introduce ml_program dialect.
As discussed, we have working, practical importers and compiler paths for all popular frontends and runtimes, and not consolidating some of the common structural components upstream is just adding interop friction (and forcing odd layering) that we should aim to eliminate. The intent is to proceed incrementally towards an end state that:
Provides structural components for representable program forms (e.g. modules, function types, initializers).
Defines representations for memory models of runtime targets of merit.
Defines utility representations and common, low-level data structures (e.g. lists).
Defines interfaces for type system extensibility (i.e. allows frontends/runtimes to specialize any of the above without forking the entire representation).
Provides infrastructure for versioning and facilitating the creation of standalone serialization formats.
It is expected that this effort and the resulting representations will complement existing op-centric dialects (e.g. TOSA, MHLO, ONNX, ATen) by providing a common backplane and common structures that facilitate tools interop.
Since this space has a lot of variation in it, we will proceed with the design incrementally, adding new structure and adapting existing interfaces to use/validate it as we go. This will be an inclusive process, focused on arriving at appropriate mid-level, compiler friendly representations for the above. It may go beyond this initial ml_program dialect when orthogonal concepts are identified.
Solve the “mkdir problem” with respect to a new area being developed upstream.
Assemble a working group of stakeholders to design each of the above facets, where each functional addition will either be by non-controversial consensus or a follow-up RFC.
Inform other engineering efforts regarding some of the generality of MLIR that we will be putting to the test upstream (new top-level modules, improved pass discrimination, full function-like generality, etc).
I’m glad to see this already! My impression from the presentation is that there are a few aspects of this that fit together around more pointer-ish semantics. There’s something of a danger in trying to promise too much by scoping this widely, but it seems like some of the parts (like ‘lists’ and ‘bag of bytes with views’) are pretty generic: beyond just ML… Perhaps these could live in their own dialects? Of course they could always be moved later, too…
They definitely can. I think we should be aiming for full representability upstream. But honestly, most of what I showed in that presentation, in terms of an existing full solution, could take us ~a year to work out and land in a coherent way. Figuring out what is orthogonal and organizing it is part of the work. Given all of the implementations out there, I think the only way forward is incremental. Having some existing e2e’s in view will help us.
I think my main point here is that we start and we start with some structural components that we know are missing and hiding work.
Thanks Stella! A few initial questions (haven’t fully digested this):
How is a “runtime target of merit” being defined here?
Can you elaborate what is meant by this? Maybe a few examples could help me understand how this would work.
I’m well aware that the community (as with any) has difficulty starting something, and of the pain that brings, but I am a bit wary of starting a dialect with a wide-reaching name that doesn’t have any concrete bits figured out yet. What exactly is the trade-off here between defining a working group first and having a more concrete initial path?
Consider the patch more something to show intent and try to inspire more discussion (I figured that there was just above a 0% chance that we wouldn’t bike shed the name, much less the approach). I am fine taking the time to get more of a charter defined, so long as when we identify a chunk, we build the chunk (and I think some of the first chunks are very structural). I’m being a bit assertive to get something going here because I’m watching complexity and workarounds emerging downstream because there is no upstream momentum developed, and I want to make sure we start designing for it.
Let’s see where the discussion goes. I think that will yield some paths forward.
Well, I was going to see what/who emerged among stakeholders. But I was generally considering the union of the current generation of compiler-targeting runtimes that I know about. From a high level program representation standpoint, they aren’t that different when you get down to it.
I’m really open to any approach that gets us there. The issue as I see it is that we have all of this working, but it is spread across multiple projects (including in LLVM). This is resulting in poor interop, and I am seeing some signs that projects are reaching for paths forward that are not a global optimum (I count IREE’s input dialect and converters in this category – but they do contain a lot of practical learnings and were separated out as a half step to a better state).
One other tack we could take on this: start implementing upstream what torch-mlir needs to maintain backend optionality. This would not be comprehensive, but I expect would provide us the focus that we could generalize to the other projects as we go. That would have the advantage of having a tight goal, which is always good for driving something like this. It also has the benefit of not being experimental: it is about getting some of the layering right in a way that needs to involve upstream.
I like the idea of having an upstream target for this. One possibility might be to push torch-mlir upstream to exist as the target for the new abstractions. That might elicit more discussion however and I agree that getting something going quickly is important to avoid downstream duplication. Prototyping it in torch-mlir might be another option?
Thanks for the RFC!
I’m supportive in general of finding a place to start collaborating upstream on some of these important aspects.
Some questions about the RFC:
Scope of the dialect: in general we need to be cautious about “kitchen sink” dialects. Dialects (at least upstream) should be “small” and well scoped in my opinion (the experience of std is still recent). Not scoping a dialect well enough is a magnet for the easy addition of anything.
The naming: I’m not sure about having “ML” in the name. As long as you don’t have first class support for things like gradients and optimizers, what is specific to ML? Is there a more descriptive name for what the dialect entails?
Reading this paragraph emphasizes my previous point even more: we would need to be able to move “fast and incrementally” in this space, but to be able to do this we need some more clear scope.
When you write “It may go beyond this initial ml_program dialect when orthogonal concepts are identified”, I am slightly concerned about some lack of orthogonality during the development. I’m not in favor of just creating a widely scoped “namespace” so that orthogonal concepts can be introduced quickly in the same place, bypassing RFCs and such.
(this goes with what Stephen was mentioning about “Perhaps these could live in their own dialects?”)
Yeah, I’m generally not a fan of sticking the “ml” prefix on things and I proposed this after convincing myself that we are actually in domain specific territory here: the generalization we are reaching for is something like “input representation for programs extracted from an ml framework and targeting a lower level ml runtime.” This is somewhat different from “representation of a user facing ml framework’s language” (which does sometimes have those concepts you point out - but not always).
As a thought experiment, we may come up with a better name by trying to identify the layer that torch-mlir lowers into and that runtime components accept. Like I mentioned: that is not comprehensive to all things we should solve for, but I think that it will give us specificity in terms of non-experimental, in-ecosystem components that will generalize to everything else. There are parts of that boundary that are “just ops” whereas there are also parts that impose the needed structure. I’m focused on the latter here.
Right, but I’m not sure there is much of “ML” left post “program extraction”. Just today on Twitter someone was asking me about XLA, and they aren’t using it in the context of any ML framework, for example.
What about tensor_programs? It seems that the main abstraction revolves around compiler/runtime interfacing for “tensor” based programs more than anything else. For me, the focus on “tensor” may be what differentiates this from a more general interface that would support arbitrary objects (graphs, trees, pointer-based structures, etc.).
(I’m using quotes around “tensor” to mark that I’m not necessarily referring to the immutable value modeled by the builtin tensor type here, but more abstractly to the programming model commonly seen in deep learning frameworks like JAX or PyTorch)
I’m +1 on that, with this last caveat. Given how overloaded “tensor” is (not just in MLIR but in the… Ahem… Choices of some marketing names recently), this is going to put pressure on defining precisely what this is and isn’t. I’m usually against calling more things “tensor” for similar reasons as “ml” (preferring array or just about anything else), but in this case, it might be the least confusing moniker.
Nice to see this get started so soon, building on an existing e2e construct like torch-mlir sounds like the most sensible path indeed.
Since I brought up memory models I’d like to offer a little more context - the original meeting covered a set of global ops around load/store semantics for statefulness in tensors. This works in conjunction with an abstract memory model implemented by the runtime. What is the minimum viable construct of a memory model that a runtime must support? E.g. does the runtime need to implement a heap? Can it just interpret those tensors as input+output?
Similarly dynamic shape capability ranges from none (fully compile-time shape resolved) to rich dynamic shape support in multiple dims at runtime.
The idea here isn’t to have solutions for all these now as these are significant subproblems in their own right, but to hopefully enable the expression of a collaborating ml_runtime like construct that makes the relationship between compile-time and run-time considerations clearer.
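To make the heap vs. input+output question above a bit more concrete, here is a minimal sketch in plain Python (not MLIR, and not taken from any existing runtime). Every name in it (`step`, `HeapRuntime`, `call_functional`, the global `acc`) is hypothetical and only there for illustration. The same "compiled" function is written against abstract load/store callbacks, and the two runtimes differ only in how they back those callbacks.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a global is a named 1-D tensor stored as a list of floats.
State = Dict[str, List[float]]


def step(load: Callable[[str], List[float]],
         store: Callable[[str, List[float]], None],
         x: List[float]) -> List[float]:
    """A stand-in 'compiled program': accumulate x into the global 'acc'."""
    acc = load("acc")
    new_acc = [a + b for a, b in zip(acc, x)]
    store("acc", new_acc)
    return new_acc


class HeapRuntime:
    """Option 1: the runtime owns mutable storage for globals."""

    def __init__(self, initial: State) -> None:
        self._heap: State = {k: list(v) for k, v in initial.items()}

    def call(self, x: List[float]) -> List[float]:
        return step(self._heap.__getitem__, self._heap.__setitem__, x)


def call_functional(state: State, x: List[float]) -> Tuple[State, List[float]]:
    """Option 2: no heap; globals are threaded as extra inputs and outputs."""
    next_state = dict(state)  # the caller's state is left untouched
    result = step(next_state.__getitem__, next_state.__setitem__, x)
    return next_state, result
```

Under these assumptions the program itself does not care which option the runtime picks; only the calling convention changes, which is roughly the kind of relationship a shared representation could make explicit.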
+1. Perhaps I’m missing some prior conversation that yielded some of the design ideas here, so perhaps we could have a more explicit statement of what a particular piece intends to do, what exists right now, and why it’s not good enough? E.g. are list ops solving a specific abstraction problem or are they more a ‘nice to have’ thing?
Hopefully this will yield a collection of work that from my point of view could be broken into:
Foundational parts: ml_program / tensor_program and its _runtime part.
Imperative constructs, e.g. lists
Execution constructs: concurrency, some dialect to express compute and memory hardware abstractions expressing the underlying target, some interface to tie in existing scheduling strategies like polyhedral or BYO ones.
But I’m getting ahead of myself here… I mentioned them because I’ve touched these topics internally and it would be nice for MLIR not to have everyone solve these independently.
I find these descriptions a bit vague (all except the third and the fifth one). Do you have examples or an initial list of ops, types, and attributes that you are sure this dialect will have? The ModuleOp in the form added in the revision would appear like a clone of or a subset of the builtin module that exists. Do you have a rationale for a separate module op here?
I did attend the ODM but, unfortunately, the purpose and the positioning of the dialect still appeared vague to me from the slides presented – there weren’t examples and there wasn’t enough time for questions or discussion at the end of the meeting.
Thanks for the feedback. The presentation was quite a bit choppier than I had planned – both the timezone and span of material was hard to lock on to well for me. I have gotten bimodal feedback on this point: many of the folks who have worked a lot at the frontend/runtime boundary (from multiple projects/companies) did connect the dots in terms of needs here while many of the folks who spend most of their time at the lower levels were kind of lost. This has been one of the problems I’ve had with MLIR kind of being set up to be a “big tent” but not yet having developed the internal structure to even let us localize well what we are discussing.
The other thing I’ve been trying to balance, which probably works against clarity, is to check my own biases. On the IREE side, we do have all of this implemented and working for every modern frontend for targeting the form factors we care about. However, my goal when approaching this upstream is to create the abstractions that will allow optionality at the frontend/backend intersection, not just try to convince people to specialize on what we have built. It would be the easiest thing in the world for us to say “let’s upstream the iree_input dialect and move all of our frontend conversion paths to that”, but I don’t think that would lead to the best outcomes. Better to take what we have built and use it as a prior to build the abstractions and layers a bit more purposefully with more stakeholders. So I was trying to advocate for an incremental, middle path for a big topic area. I’m sorry that came out a bit muddled.
If we take the approach of focusing this in on a PyTorch → torch-mlir → runtimes approach, I think the case could be made more clearly. On the bias front, I hadn’t thought it was a good idea to refine the scope in one step, since it will alienate stakeholders. But I do think that at the level we are targeting, there is more the same than different, and designing with one well lit path highlighted would generalize and present plenty of optionality as we get going (I have confidence in that, but I’m not competing with any of those pieces).
I too felt that things were only perhaps clearer to those who worked on those precursors. But to get back to the concrete discussion, I have some of the same questions that Mehdi has above. I’m listing some.
Lists, initializers, and globals appear to be of general use and aren’t something specific to ml. Do you want to call this the program dialect? And if we did that, it wouldn’t be clear why list ops or types are placed here – looks arbitrary. Should there be an adt dialect? Types could get their own zero op dialect if needed.
How are the initializer related ops lowered on the LLVM path? Both LLVM proper and MLIR in llvm support global ctors and dtors ops. What’s the connection to these?
Should the func op in discussion on another thread become just program.func and the existing module become program.module? The justification to have a new module from the slides looks weak and hinges on future unknowns.
Another thing I found missing in the slides is the new types being added. Would the list ops, for example, support decomposition and deabstraction to tensor types (if desired), just like the TF MLIR transforms support?
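For readers less familiar with what "decomposing a list to tensor types" can mean, here is a rough sketch in plain Python. It is not the TF MLIR transform and not a proposed design; the names `packed_empty`, `packed_push`, and `packed_get` are hypothetical. It assumes every element has the same width and that a static maximum length is known, so a growable list can be replaced by a fixed-capacity buffer plus a live-element count.

```python
from typing import List, Tuple

Row = List[float]
# (fixed-capacity buffer of equal-width rows, number of live elements)
Packed = Tuple[List[Row], int]


def packed_empty(capacity: int, width: int) -> Packed:
    """Replace 'empty list' with a zero-filled [capacity x width] buffer."""
    return [[0.0] * width for _ in range(capacity)], 0


def packed_push(p: Packed, row: Row) -> Packed:
    """Replace 'list append' with a write at index n and a count increment."""
    buf, n = p
    if n >= len(buf):
        raise OverflowError("static capacity exceeded")
    if len(row) != len(buf[0]):
        raise ValueError("all elements must have the same width")
    new_buf = [list(r) for r in buf]  # value semantics: do not mutate the input
    new_buf[n] = list(row)
    return new_buf, n + 1


def packed_get(p: Packed, i: int) -> Row:
    """Replace 'list get' with a bounds-checked read of row i."""
    buf, n = p
    if not 0 <= i < n:
        raise IndexError("index out of range")
    return buf[i]
```

The sketch only works because of the two stated assumptions; lists with unknown bounds or mixed element shapes would need to stay as a real list type, which is part of what the question above is probing.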
I feel if we don’t look at this generally and in conjunction with the existing upstream dialects, we risk adding too many duplicate ops and parallel paths – it’s possible some of the things you wish to support either fit well into an existing dialect or another new dialect, or can be supported by existing ops with extensions if needed.
Thank you Stella, this is very interesting. I am +1 on the support of other low-level data structures (such as lists).
After watching the presentation I was also wondering what are the module and func operations targeting?
Is the design of ml_program.module similar to gpu.module? Will it exist within a builtin.module outlining only the ml graph? Can it be represented by an attribute in a builtin.module instead?
If ml_program's intent is to contain outlined code for a specific (ml) target runtime, doesn’t it call for an offload dialect that could be used for other runtimes different than ml?
This is really interesting, I’m excited to see progress in this area. I’m also a bit concerned about the scope of this proposal. It seems like there are a couple of ways to slice the “ML program” problem:
There are the “container” problems, e.g. the top level modules/functions/etc things, being able to represent the equivalent of SavedModel sorts of things etc.
There are the op dialects in the tensor domain / ML Graph level of abstraction - do we have one true unifying operator set, or do we have lots of framework specific “tf.” and “tfl.” and “hlo.” and “onnx.” … dialects.
There are questions about unifying the type system for ML operator graphs, e.g. the list types, and the operators that perhaps work on them.
There are code generation, bufferization, and lowering concerns.
I’m a bit skeptical that we’ll find a true unifying solution for all frameworks that isn’t a lowest common denominator (inducing usability problems) but I think there is a lot of benefit in making composable options for each of these problem domains, even if we end up with a standardized “ml program” outer wrapper that composes with framework specific ops for the weird cases.
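As a toy illustration of the "outer wrapper that composes with framework specific ops" idea, here is a sketch in plain Python. None of it is MLIR API; `Op`, `Function`, `Program`, and the op name `torch.aten.weird_case` are hypothetical. The container only knows about structure (named functions holding ops) and treats each op as opaque apart from its dialect prefix.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Op:
    name: str  # "<dialect>.<op>", e.g. "tosa.add"

    @property
    def dialect(self) -> str:
        # Everything before the first '.' is taken to be the dialect namespace.
        return self.name.split(".", 1)[0]


@dataclass
class Function:
    name: str
    body: List[Op] = field(default_factory=list)


@dataclass
class Program:
    """Structural container: it does not interpret the ops it holds."""
    functions: Dict[str, Function] = field(default_factory=dict)

    def add(self, fn: Function) -> None:
        if fn.name in self.functions:
            raise ValueError(f"duplicate function: {fn.name}")
        self.functions[fn.name] = fn

    def dialects_used(self) -> Set[str]:
        return {op.dialect for fn in self.functions.values() for op in fn.body}

    def unsupported(self, backend_dialects: Set[str]) -> Set[str]:
        """Dialects a given backend would still need a lowering or escape hatch for."""
        return self.dialects_used() - backend_dialects
```

For example, a program mixing `tosa.add` with a framework-native op would report `{"torch"}` as unsupported for a backend that only accepts `{"tosa"}`, without the container needing to understand either op.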
Like Uday I missed last week’s call, but I should make this week and next week. It would be great to continue this discussion!
I think I agree. I think what you are getting at (and is my ~informed intuition as well) is that there are common elements that enable more composition for the weird cases but neither center them nor lowest-common-denominator the result. On the Google side, we do have (relatively new) consensus to focus on building new/simpler/up-and-coming cases fresh and upstream in MLIR and leaving the truly esoteric parts of interop with everything that involves our somewhat expansive existing product lines to such escape hatches and compatibility layers (i.e. “our problem”). I think also, under the auspices of Torch-MLIR, we are well positioned to pursue a similar line of development. I have also heard credible rumors from the ONNX side that similarly could point in this direction. That alignment is ultimately what I am trying to unlock and capitalize on with this topic.
@bondhugula This is a very good breakdown of technical discussion points. If you don’t mind, could we attempt to answer them in a more structured way in upcoming ODMs vs starting too many rabbit trails on this thread (I started writing some answers but I think that your meta-point is that it may be useful to pop back up to the top of the stack and start again)?
Just to provide some answer to #1 and #4, I think there is a lot of potential for a suitably low-level adt dialect. Based on our experience with torch-mlir, that may or may not be the best thing to use directly for the higher level interop, but something that would make sense to decompose into. Tied up in terms of working it out is more design around ref-objects and lifetimes (IREE has a well defined modeling of this internally, but we really need to figure out how to introduce the concept properly in terms of MLIR foundations).
A few specific comments:
There are equivalent (but less tortured) concepts on the Torch-MLIR and JAX sides these days. I think this is ours to defragment at this point and provide some common structure for.
I think we might be ~a couple of years out from having a concrete answer to that question. What we do have today are three effective, compiler friendly candidates for the core tensor-algebra: tosa, mhlo, and onnx-mlir. To differing degrees, these do encapsulate the “80% of the weird” cases, and I think that starting from such a plateau is important/useful. I think where we get off the rails is when we attempt to smash too much of the rest of the domain directly into such op sets: they are useful consolidation points but they are not comprehensive. I may wish for one such thing, but for this period of time, my opinion is that three not incongruous options there is tractable and we let evolution play out a bit more.
+1 - on the more “modern” side of things, there isn’t that much here. I think we can design for it in a reasonable way.
We can spend some more time working on material, or if you all would like to suggest topics to dive in to, I’d be happy to use that to drive.
Right, my personal opinion on this is that we’ll never “standardize” the common set of ML operators - ML moves too fast, there is too much diversity, and there is a lot of historic mess that isn’t worth cleaning up. Instead of getting to the one true answer, I think it would be beneficial to look at this as a factoring exercise - instead of success/fail, we can benefit from taming “more of the problem” than less. We can always represent the weird stuff as ops in framework native dialects for full fidelity.
If you haven’t already, I think it is worthwhile to dig into “what went wrong” with TorchScript. My understanding is that they found themselves on a slippery slope towards implementing much of Python, which undermines the goal of being “not python”.
Cool, I’m equally happy with an informal discussion if you prefer that. I think the ontology question is a good first place to start - if we can decide what the buckets and components are, then we can divide and conquer them.