This Thursday, 2/10 (9am California time, 17:00 UTC), at the MLIR Open Design Meeting, I will be talking about IREE’s input dialect and what we have converged on for representing programs from current ML frontends. We’ll be covering the current status of frontend support in the MLIR ecosystem (using our work on IREE as a case study), the contents of the dialect to give an overview of the gaps/goals we had, and some ideas on what we can do upstream to make this more common.
Thanks, folks, for talking through a messy topic that conflates some things.
If I could focus on one thread that would open the door and help with interop: I think we are in a good position to define an ml_program dialect (name, obviously open to bike shedding) containing top-level structural components and baseline types/ops for accessing them. In my mind, this starts with (in order of increasing design discussion needed):
ml_program.func (possibly multiple, depending on design discussion wrt CFG vs. Graph, Pure vs. Procedural, etc.)
Type hierarchy for reference/object/list types, etc
Buffers and interop types
I know from experience that there are practical infra gaps that need to be addressed even just in dealing with the first on that list, and I would propose working incrementally. Having such a space exist upstream would be a good force for getting these corners rounded. In many cases, we have landed some of the infra upstream without solid usage, and it would be good to converge that.
As an end goal, I believe that we should be targeting the ability for a future version of ourselves to provide versioned serialization on top of a small basket of dialects with this one at the root. Having a solid case upstream to work those concepts out, in an area where it is practically important for interop and scoped to be relatively narrow, seems like a positive development.
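To make the "versioned serialization on top of a small basket of dialects" idea a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the JSON header layout, the version numbers, the choice of tosa as a second dialect in the basket, and the rule that a consumer accepts the same major version with an equal or older minor version are all assumptions for illustration, not an existing MLIR or IREE mechanism.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """Hypothetical major.minor version attached to a dialect."""
    major: int
    minor: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        major, minor = text.split(".")
        return cls(int(major), int(minor))

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"


def write_header(root: str, dialects: dict[str, Version]) -> str:
    """Serialize a made-up header naming the root dialect and each dialect version used."""
    if root not in dialects:
        raise ValueError(f"root dialect {root!r} must be listed with a version")
    return json.dumps(
        {"root": root, "dialects": {name: str(v) for name, v in sorted(dialects.items())}}
    )


def check_header(header: str, supported: dict[str, Version]) -> list[str]:
    """Return a list of problems; an empty list means the consumer can load the program."""
    data = json.loads(header)
    problems = []
    for name, text in data["dialects"].items():
        produced = Version.parse(text)
        have = supported.get(name)
        if have is None:
            problems.append(f"{name}: dialect not supported")
        # Assumed rule: same major version, and the producer's minor is not newer.
        elif produced.major != have.major or produced.minor > have.minor:
            problems.append(f"{name}: produced {produced}, consumer supports {have}")
    return problems


# A producer (frontend) records the basket of dialects it emitted.
header = write_header(
    "ml_program",
    {"ml_program": Version(1, 0), "tosa": Version(1, 2)},
)
# A consumer (backend) declares what it can read.
consumer = {"ml_program": Version(1, 1), "tosa": Version(1, 1)}
print(check_header(header, consumer))
```

With these made-up versions, the check flags only tosa, because the producer used a newer minor version than the consumer knows about. The point is only that a small, closed basket of dialects makes this kind of compatibility check tractable; the real design would need to be worked out in the RFC process.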
On the IREE side, we are not in a hurry to get to a finish line on this, but we would like something to start converging upstream that we could interop with and use as an eventually solid consolidation point. Having this would let us reduce or eliminate some inverted system dependencies between frontends and backends that increase costs all around.
+1 on this, in particular on the high-level structural objects.
I’m not sure, though, whether the refcounting and associated logic belongs in an ml_program dialect or whether it can be kept orthogonal. There might be a case for defining these as an “external interface” (your function signature needs to match “something”), but in this case I’d also be cautious about seeing these creep out into the stack.
I think it should be orthogonal (or at least assumed so until we get there and design it for real). What I presented, as said, is an amalgam and we should reset and walk it forward incrementally, in my opinion. I think the value of our previous work in this area is mostly in knowing the rough shape and gaps with respect to a working solution that exists today. I’m mainly interested in getting the basic door open so that we have some backpressure to work this stuff out upstream vs fragmented across downstreams.
+1. The pain here is real. I find that often I’m spending more time figuring out how to have Torch-MLIR interop with other community components from a dependency/versioning/etc. perspective than actually building the thing of interest.
+1 on being involved in this direction. From the TOSA vantage point we are impacted in both directions and desire a clear picture - we’d like a means to express ingress considerations like serialization and versioning effectively. Below TOSA it’s very helpful to have a clearer picture of how dialect components enable codegen and associated capabilities, e.g. dynamic shapes and stateful ops, both of which are in fact topics with TBD tasks on our part.
There are a few questions and thoughts around this, e.g.
I’m not quite sure whether global + buffer view is the nicest way to cover the several things involved in expressing the semantics of a memory model for an ml_program.
Perhaps there should be a high-level dialect abstraction of memory/data movement to coordinate with the corresponding one expressing dispatch/compute/concurrency. I don’t have a well-formed picture here, just a feeling that the concurrency ops slide seemed to be one half of a picture.
Some things I don’t understand the mechanics and interop of yet, like list ops.
In the next couple of days, I’ll prepare a patch with a seed for a new dialect and send an RFC. My goal here is to get the ball rolling and then elaborate it incrementally, so there will be plenty of chance to work out the details (which matter a lot). Nothing controversial – just structural – will be in the initial RFC.
When we get to this part, it is worth doing a survey of the memory models that exist. What I presented with IREE today is just part of ours (in fact, we just surfaced the smallest part of it that we could get away with in the input dialect – it is well defined and more detailed within the implementation). There are a lot of priors here (some good, some that even their creators would disavow with the benefit of hindsight).
I’m a bit ambivalent at this point as to how much we end up with topic-specific dialects designed to work together vs a dialect that represents a larger slice. My opinion is that for things for which there is one way (or a small closed set of ways) to represent a concept, a single dialect makes sense. Things that have a lot of variability are better represented by multiple dialects designed to work together (which gives optionality).
My purpose in walking through it all today is that I think that our work here provides a good survey of topics that need to be designed for, not necessarily the one design to rule them all. It is helpful to know how many free variables we will need to solve for before setting out.
We’ve got enough view of all of the frontends and backends now that I think it is the time to start giving things a name and working through it. It’ll take time, because we’ll take it one piece at a time, but I think it will be worth it.
+1 for the RFC. Regarding lists, shared variables, and tensors, it makes sense.
Considering the current ecosystem, the most challenging part is the choice of neural network description language. It was great to see the hlo and tosa dialects considered. How about ONNX?
The current lowering itself is really not perfect, and arguably not even good enough. Take torch-mlir as an example: it has several conversions, to tosa, linalg, std, etc. Considering the challenge of inter-dialect optimization, it is kind of not a complete solution from my perspective.
Really looking forward to the RFC being proposed, I’d like to join the journey and contribute to it continuously.