[RFC] Torch-mlir lowering to named ops

Looking into torch-mlir, the lowering to linalg is as generic, but also as convoluted, as I expected. Lots of corner cases, but also lots of generic builders and code reuse.

Creating an ingress into linalg named ops there may not be simple, especially if we want to keep the choice (generic vs. named ops) where there is an option. But given that we have a generalization pass for the recently added named ops, we could still produce the same result after lowering + generalization.

However, as discussed previously, with the named ops' semantics requiring strictly explicit casts, we would naively generalize to two generics (cast + op) instead of one (with the casting handled by the affine maps / ops in the region). So "the same result" may take some time to achieve, or mean slightly different things.
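To make that concrete, here is roughly what the two shapes look like for an elementwise add with mixed operand types (a minimal sketch on my part; the function names, shapes and exact ops are illustrative, not taken from the actual torch-mlir lowering):

```mlir
#map = affine_map<(d0) -> (d0)>

// (a) strict named-op semantics, after generalization: two generics,
//     one for the explicit cast and one for the add on uniform types
func.func @two_generics(%lhs: tensor<16xf16>, %rhs: tensor<16xf32>) -> tensor<16xf32> {
  %e0 = tensor.empty() : tensor<16xf32>
  %lhs32 = linalg.generic
      {indexing_maps = [#map, #map], iterator_types = ["parallel"]}
      ins(%lhs : tensor<16xf16>) outs(%e0 : tensor<16xf32>) {
    ^bb0(%in: f16, %out: f32):
      %c = arith.extf %in : f16 to f32
      linalg.yield %c : f32
  } -> tensor<16xf32>
  %e1 = tensor.empty() : tensor<16xf32>
  %sum = linalg.generic
      {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]}
      ins(%lhs32, %rhs : tensor<16xf32>, tensor<16xf32>)
      outs(%e1 : tensor<16xf32>) {
    ^bb0(%a: f32, %b: f32, %out: f32):
      %s = arith.addf %a, %b : f32
      linalg.yield %s : f32
  } -> tensor<16xf32>
  return %sum : tensor<16xf32>
}

// (b) the single-generic form, with the cast inside the region
func.func @one_generic(%lhs: tensor<16xf16>, %rhs: tensor<16xf32>) -> tensor<16xf32> {
  %e0 = tensor.empty() : tensor<16xf32>
  %sum = linalg.generic
      {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]}
      ins(%lhs, %rhs : tensor<16xf16>, tensor<16xf32>)
      outs(%e0 : tensor<16xf32>) {
    ^bb0(%a: f16, %b: f32, %out: f32):
      %a32 = arith.extf %a : f16 to f32
      %s = arith.addf %a32, %b : f32
      linalg.yield %s : f32
  } -> tensor<16xf32>
  return %sum : tensor<16xf32>
}
```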

The question here is simple: do we want this in torch-mlir? Can we achieve this from the different ingresses (torch, onnx, stablehlo)? My main goal with this is to have a canonical linalg pattern that optimizing compilers can work with, without having to worry about where the IR came from.


I think I would need to see more details here vs. a blanket ack. In my experience, every new workload I implement uses a large amount of mixed precision and cross-type algebra. I'd go so far as to say that is the rule rather than the exception for modern things. In trying to canonicalize the vocabulary, I'd really like not to regress back to the older world where broadcast and cross-type sequences could only be expressed through complicated assemblies of algebraically inspired operations.
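To illustrate what I mean, a single generic can express a broadcast plus a cross-type accumulation in one op, with nothing but the affine maps and the region body (a hypothetical sketch, not from any particular workload):

```mlir
#id  = affine_map<(d0, d1) -> (d0, d1)>
#row = affine_map<(d0, d1) -> (d1)>

// f16 activations + f32 bias, broadcast along d0, accumulated in f32,
// all inside one region: no separate broadcast or cast ops needed
func.func @bias_add(%x: tensor<8x32xf16>, %bias: tensor<32xf32>) -> tensor<8x32xf32> {
  %init = tensor.empty() : tensor<8x32xf32>
  %0 = linalg.generic
      {indexing_maps = [#id, #row, #id],
       iterator_types = ["parallel", "parallel"]}
      ins(%x, %bias : tensor<8x32xf16>, tensor<32xf32>)
      outs(%init : tensor<8x32xf32>) {
    ^bb0(%xe: f16, %be: f32, %out: f32):
      %xf = arith.extf %xe : f16 to f32
      %s  = arith.addf %xf, %be : f32
      linalg.yield %s : f32
  } -> tensor<8x32xf32>
  return %0 : tensor<8x32xf32>
}
```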

Not to be too extreme, but I live in a world where optimized implementations have almost none of the “standard” ops. It’s all weird.

With that said, there is no reason to be prematurely lowering aten to exotic non-named ops in most situations. Just a lot of hands and a lot of history got it to where it is. Straightening some of these paths to get back closer to a canonical lowering of most of aten to named op sequences would be a nice improvement. I expect there are a handful of places that account for a lot of the divergence now.

I think as long as it doesn't get too pedantic (i.e. "everything must" level), there is a lot of room for improvement here, but the work needs to be done op-by-op (or an op-class at a time). And we need to not try to fit the exotics into a mold without further thought.

Edit: it'd be best not to have two lowerings to linalg. Those things are massively expensive and encode a lot of practical knowledge. You could either jump into the soup or add passes on top, letting the more generic underpinning act as a shock absorber. The conversion is piecewise to facilitate that. We've taken such an approach in the past when uplifting premature lowerings (thinking of the view op, which started life as a universe in itself and is progressing to lower directly to upstream things that do the heavy lifting). Then at some point, you can chop the generic lowerings.

Absolutely agree. I didn’t mean “must”.

The decision to fix the semantics for basic operations was very pragmatic, mainly because the design space of any other method makes things more complicated. But as you said, many operations are not basic.

Currently we’re looking at three potential lowering strategies:

  1. A sequence of named ops with "strict semantics". This will give us a feeling for how much we really need grouped ops (like softmax, layernorm, gelu) or a more generic "group" op that carries the semantics inside.
  2. Generic operations with complex affine maps, iterator types, and a complex region body. For some casting cases this may be the simplest solution.
  3. "Special" ops with complex semantics (e.g. conv, pooling) that are hard to express as either basic named ops or generics (a sketch of (1) and (3) follows this list).
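As a rough sketch of what (1) and (3) could look like ((2) is essentially the single-generic-with-cast-in-the-body form shown earlier), assuming the elementwise named ops and a grouped softmax op are available; the names and exact syntax here are mine, not settled:

```mlir
// (1) strict named-op sequence: the cast is explicit (shown here as an
//     elementwise arith op on the tensor; it could equally be a small
//     generic or a named cast op), then linalg.add on uniform types
func.func @strategy_named_seq(%a: tensor<16xf16>, %b: tensor<16xf32>) -> tensor<16xf32> {
  %a32 = arith.extf %a : tensor<16xf16> to tensor<16xf32>
  %init = tensor.empty() : tensor<16xf32>
  %0 = linalg.add ins(%a32, %b : tensor<16xf32>, tensor<16xf32>)
                  outs(%init : tensor<16xf32>) -> tensor<16xf32>
  return %0 : tensor<16xf32>
}

// (3) a grouped / "special" op that keeps the whole semantics in one place
func.func @strategy_grouped(%x: tensor<2x16xf32>) -> tensor<2x16xf32> {
  %init = tensor.empty() : tensor<2x16xf32>
  %0 = linalg.softmax dimension(1)
         ins(%x : tensor<2x16xf32>) outs(%init : tensor<2x16xf32>) -> tensor<2x16xf32>
  return %0 : tensor<2x16xf32>
}
```

In (1) the cast is a separate, visible step that the generalization pass would turn into its own generic; in (3) the whole computation stays behind one op until something decomposes it.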

The order of the list above matters, with the first being (supposedly) simpler than the second, and so on. It also won't be either/or, but rather a combination of them all. I don't think it makes sense to pick one design for all operations and force it through.

Keeping multiple lowering strategies as a user option would be the worst-case scenario, and I'm not in favour of that at all.

To me, the most powerful thing we can do is to lower to a canonical form that compilers down the road can trust. There can be a composition of multiple types of ops, but each choice should be unique, explicit and documented.

That's what I'm trying to do: incremental improvements over time, consolidating choices, reducing the scope of magical ops, documenting semantics, and eventually coalescing into a de-facto canonical form.

There should be no hard stop, and we should deprecate functionality based on better paths elsewhere. This is why I want torch-mlir to drive this. Several downstream projects can do a lot of different things, but if an upstream project does one thing well, then we can all agree on this thing as the canonical form.

Agreeing on a canonical form only in downstream projects doesn't work. It's like a double pendulum with multiple external forces: it never stays down for long, and it's rarely predictable.

Cool, if that's the approach and thought process, I'm all for it. I'd recommend treating the existing linalg lowerings as the catch-all. I'm all for them being improved in place where possible, but for some of the… let's say… much-loved ones… there is a lot of history encoded in the lowering and it can be very hard to pry apart. Also, as you note, many of the common lowering utilities predated any consensus on the named ops and may cut pretty far against the grain you're aiming for. It is probably much easier to do more named op lowering at a layer above. It'd be great if things could be fixed in place, but I'm acutely aware of the work and archaeology involved in doing that.

My only other advice is to just make a softmax op from the get go and don’t look at view first :slight_smile:

(Less tongue-in-cheek: there is no principled reason to have softmax, but every opset that has excluded it has gone through years of tug-of-war on extension mechanisms and debate that basically boiled down to "we need this for softmax because [many reasons]"… Some things just don't have principled justifications. And view is a perfect storm that finally has a plan upstream, so progress is happening there.)
