RFC: ONNX Import into torch-mlir

Hey folks, as mentioned on another thread, I’m working on a POC to import ONNX directly into torch-mlir. Here is the initial implementation: DRAFT: Initial ONNX importer. by stellaraccident · Pull Request #180 · nod-ai/SHARK-Turbine · GitHub. We can upstream it in the same way that we are talking about doing for the FxImporter. It needs more work (I’ve just been hacking on it for a few hours), but it can already mostly import resnet50 from ONNX’s test suite.
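For the curious, here is roughly what driving it looks like end to end. This is a minimal sketch, assuming hypothetical names (`OnnxImporter`, `import_model`) since the draft is still in flux; the `onnx` calls are the real library APIs.

```python
import onnx
from mlir import ir  # MLIR Python bindings; the exact package path varies by build

from onnx_importer import OnnxImporter  # hypothetical module/class from the draft PR

# Load and shape-infer the model (standard onnx APIs).
model = onnx.load("resnet50.onnx")
model = onnx.shape_inference.infer_shapes(model)

with ir.Context() as ctx:
    # A real setup registers the torch dialect; for a sketch, just let
    # unregistered ops through.
    ctx.allow_unregistered_dialects = True
    module = ir.Module.create(ir.Location.unknown())
    # Hypothetical entry point: walks the ONNX graph and emits one
    # `torch.operator "onnx.<OpType>"` per node into the module body.
    OnnxImporter(model, module).import_model()
    print(module)
```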

To go with this, I would like to begin working immediately on a conversion pipeline in torch-mlir that legalizes recognized ONNX custom ops (I import all normal ONNX ops as Torch custom ops) to equivalent torch ops. Since they are at the same level of abstraction, I expect that some fairly trivial conversions will cover the vast majority of the ONNX op set. And since ONNX ops just import as Torch custom ops, we can let any unimplemented ones flow through to be handled by downstreams if they want.
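To make the “same level of abstraction” point concrete, here is a sketch of what the trivial cases look like. The IR forms are representative, and the table below is purely illustrative; it is not the actual conversion infrastructure.

```python
# An ONNX node imports as an opaque torch custom op, e.g.:
#
#   %1 = torch.operator "onnx.Relu"(%0)
#          : (!torch.vtensor<[2,3],f32>) -> !torch.vtensor<[2,3],f32>
#
# and legalization is a near-1:1 rewrite into the torch dialect:
#
#   %1 = torch.aten.relu %0
#          : !torch.vtensor<[2,3],f32> -> !torch.vtensor<[2,3],f32>
#
# Many ONNX ops reduce to renames of this shape. A purely illustrative
# table for such trivial cases:
TRIVIAL_ONNX_TO_TORCH = {
    "onnx.Relu": "torch.aten.relu",
    "onnx.Exp": "torch.aten.exp",
    "onnx.Neg": "torch.aten.neg",
}

def legalize(onnx_op_name: str):
    """Return the torch op to rewrite to, or None to leave the
    torch.operator in place for a downstream to handle."""
    return TRIVIAL_ONNX_TO_TORCH.get(onnx_op_name)
```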

I’d like to just write these conversions directly in torch-mlir so that everyone can use them, but if people don’t think this is a good idea, I can just keep them downstream in IREE/Turbine.

Let me know if there is interest or objection.

Stella

(edit: I should also mention that for Turbine/IREE, we plan to write an equivalent of this pure Python importer in C++ against the MLIR C API for use in onnxruntime, but I expect the rest of the flow would be the same from there; that part is just scaffolding code. I don’t have any thoughts about how/if to upstream that, but am happy to share if there is demand.)


First patch is up: Initial TorchOnnxToTorch conversion pipeline. by stellaraccident · Pull Request #2585 · llvm/torch-mlir · GitHub

I’m going to have a few people on my team burn this down over the next few weeks.

Here’s the rough notes we are working from:

I added a pretty substantial README to the PR and also dumped a thousand test cases to start with (referenced in there).

In general, if you have multiple people, I would shard the work:

  1. One or two folks implementing ONNX lowerings (they are in different files sorted by op name, so folks can start at the beginning/middle/end and not step on each other too much).
  2. Prefetch the ops that are missing in torch-mlir and need implementations, and get someone working on those. I was seeing a lot of the trig functions being very under-implemented.
  3. Have someone working on e2e fuzzing of the TorchToLinalg pipeline (see the sketch after this list). The full set of imported test cases, once implemented, represents a lot more fuzzing than this pipeline has gotten. I just tried to run some of them through the torch-to-linalg conversion (and IREE compilation) and was getting crashes on some basic stuff near the top of the pipeline. Use this as a chance to fuzz all of that.
  4. Once we get enough op support to run some of the full models, we will proceed to e2e testing.

I’d say there is enough parallelism here for ~4 people to work full time for a few weeks without stepping on each other. We could add more, but that would be a lot of cooks in the kitchen.
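For item 3, here is a rough sketch of the kind of harness I have in mind. `torch-mlir-opt` and the `--convert-torch-to-linalg` flag exist in torch-mlir today, but the directory layout and the choice of pipeline to exercise are assumptions, not prescriptions.

```python
import glob
import subprocess

# Push every imported test case through the torch-to-linalg conversion
# and collect the ones that crash or fail to legalize.
failures = []
for path in sorted(glob.glob("onnx_test_cases/**/*.mlir", recursive=True)):
    proc = subprocess.run(
        ["torch-mlir-opt", "--convert-torch-to-linalg", path],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # Keep the first stderr line as a failure signature for triage.
        sig = proc.stderr.splitlines()[0] if proc.stderr else "<no output>"
        failures.append((path, sig))

for path, sig in failures:
    print(f"FAIL {path}: {sig}")
print(f"{len(failures)} failing cases")
```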

I’m not sure I follow why we’d add ONNX support to torch-mlir. What about GitHub - onnx/onnx-mlir: Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure?

IREE could use onnx-mlir as an input just like torch-mlir.

Honestly, we just added support for torch-mlir in our project (via Dynamo), so I’m not complaining, just trying to understand the rationale.

(some of this can be debated for sure but is based on practical experience with the projects)

A few reasons why we want to go this way as an option:

  • In its new organization, torch-mlir is primarily back to a “just MLIR dialects and conversions” place, with some pure Python code to bridge to infeed frameworks. In my experience, this limit on scope and sprawl makes it a well-behaved dep to take/keep a dependency on. By contrast, onnx-mlir remains, primarily, a compiler in its own right and has a correspondingly larger number of moving parts.
  • If the goal is to exit the ONNX ecosystem into a compiler that is already based around PyTorch, then I don’t need a heavyweight ONNX layer with a full isomorphic-as-dialects-and-interfaces project structure that is geared more toward building a whole compiler.
  • We don’t want to implement more lowerings from ONNX to a lower level when a lateral translation at the same abstraction level gets the job done and does not lose information or add extra hops from the source. Lowerings are quite a bit more expensive to build and maintain than translations. From this perspective, we would probably just seek to translate ONNX to a torch-level dialect anyway, and since compilers handling PyTorch already have to handle custom ops, letting ONNX piggyback on the same mechanism has some nice properties when it comes to thinking about how to support and evolve things as they move forward.
  • Given the above and that almost all of the infra for maintaining an isomorphic-with-onnx dialect and converters could be replaced (for our limited purpose of having an exit) with a <600 line Python file, this seemed like the obvious choice.
  • I was debating whether to just carry all of this in IREE, but a lot of us have a very similar use case, so I thought it would be better to build out the conversion pipeline in a way that everyone could use. If that doesn’t emerge or this turns out to be a bad idea, we can take it the other way down the road, but knowing how much inertia there is in upstreaming something like this, I figured I’d bias toward everyone being able to use it from the get-go.

I would still be open to a common super-repo which combined torch- and onnx-level dialects, outbound conversions, and Python-level interop (and only that), but that topic has been open for years with no progress, so it is time to expand the island we are on, imo. We’re 100% bought into PyTorch interop, and it is more practical for us to fold ONNX support into that than to go the other way.

Thanks @stellaraccident, this all matches with our internal assessment, too.

Strong commendations for making this decision! Thanks!

We could even give it a go on some small models from ONNX, through torch-mlir into tpp-mlir to see what happens! Let me know when you have something more concrete.

Our internal discussion has led to the same conclusions. Strictly from the view of a single importer into MLIR, which seems to be what you’re proposing (and we fully agree this is a good idea): if torch can handle most of onnx, then the onnx dialect is redundant, except in the parts where it isn’t.

I don’t look at torch, onnx, and stablehlo as “dialects from those front-ends”, but as “dialects I can do something with”. And if most of what I can do with one is the same as with another, then there’s some redundancy to remove.

Of course, those dialects have their own roles in their own front-ends, but as far as upstream is concerned, having a cross of torch/onnx/stablehlo/tcp in an incubator project would be a great way to get it into MLIR proper as the de-facto standard for importing into the MLIR ecosystem.