Torch MLIR PyTorch2 Uplift

At the MLIR dev conference, I was chatting with our colleagues from ByteDance, and we thought it was time to get a workgroup together to land the PyTorch2 uplift that we discussed a couple of weeks ago in the context of Turbine.

I think we basically have all of the pieces needed now to complete Sean's long-term roadmap that he posted a year ago, and it comes down to doing some project cleanup and landing the work.

I suggested that once the conference wraps, I could write a plan to reorganize the project, and then maybe we could get a weekly workgroup together to land the patches. I don't think it would take very long if we work on it together and intentionally.

Having a pretty good view of the code, I think that this plan will aim to reorganize the project around an MLIR core with add-on components for:

  • Pure Python libraries that can be included in your project to import and exercise lowerings.
  • Utilities for interfacing to PyTorch to do code generation of op libraries and such.
  • Native integrations with PyTorch that people are using (LTC, etc).
  • New test suite that can validate a backend using more upstream-friendly techniques.
  • Original TorchScript tooling, APIs and test suite.

Basically, for those of us on the PT2 pure Python path, I’d like to organize the project so that is all we need while still meeting the needs of people who have integrated in a different way. For this subset, we just use the torch and related dialects directly as inputs to our compilers versus cutting between the projects at the current intermediate dialects.
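To make the "direct input" idea concrete, here is roughly the shape of what a Dynamo import would hand a downstream compiler (illustrative only; the op and type spellings follow the current torch dialect assembly format):

```mlir
module {
  func.func @forward(%arg0: !torch.vtensor<[2,3],f32>) -> !torch.vtensor<[2,3],f32> {
    %0 = torch.aten.relu %arg0 : !torch.vtensor<[2,3],f32> -> !torch.vtensor<[2,3],f32>
    return %0 : !torch.vtensor<[2,3],f32>
  }
}
```

A compiler consuming this only needs the torch dialect definitions; it never has to agree with other projects on the intermediate lowering dialects.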

Thoughts? I think this can be done in place if we plan it right but it will require some reorganization. This would also be a good time to drop components that don’t have current users.

Are there people who would like to participate in this uplift? I’m willing to do some of the earth moving but would like to have some collaborators on getting it done.

– Stella

I’m really excited to see this happening! Thanks Stella!


Thanks Stella. We look forward to further collaboration. I would also like to add some longer-term work items:

  • How can we represent dtensor/sharding in torch-mlir, combined with the recently proposed mesh dialect?
  • How do we get it translated from the PyTorch framework side?

This work also depends on the graph-capture work of the dtensor/torchdynamo team in PyTorch.


This is an exciting direction, @stellaraccident, and we (at Cruise) have been interested in a more formal Torch-MLIR <> PT2 integration for some time now, so this couldn't be more timely!

We’re happy to help; perhaps by testing some of these changes on the inference side (torch.export path) on our workloads and reporting back any issues we encounter?

To crystallize the discussion you, @raghavanr and I had at the MLIR workshop today, our asks are mainly for:

  1. Keeping TS paths alive for some of the legacy integrations
  2. Providing a “minimal Turbine” interface (i.e. PyTorch → Torch dialect export) in Torch-MLIR that doesn’t depend on IREE compiler/runtime

Based on your breakdown above, these appear generally aligned with your vision for what this reorganization would look like. Please LMK if that is not the case.

Thanks again.

Sambhav

Ok, I may have slightly oversubscribed myself for this week but will put writing a plan at the top of my list for next week.

A couple of comments inline:

Yes, in fact, both of these are primary goals for wanting to do the reorganization. It will be a while for us too before we move everything off of the TorchScript path because we tend to prioritize workloads on an as-needed basis, and if it ain’t broke, don’t fix it. At some point, this will become troublesome to maintain, but we can discuss that later.

Once we refactor things a little bit, it will be straightforward to upstream the key parts of Turbine. Most of it is only incidentally tied to IREE by way of how it is packaged.

I would consider these topics a primary reason to simplify the way this is put together. It will enable us to express these new concepts. I was also talking with another engineer yesterday (whose name I can’t remember) who would like to revisit quantization representations but was worried about “the dark parts” of torch-mlir. The easiest way to unblock things like this is to just remove the dark parts – most of which are tied to the old TorchScript path.


Looking forward to this! Happy to be part of the workgroup to help land this.

Thanks Stella. Firstly, I agree with @sjain-stanford’s point about keeping TS paths alive for some of the legacy integrations; it’s critical for our integration.
Secondly, I want to ask: will Torch-MLIR support communication (collective) operators (like the c10d ops that can be captured by FX)? If it is designed to do this, we should discuss how to improve the passes/canonicalizers related to device handling, since many of them currently drop device information.

We probably have to. I imagine it will be easier without the TS passes, but it will still need a scrub.

This week has been crazy for me and I need to take some time off. Will work on a plan this coming week. Sorry!

Hey folks, here are my notes on how I would like to re-organize the project in the short term: Proposed torch-mlir reorganization · GitHub

The primary goal is to have a pure PT2 subset at the project root that contains only the pure Python Dynamo import code, without the legacy of the whole project.

The alternative I considered was to attempt to conditionally compile enough things to get back to a pure Python stub library that would be suitable for direct inclusion in downstreams. However, the current Python namespace is quite polluted and there is a lot of PyTorch C++ dependent code strewn throughout.

I opted to try to keep the user API surface of the pt1 codebase intact in such a way that it could include the pt2 subset and still be reasonable.


Thanks for putting this together @stellaraccident! We are very excited to see this. I was wondering, do you have a rough timeline of when each component is going to land?

If we get consensus to go down this path, it is a matter of me finding ~a day to do the main earth moving (I’d be happy to accept help on that but I think I can also do the main project organization stuff efficiently if I have the time). Definitely not this week.

After the project structure updates, I think we can shard out some other items. Testing infra, dynamo workload burndown, CI, APIs, etc…

Thanks @stellaraccident. The proposal above LGTM. Based on the code reorganization proposed, I don’t anticipate much refactoring in the bazel build (because we only cover LIT tests at the moment, not the TS e2e tests) but happy to help with it should the need arise. Can’t wait to give the PT2.0 → Torch importer a try! Please keep us posted on your “earth moving” work 🙂


Ok, I’ve heard no objections. I’ve got time in the second half of this week to start patches. We’re tracking the work on the Turbine side here: [tracking] Upstreaming · Issue #123 · nod-ai/SHARK-Turbine · GitHub


I’m also ready to contribute to this.

First PR will land shortly: Re-organize project structure to separate PyTorch dependencies from core project. by stellaraccident · Pull Request #2542 · llvm/torch-mlir · GitHub

There was some discussion about naming and next steps on Discord.

Specifically, we discussed looking at the LTC code at the next community meeting and coming up with a plan for it.


I started this issue with follow-on work: Extract durable components from projects/pt1 · Issue #2546 · llvm/torch-mlir · GitHub

If you feel strongly about taking responsibility for some of it, please convert the task to an issue (“target” button to the right) and assign yourself (or ask a committer to do so).

We can discuss more at the upcoming community meeting. Can someone remind me when that is? (I think I heard Monday.)

Happy to help.

I refactored the FxImporter in Turbine to be standalone: https://github.com/nod-ai/SHARK-Turbine/blob/main/python/shark_turbine/importers/fx_importer.py

My thought is that if we can commit to this kind of programming style, we can host this directory in torch-mlir as the source of truth for everyone to use, and then downstreams can copy it as needed if they don’t want to take an actual dep on torch-mlir Python (i.e. IREE and Turbine will just copy).

I’ve also started a pure Python ONNX importer in the same directory and can upstream that too if there is interest.

How would folks like to proceed?

Thanks @stellaraccident . Hosting the standalone Turbine’s FX importer in torch-mlir SGTM!

We (at Cruise) are taking a dependency on Torch-MLIR anyway; however, for the projects that don’t take a dep (IREE/Turbine), I’m curious how we’d deal with divergence when features or bug fixes to the importer are made in one of the downstream copies.