Overview
This RFC proposes a new MPS target in MLIR.
This dialect lets the MLIR ecosystem target Apple devices. We implemented a new serialization target that enables compute acceleration on Apple platforms and delivers the best performance for general compute workloads across macOS, iOS, tvOS, and visionOS, using the MetalPerformanceShadersGraph framework.
The new MPS target:
- Introduces a new MPS dialect in-tree;
- Represents high-level abstractions for general compute operations;
- Provides a fully versioned, backward- and forward-compatible dialect with a minimumDeploymentTarget for each of the major Apple operating systems;
- Bases serialization entirely on MLIR bytecode.
Context
The MPS dialect represents a general-purpose compute IR, used within Apple in the MetalPerformanceShadersGraph (or MPSGraph) framework. The framework lets users declaratively assemble a general-purpose computational graph, compile and optimize it for a given Apple device, and execute it through native Objective-C and Swift APIs, delivering the best performance on each Apple platform.
Apple CoreML, PyTorch, TensorFlow, JAX, and ONNX are frontends that target MPSGraph on Apple devices today in various capacities, either through the Objective-C/Swift APIs or through MLIR dialect-to-dialect conversions.
Motivations
Adding the MPS dialect to MLIR allows projects built on top of MLIR to use the MPS dialect as an exchange format for targeting Apple platforms. The workflow would be to serialize a versioned MPS module, providing a minimum deployment target for an OS version, using the standard MLIR bytecode infrastructure. Apple natively supports MPS MLIR bytecode as input to its mpsgraphtool, which is publicly available as part of macOS 14.0+. The tool converts the MPS MLIR bytecode into an mpsgraph package that is deployable on devices and can then be integrated into an Xcode project.
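As a rough illustration of what a minimum deployment target implies for a versioned module, the sketch below checks whether every operation in a module is available on the OS versions the module claims to support. It is a conceptual model only, not the MPS dialect's implementation: the `OpInfo`, `Module`, and `unsupported_ops` names, the operation names, and the version numbers are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical model: a version is a (major, minor) tuple, compared lexicographically.
Version = tuple


@dataclass(frozen=True)
class OpInfo:
    """Hypothetical record of the first OS version that supports an operation."""
    name: str
    introduced: dict  # platform name -> Version


@dataclass(frozen=True)
class Module:
    """Hypothetical versioned module: a deployment target plus the ops it uses."""
    min_deployment_target: dict  # platform name -> Version
    ops: list  # operation names used by the module


def unsupported_ops(module, registry):
    """Return (op, platform) pairs that the module's deployment target cannot run."""
    problems = []
    for op_name in module.ops:
        info = registry[op_name]
        for platform, target in module.min_deployment_target.items():
            introduced = info.introduced.get(platform)
            # An op is unusable if the platform never gained it, or gained it
            # only after the OS version the module promises to run on.
            if introduced is None or introduced > target:
                problems.append((op_name, platform))
    return problems


# Illustrative registry; op names and versions are made up for this sketch.
REGISTRY = {
    "mps.example_old": OpInfo("mps.example_old", {"macOS": (12, 0), "iOS": (15, 0)}),
    "mps.example_new": OpInfo("mps.example_new", {"macOS": (14, 0), "iOS": (17, 0)}),
}

module = Module(
    min_deployment_target={"macOS": (13, 0), "iOS": (17, 0)},
    ops=["mps.example_old", "mps.example_new"],
)

print(unsupported_ops(module, REGISTRY))
```

In this made-up example, the module promises to run on macOS 13.0, but one of its operations only appears in macOS 14.0, so a producer would either raise the deployment target or avoid that operation before serializing.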
Having the MPS dialect in-tree would also give the community access to the large set of bytecode tests we currently rely on internally to verify forward and backward compatibility of the MLIR bytecode format. We have caught issues during our regular LLVM upgrades, and having this test set in-tree would allow breaking changes to be caught and fixed easily. The MPS dialect would also be the first versioned dialect in-tree, and would serve as a reference for anyone trying to implement a stable, versioned scheme using the primitives offered by MLIR.
One can even imagine a future direction in which compute graphs in the MPS dialect are lowered to non-Apple hardware by converting them to other dialects, for example to deploy on Nvidia, CPU, or RISC-V targets, with the open source community developing the conversions that provide cross-platform support.
MPS Dialect description
The MPS dialect currently contains 222 operations and is designed as a stable set of operations able to support the major higher-level compute frameworks. As such, the IR generally expresses the most common operations from the major compute and ML frameworks. The general design combines three principles:
- Fidelity to the MPSGraph Objective-C/Swift APIs;
- Agnostic to the Apple backend and OS;
- Limited operation footprint. The number of operations is expected to grow only if new functionality cannot be mapped to existing MPS dialect operations.
Directory Setup
MPS header files will be in include/mlir/Target/MPS and include/mlir/Dialect/MPS. Source files will be in lib/Target/MPS and lib/Dialect/MPS.
MLIR Dependencies
The MPS dialect does not depend directly on any upstream dialect. In particular, serialization/deserialization functionality will be exposed so that users can leverage the IR to generate versioned MLIR bytecode files that can be read in a backward-compatible fashion. No current MLIR dialect depends on MPS. The MPS dialect is intended to remain general enough to support any major compute framework and, at the same time, independent of any particular Apple target.
Who are the future contributors/maintainers beyond those who propose the dialect?
The Compute Frameworks team at Apple would be the main contributors and maintainers of this dialect, which has been used in the shipping MetalPerformanceShadersGraph framework since 2020.