Here is the list of talks; the detailed program follows below:
- Pallas: A JAX kernel language for GPU and TPU (Google Deepmind)
- Enzyme-MLIR: Early Experiments on multi-level automatic differentiation (MIT)
- Physical Device Modeling in MLIR (AMD)
- CIRCT in 2023 (SiFive, Microsoft)
- Dataflow Architecture Compiler Design Using MLIR (short talk) (SambaNova)
- xdsl-run: an interpreter for MLIR (short talk) (University of Edinburgh)
- A stable serialization format using MLIR (Apple)
- Exorcizing the (C)Python interpreter to implement a universal MLIR frontend (University of Chicago)
- MLIR Interpreter for a stack-based programming language (Modular)
- Progress Report on the MLIR Sparsifier (Google)
- Modernizing types and attributes (short talk) (University of Edinburgh)
- Data Tiling and targeting fixed instruction sequences in IREE (Nod.ai)
- XeTile: A low-level dialect for generating high-efficiency GEMM (Intel)
- Poison Semantics
- Using MLIR from Python
Sharad Vikram (Google Deepmind)
We introduce Pallas, an extension to JAX that allows embedding custom kernels inside larger JAX programs. Pallas is cross-platform, allowing you to write kernels for both GPU and TPU. It is also compatible with JAX transformations, enabling you to do things like jax.vmap a kernel. Importantly, Pallas enables researchers to push the performance frontier of their hardware, while still being an accessible front end for numerical computing.
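To make the "jax.vmap a kernel" idea concrete without depending on JAX itself, here is a minimal plain-Python sketch of what a vmap-style batching transform does to an element-wise kernel; the function names are illustrative, not Pallas APIs.

```python
# Illustrative sketch (plain Python, no JAX): a "kernel" that works on
# single element pairs, plus a vmap-like wrapper that maps it over a
# batch, mirroring how jax.vmap lifts a kernel to batched inputs.

def scale_add_kernel(x, y):
    # Element-wise kernel body: compute 2*x + y for one element pair.
    return 2 * x + y

def vmap(kernel):
    # Batching transform: apply the kernel across the leading axis.
    def batched(xs, ys):
        return [kernel(x, y) for x, y in zip(xs, ys)]
    return batched

batched_kernel = vmap(scale_add_kernel)
result = batched_kernel([1, 2, 3], [10, 20, 30])  # [12, 24, 36]
```

The real jax.vmap vectorizes the traced computation rather than looping in Python, but the contract is the same: write the kernel once for a single element, get the batched version for free.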
William Moses (MIT) @wsmoses
This presentation will discuss the continuing effort to scale up the Enzyme automatic differentiation (AD) tool from operating on the LLVM internal representation to the broader MLIR representation. MLIR offers unprecedented extensibility by supporting user-defined instructions and types in the compiler, which is a challenge for a compiler-based AD tool. It requires one to conceptualize a differentiable compiler instruction and capture all information required for AD in abstract terms, but as a reward allows one to choose the most suitable level for differentiation on the set ranging from machine learning-style tensor operations to loops, to assembly-like instructions.
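The idea of "a differentiable compiler instruction" can be illustrated with the simplest form of automatic differentiation: forward mode via dual numbers, where each arithmetic operation carries its own derivative rule. This is a plain-Python sketch of the concept, not Enzyme's implementation.

```python
# Illustrative sketch: forward-mode AD with dual numbers. Each op
# (here __add__ and __mul__) knows how to propagate derivatives; a
# compiler-based AD tool generalizes this rule-per-op idea to IR
# instructions. All names here are for illustration only.

class Dual:
    def __init__(self, val, dot):
        self.val = val  # primal value
        self.dot = dot  # derivative w.r.t. the chosen input

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # Product rule: d(u*v) = u'*v + u*v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def f(x):
    # f(x) = x*x + x  =>  f'(x) = 2x + 1
    return x * x + x

x = Dual(3.0, 1.0)  # seed the input derivative with 1
y = f(x)
# y.val == 12.0 (f(3)), y.dot == 7.0 (f'(3))
```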
Zhixun Tan (Google) Cheng Zhang (Google), Jennifer Pullman (Google), Matias Scharager (Google, Carnegie Mellon University), Roy Tu (Google), Sajjad “JJ” Arshad (Google), Shuofei Zhu (Google, Pennsylvania State University)
Stephen Neuendorffer (AMD) @stephenneuendorffer
ML accelerators often have complex internal architectures that do not map neatly onto traditional compiler concepts. The ability to represent concurrency, memory hierarchy, and explicit data movement is often critical to achieving performance. With the right dialect, MLIR enables these concepts to be explicitly represented and provides a foundation for end-to-end flows. This talk will illustrate this using the AMD/Xilinx AIEngine architecture and provide insight on how MLIR concepts can be used to model other devices.
Mike Urbach (SiFive), John Demme (Microsoft) @mikeurbach @jdd
We will provide brief updates on many CIRCT subprojects. We will then delve into interesting technical details, problems we have faced (some of which MLIR could have helped solve) and our solutions to them (some of which are generic enough to be upstream-able). Major issues include symbol tables, replacing Function[Type,Like], and Python bindings.
Joseph Primmer (SambaNova) @Joseph
Sasha Lopoukhine (University of Edinburgh) @superlopuh
An early look at an interpreter that allows users to run MLIR IR without compiling, with hooks to register individual Python functions for each operation. It can be used to implement a modular interpreter, with operation implementations provided separately for each dialect and combined into a single interpreter instance at runtime. It has been useful as a way to test lowering correctness in our experiments. This talk/demo will show some example uses to verify that behavior is consistent across rewrites on Toy.
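The per-operation registration idea can be sketched in a few lines of plain Python; the class and method names below are hypothetical, not xdsl-run's actual API.

```python
# Illustrative sketch: an interpreter where each operation gets its own
# registered Python implementation, and dialects contribute their ops
# to one shared instance at runtime.

class Interpreter:
    def __init__(self):
        self.impls = {}

    def register(self, op_name, fn):
        # One Python function per operation name.
        self.impls[op_name] = fn

    def run(self, op_name, *operands):
        return self.impls[op_name](*operands)

interp = Interpreter()
# "arith"-style dialect implementations, registered independently:
interp.register("arith.addi", lambda a, b: a + b)
interp.register("arith.muli", lambda a, b: a * b)

assert interp.run("arith.addi", 2, 3) == 5
assert interp.run("arith.muli", 4, 5) == 20
```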
Matteo Franciolini, Dhruv Saksena (Apple) @mfrancio @saksenadhruv
A year ago, a bytecode serialization format was introduced for MLIR. As opposed to the existing textual format, bytecode offers significant performance improvements in terms of IO, memory requirements, opportunities to do memory mapping, among others. However, the initial functional implementation came without stability guarantees, enabling the use of the bytecode serialization as a tool for temporary IR storage, but severely limiting its adoption for a stable serialization. The talk will go over the progress made over the last year towards reaching stability of the bytecode format and the additional features that were implemented, such as IR versioning, lazy loading, use-list order preservation, and efficient encoding of op properties. The talk will also discuss how a client dialect can leverage the MLIR bytecode features to build a backward and forward compatible serialization format.
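To illustrate why a version field in the header is what makes backward and forward compatibility possible, here is a toy versioned binary encoding built on the stdlib struct module. This is emphatically not MLIR's bytecode format; the magic bytes and layout are invented.

```python
# Illustrative sketch: a tiny versioned binary format. A reader can
# accept anything at or below its own version (backward compatibility)
# and reject or degrade gracefully on newer versions.

import struct

VERSION = 2

def serialize(values):
    # Header: 4-byte magic + version, then a length-prefixed int32 list.
    out = b"MINI" + struct.pack("<I", VERSION)
    out += struct.pack("<I", len(values))
    for v in values:
        out += struct.pack("<i", v)
    return out

def deserialize(data):
    magic = data[:4]
    version = struct.unpack_from("<I", data, 4)[0]
    assert magic == b"MINI"
    if version > VERSION:
        raise ValueError("produced by a newer writer; cannot read")
    n = struct.unpack_from("<I", data, 8)[0]
    return list(struct.unpack_from(f"<{n}i", data, 12))

assert deserialize(serialize([1, -2, 3])) == [1, -2, 3]
```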
Maksim Levental (University of Chicago) @makslevental
An MLIR dialect without a language frontend is like a belt without any pants to hold up: very useful in principle but ultimately unproductive. If you find yourself in this unenviable position, don’t be caught pantsless: let me show you how a few (cute) tricks can transform the (C)Python interpreter into a convenient frontend for your dialect.
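One flavor of the general trick, sketched with the stdlib ast module: walk Python syntax and emit IR-like operation names. The dialect and op names here are invented for illustration and are not the talk's actual approach.

```python
# Illustrative sketch: reuse CPython's own parser as a "frontend" by
# walking the AST of Python source and mapping syntax nodes to
# hypothetical dialect operations.

import ast

def lower(src):
    ops = []
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            ops.append("mydialect.add")
        elif isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mult):
            ops.append("mydialect.mul")
    return ops

# ast.walk is breadth-first, so the outer add appears before the
# nested mul in "b * c + d":
print(lower("a = b * c + d"))  # ['mydialect.add', 'mydialect.mul']
```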
Jeff Niu (Modular) @Mogball
This talk will give an overview of the implementation of an MLIR interpreter. Interpreters can play many useful roles in a compiler stack, from constant folding function calls and memory accesses to validating IR. We will discuss how the implementation is generic over control-flow semantics and how it implements a memory model. We will give an example of how the memory model can be used to build a dense conditional constant propagation analysis. To conclude, we will discuss the challenges of building a generic MLIR interpreter and what it would take to overcome them.
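As a toy illustration of an interpreter with a memory model, here is a plain-Python evaluator over an invented op encoding: constants and adds flow through an SSA-like environment, while stores and loads go through a separate memory map. None of this is Modular's implementation.

```python
# Illustrative sketch: interpreting a tiny op list with both a value
# environment (for SSA-style results) and a memory model (for
# store/load), the two pieces an interpreter needs to fold memory
# accesses as well as arithmetic.

def interpret(ops):
    env, memory = {}, {}
    for op in ops:
        kind = op[0]
        if kind == "const":
            _, dest, value = op
            env[dest] = value
        elif kind == "add":
            _, dest, a, b = op
            env[dest] = env[a] + env[b]
        elif kind == "store":
            _, addr, src = op
            memory[addr] = env[src]
        elif kind == "load":
            _, dest, addr = op
            env[dest] = memory[addr]
    return env

program = [
    ("const", "x", 4),
    ("const", "y", 5),
    ("add", "z", "x", "y"),
    ("store", 0, "z"),   # write z to address 0
    ("load", "w", 0),    # read it back
]
assert interpret(program)["w"] == 9
```

A constant propagation analysis can reuse exactly this evaluator, treating any value it can compute this way as a known constant.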
Aart Bik, Peiming Liu, Yinying Li (Google) @aartbik @PeimingLiu @yinying-lisa-li
In this talk, we will discuss recent improvements made in the MLIR Sparsifier, present ongoing efforts, and provide a demo of “sparse compilation” in a Colab environment.
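For readers unfamiliar with what sparse compilation produces, here is a hand-written plain-Python version of the kind of loop a sparsifier generates: a matrix-vector product over a CSR (compressed sparse row) encoding that only touches stored nonzeros. This is background illustration, not Google's generated code.

```python
# Illustrative sketch: matrix-vector product over a CSR encoding.
# indptr[row]..indptr[row+1] delimits the stored entries of each row,
# so the loop skips the zeros a dense kernel would visit.

def csr_matvec(indptr, indices, data, x):
    y = [0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y

# CSR form of the dense matrix [[1, 0, 2], [0, 3, 0]]:
indptr, indices, data = [0, 2, 3], [0, 2, 1], [1, 2, 3]
assert csr_matvec(indptr, indices, data, [1, 1, 1]) == [3, 3]
```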
Mathieu Fehr, Tobias Grosser (University of Edinburgh) @math-fehr @TobiasGrosser
We will present some ideas on how to modernize types and attributes in MLIR, and show how this improves meta-dialects such as IRDL and PDL. We will first take a look at type and attribute name parsing, and show the benefits of adding proper names to attributes and types, similar to operations. We will then look at the relationship between the Type and Attribute class, and the benefits of making Type a subtype of Attribute.
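The "Type as a subtype of Attribute" proposal can be pictured with a toy Python class hierarchy (MLIR's actual classes are C++ and richer than this): any API written against attributes then accepts types for free, and both carry a proper name.

```python
# Illustrative sketch: if Type subclasses Attribute, code that accepts
# attributes uniformly accepts types, and both get proper names the
# way operations already do.

class Attribute:
    def __init__(self, name):
        self.name = name  # proper name, e.g. "builtin.i32"

class Type(Attribute):
    pass

def describe(attr):
    # Written once against Attribute; works for types too.
    return f"attribute {attr.name}"

i32 = Type("builtin.i32")
assert isinstance(i32, Attribute)
assert describe(i32) == "attribute builtin.i32"
```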
Mahesh Ravishankar, Benoit Jacob, Hanhan Wang (Nod.ai) @MaheshRavishankar
This talk describes the recent work in IREE that enables the use of data layout transformations to improve matrix multiplication performance. There are two parts to this work that are independent but work together to deliver good performance: (a) a data layout transformation of the operands to be more friendly to the SIMD ISA and/or the memory system (caches), and (b) the ability to offload the innermost loop to a predefined sequence of instructions on specific hardware. These techniques by themselves are well known to the community, but this talk will show how the two tie together. While the current implementation has been evaluated on CPU architectures, the same approach is readily usable on any backend that IREE targets. We will also discuss the performance of the code generated through this approach.
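As a minimal picture of part (a), here is a plain-Python sketch of packing a row-major matrix into fixed-size contiguous tiles, so an inner microkernel reads consecutive memory. The function name and tiling scheme are invented for illustration; IREE's actual packing is more general.

```python
# Illustrative sketch of data tiling: split each row of a row-major
# matrix into contiguous tiles of `tile` elements, the shape a SIMD
# microkernel or fixed instruction sequence typically consumes.

def pack_rows(matrix, tile):
    packed = []
    for row in matrix:
        for i in range(0, len(row), tile):
            packed.append(row[i:i + tile])
    return packed

m = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
assert pack_rows(m, 2) == [[1, 2], [3, 4], [5, 6], [7, 8]]
```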
Jianhui Li, Chao Chen, Shahneous Bari, Md Abdullah, Gusthinna Waduge, Charitha Saumya (Intel) @Jianhui-Li
To facilitate efficient code generation for GEMM on Intel GPUs, we introduce the XeTile dialect, which supports a tile-based programming model. XeTile decomposes the GEMM kernel into large pre-defined tile sizes at the subgroup and workgroup level. With the XeTile dialect, tile-based GEMM algorithms can easily be expressed, and it enables advanced optimizations like cooperative load/prefetch, K-slicing, and software pipelining. Underneath XeTile, the implementation uses target-specific features to get the best performance on specific hardware. Because the GEMM is decomposed at submatrix granularity and mapped to registers, the XeTile representation supports optimizations such as fusion with neighboring operations. Although XeTile was developed for Intel GPUs, the dialect definition is target-independent and can be lowered to different hardware targets. We would like to discuss with the MLIR community and obtain feedback to fine-tune it as an upstream dialect.
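The tile-based decomposition of GEMM can be sketched in plain Python as a blocked matrix multiply, where the three outer loops iterate over tiles and the inner loops are the per-tile microkernel; this is background illustration of the concept, not XeTile's lowering.

```python
# Illustrative sketch: blocked (tiled) matrix multiply. The outer
# loops walk fixed-size tiles; the inner loops are the tile-level
# microkernel a dialect like XeTile would express and optimize.

TILE = 2

def tiled_matmul(a, b, n):
    # n x n matrices as nested lists; assumes n is a multiple of TILE.
    c = [[0] * n for _ in range(n)]
    for i0 in range(0, n, TILE):
        for j0 in range(0, n, TILE):
            for k0 in range(0, n, TILE):
                # Microkernel: C-tile += A-tile @ B-tile.
                for i in range(i0, i0 + TILE):
                    for j in range(j0, j0 + TILE):
                        for k in range(k0, k0 + TILE):
                            c[i][j] += a[i][k] * b[k][j]
    return c

identity = [[1, 0], [0, 1]]
assert tiled_matmul(identity, [[5, 6], [7, 8]], 2) == [[5, 6], [7, 8]]
```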
Jakub Kuderski (nod.ai), Ivan Butygin (Intel), Karl Friebel (TU Dresden) @kuhar @Hardcode84 @KFAF
Poison semantics, originally introduced in LLVM, allow for defining the semantics of ops with undefined behavior and for modeling deferred undefined behavior. This roundtable focuses on an implementation strategy specific to MLIR.
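"Deferred undefined behavior" can be illustrated with a toy poison sentinel in plain Python: producing poison is harmless, poison flows through pure operations, and only a side-effecting use of it is an error. This sketch is conceptual background, not the roundtable's proposal.

```python
# Illustrative sketch of poison semantics: a sentinel that propagates
# through pure ops without error; undefined behavior is deferred until
# a side-effecting operation actually uses the poisoned value.

POISON = object()

def add(a, b):
    if a is POISON or b is POISON:
        return POISON  # poison propagates through pure ops
    return a + b

def store(value):
    if value is POISON:
        raise RuntimeError("undefined behavior: poison reached a store")
    return value

p = add(POISON, 1)
assert p is POISON            # deferred: no error yet
assert store(add(2, 3)) == 5  # normal values are unaffected
```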
Mathieu Fehr (University of Edinburgh) @math-fehr
A discussion of how we could improve the Python bindings in MLIR, and how some of the ideas we developed in xDSL could be contributed to the MLIR Python bindings.