This Thursday, September 28th (9am California Time, 16:00 UTC), we’ll have a presentation by @yaochengji and a discussion of the recent proposal [RFC] Sharding Framework Design for Device Mesh.
A robust sharding infrastructure is essential for distributed computing, particularly for ML and large language models, whether sharding is specified manually or derived automatically. Meanwhile, mesh-like deep learning clusters are widely used. For instance, the TPU 3D torus is explicitly defined as a mesh-like cluster, and several DGX boxes can be regarded as a 2D mesh-like cluster.
This presentation proposes the design of the Mesh dialect, which aims to represent the mesh cluster and the sharding, along with collective communication operations on the mesh. It will also outline a typical workflow: taking sharding annotations from frontends, performing sharding propagation and optimization, and converting the IR to SPMD form.
The Zoom Meeting Link is unchanged. The presentation will be recorded and posted here and on our talks page on the website, as usual.
You can also subscribe to this Google calendar to stay informed about upcoming meetings.
Meeting ID: 851 5109 0498
Passcode: 828404