[RFC] Sharding Framework Design for Device Mesh

With @yaochengji: discussion of a recent proposal, [RFC] Sharding Framework Design for Device Mesh.

A robust sharding infrastructure is now essential for distributed computing, particularly for ML and large language models, whether the sharding is chosen manually or automatically. At the same time, mesh-like deep learning clusters are widely deployed. For instance, the TPU 3D torus is explicitly defined as a mesh-like cluster, and several DGX boxes can be regarded as a 2-D mesh, with the GPUs inside a box along one axis and the boxes along the other.
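To make the "2-D mesh" view concrete, here is a minimal sketch, assuming a hypothetical cluster of 2 DGX boxes with 8 GPUs each and a row-major mapping from mesh coordinates to device IDs. The helper names `build_mesh` and `axis_group` are illustrative only and are not part of any proposed API. An axis group is the set of devices that differ only along one mesh axis; collectives "along a mesh axis" operate within such groups.

```python
from itertools import product


def build_mesh(mesh_shape):
    """Map each mesh coordinate to a linear device ID in row-major order (hypothetical layout)."""
    mesh = {}
    ranges = (range(n) for n in mesh_shape)
    for device_id, coord in enumerate(product(*ranges)):
        mesh[coord] = device_id
    return mesh


def axis_group(mesh, coord, axis):
    """Devices that share every coordinate with `coord` except along `axis`."""
    return sorted(
        dev
        for c, dev in mesh.items()
        if all(c[i] == coord[i] for i in range(len(coord)) if i != axis)
    )


# Assumed cluster: 2 boxes x 8 GPUs. Axis 0 indexes the box, axis 1 the GPU within a box.
dgx_mesh = build_mesh((2, 8))

# GPUs in the same box as device (1, 3): a group along mesh axis 1.
intra_box = axis_group(dgx_mesh, (1, 3), axis=1)
# GPUs at the same slot in every box: a group along mesh axis 0.
cross_box = axis_group(dgx_mesh, (1, 3), axis=0)
print(intra_box, cross_box)
```

Under this assumed layout, an all-reduce along axis 1 stays inside one box, while one along axis 0 crosses boxes, which is the kind of topology distinction a mesh representation lets a compiler reason about.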

This RFC proposes the design of a Mesh dialect that represents a mesh cluster, the sharding of tensors over it, and the collective communication operations on it. It also outlines a typical workflow: annotating sharding in frontends, propagating and optimizing that sharding, and converting the IR to SPMD form.
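As a rough illustration of what "representing the sharding" and "converting to SPMD" involve, here is a minimal sketch under stated assumptions. It uses a hypothetical encoding, not the dialect's actual attribute syntax: for each tensor dimension, a list of mesh axes that dimension is split over, with any unlisted mesh axis meaning replication. The functions `local_shard_shape` and `shard_offsets` are illustrative names; they compute the per-device shape and the offset of the slice a given device owns, which is the information an SPMD conversion needs.

```python
from math import prod


def local_shard_shape(global_shape, mesh_shape, split_axes):
    """Per-device shard shape under a hypothetical split_axes encoding.

    split_axes[d] lists the mesh axes that tensor dim d is split over.
    Mesh axes that appear nowhere mean the tensor is replicated along them.
    """
    if len(split_axes) != len(global_shape):
        raise ValueError("need one split_axes entry per tensor dimension")
    used = [a for axes in split_axes for a in axes]
    if len(used) != len(set(used)):
        raise ValueError("a mesh axis may shard at most one tensor dimension")
    shape = []
    for size, axes in zip(global_shape, split_axes):
        parts = prod(mesh_shape[a] for a in axes)  # prod of nothing is 1
        if size % parts:
            raise ValueError(f"dim of size {size} is not divisible into {parts} shards")
        shape.append(size // parts)
    return tuple(shape)


def shard_offsets(global_shape, mesh_shape, split_axes, coord):
    """Start offsets of the shard held by the device at mesh coordinate `coord`."""
    local = local_shard_shape(global_shape, mesh_shape, split_axes)
    offsets = []
    for extent, axes in zip(local, split_axes):
        # Linearize the device's position over the listed mesh axes, row-major.
        idx = 0
        for a in axes:
            idx = idx * mesh_shape[a] + coord[a]
        offsets.append(idx * extent)
    return tuple(offsets)


mesh_shape = (2, 8)  # assumed 2 x 8 device mesh
# Split dim 0 over mesh axis 0 and dim 1 over mesh axis 1.
print(local_shard_shape((1024, 4096), mesh_shape, [[0], [1]]))
print(shard_offsets((1024, 4096), mesh_shape, [[0], [1]], (1, 3)))
```

With `[[0], [1]]` each device holds a 512 x 512 block; with `[[], [0, 1]]` dim 1 is split 16 ways across both mesh axes and dim 0 is replicated. Propagation and optimization would then amount to choosing these per-dimension assignments for every tensor and inserting collectives where adjacent choices disagree.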

