A recurring design point keeps surfacing when trying to expose LLVM ops on MLIR types. Most recently, it came up in:
- https://reviews.llvm.org/D92172 for `sve.vscale : index` -> `llvm.vscale.i64() : i64`
- https://reviews.llvm.org/D92216 for `std.inline_asm` -> `llvm.inline_asm`
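
For concreteness, here is a minimal sketch of the first lowering. The op spellings are copied from the revision above and may differ from what eventually landed; this is illustrative IR, not a proposal:

```mlir
// Target-neutral form: vscale exposed as an op on the builtin `index` type.
%0 = sve.vscale : index

// After lowering to the LLVM dialect: the same value is produced by the LLVM
// intrinsic on LLVM's `i64` type (generic op form, name as spelled in D92172;
// an index/i64 cast is also needed at the boundary).
%1 = "llvm.vscale.i64"() : () -> i64
```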
The general comment is that we should not bring LLVM representations into the non-LLVM parts of MLIR without a deeper design discussion; otherwise those parts just become a copy of the LLVM dialect with the MLIR type system. This point is very valid, and I would love it if we could find a generic way of improving the situation here.
Now, in practice, few people work on both LLVM and “something else” (SPIR-V, LLO, …), and generalizing abstractions such as “inline asm” across targets takes practice and iteration. That experience only comes from building end-to-end systems that actually run: it is very easy to overlook deep, fundamental issues that only surface when the SW rubber meets the HW road.
How can we proceed here to make the most efficient use of resources? I see a few options (a concrete sketch follows the list):
1. Add a) a matching abstraction in an MLIR dialect operating on MLIR types (e.g. `std`, `vector`, or a target-specific HW dialect), b) a lowering to the LLVM representation, and c) ensure the abstraction runs end-to-end. Then, over time, build experience and refactor, rinse / repeat: this is an evolutionary process.
2. Have LLVM dialect ops also accept MLIR types, with a generic conversion pattern to convert the types.
3. Have a dedicated LLVM “staging” dialect for ops that are 1-1 with LLVM and that we expect to generalize and graduate to higher-level dialects over time.
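
To make the trade-offs concrete, here is a hypothetical sketch of what an “inline asm”-style op could look like under each option. Op names, dialect names, and syntax are all illustrative assumptions, not proposals:

```mlir
// Option 1: a matching abstraction on MLIR types in a non-LLVM dialect,
// with its own lowering to llvm.inline_asm.
%r1 = std.inline_asm "add $0, $1" (%a) : (i64) -> i64

// Option 2: the existing LLVM dialect op transiently accepts MLIR types;
// a generic conversion pattern later rewrites only the types.
%r2 = llvm.inline_asm "add $0, $1" (%idx) : (index) -> index

// Option 3: a 1-1 "staging" dialect on MLIR types, from which ops graduate
// to higher-level dialects as we learn how to generalize them.
%r3 = llvm_staging.inline_asm "add $0, $1" (%a) : (i64) -> i64
```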
I have been operating on the assumption that option 1 is good for velocity, but it obviously comes with issues (copy-pasta of abstractions, bloat, YAGNI generalization, …).
Option 2 may be limited by circular dependencies between dialects and likely hampers early attempts at generalizing semantics: if one wants to generalize semantics, one first has to create a new op, so we are almost back to option 1.
Option 3 seems to only shift the problem of option 1 into a more self-contained dialect (which may still be viewed as a win).
Do people see other options?
More generally, how can we improve the “just get it connected for now” story and increase velocity?
I strongly believe we do not need to solve all generalization problems at the same time to make progress and that iteration speed is of the utmost importance: learning needs a gradient.
Thanks!