I see, I seem to have gotten that the wrong way around, and that’s probably the source of my confusion. Now I can see why we’re not talking about the same things.
Linalg is great for tiling and fusing (when the ops' ranges match), but affine/scf are required for blocking (reshaping for locality), reordering loops, and finding the right level for a library call.
Even though linalg has a library-name attribute, we don't have functions for whole ops (a full conv, a fully-connected layer), but we do have low-level "instructions" (aka smaller composable library calls) that we know are efficient when called in a certain order. This is what we call Tensor Processing Primitives. Think of a TPP as a CISC instruction in a sea of (MLIR) RISC instructions.
Our top-shelf op, the batch-reduce GEMM, is super efficient for a bunch of ops bundled together (batching, GEMM, reduction, even a following activation), so we want to fuse MLIR ops the way `tcp.group` or `scf.execute_region` do, and then turn the region into a single library call. But we need to know which sub-tensor we're applying the function to (to guarantee the tile shape is consistent with the rest of the iteration space, and to be able to do later fusion after reordering), and `scf.execute_region` doesn't make that easy.
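To make the bundling concrete, here's a NumPy sketch of what that one library call computes (an illustration of batch-reduce GEMM semantics with a fused activation, not our actual API):

```python
import numpy as np

def batch_reduce_gemm_relu(A, B, C):
    """Illustrative batch-reduce GEMM: accumulate a batch of GEMMs
    into a single C tile, with a fused ReLU applied at the end.
    Shapes: A is (batch, M, K), B is (batch, K, N), C is (M, N)."""
    for b in range(A.shape[0]):
        C += A[b] @ B[b]       # GEMMs reduced over the batch dimension
    return np.maximum(C, 0.0)  # fused activation (ReLU)

# Usage: one call stands in for a whole group of matmul + add + relu ops.
A = np.random.rand(4, 8, 16)
B = np.random.rand(4, 16, 8)
C = np.zeros((8, 8))
out = batch_reduce_gemm_relu(A, B, C)
```

The whole loop nest is what becomes one library call, which is why the surrounding IR has to tell us exactly which sub-tensors (tiles) A, B, and C are.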
For now, we’re massaging the IR to get it into the shape we want, because our goal is the optimisation passes, not IR design. But that’s not realistic long term, so we’re also interested in upstream dialects, even if they’re not LLVM proper, for two reasons: first, to see how we can reduce the IR massaging we have to do; second, to understand what front-ends generate, so we can consume that directly.
Having a common intermediate dialect that other front-end ingress dialects lower to would be awesome as a starting point. Having an upstream (read: LLVM/MLIR) dialect that lets us work at a slightly higher level than linalg (for example, keeping `relu` as a named op instead of lowering it to primitive ops), while we still have linalg, affine, and scf, would let us use the right level for the right transformations before lowering to calls + codegen.
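A toy Python analogy (not MLIR syntax) of why keeping the named op matters: once `relu` is lowered to primitive compare/select ops, the intent is gone and has to be re-discovered by raising before we can match it to a library call.

```python
import numpy as np

# High-level form: 'relu' survives as one recognisable op,
# trivially pattern-matched into a fused library call.
def relu(x):
    return np.maximum(x, 0.0)

# Lowered form: the same computation as primitives (compare + select);
# the 'relu' name is lost and must be recovered by pattern matching.
def relu_lowered(x):
    mask = x > 0.0
    return np.where(mask, x, np.zeros_like(x))

x = np.array([-1.0, 2.0, -3.0])
same = np.array_equal(relu(x), relu_lowered(x))  # semantically identical
```

Both compute the same thing; the difference is purely how easy each form makes the transformation we want.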
Those two “meta-dialects” (common-from-ingress and transformation) don’t have to be the same dialect, or even be only two. We can keep working with lower-level IR (even raising it a bit again when needed) for the time being (or even forever) if the community needs something completely different.