Hey @mshahneo, thanks for the question. We have actually touched on this in various places, but the discussions are scattered (I'll link some at the end) and not written down coherently anywhere. It's probably worth a blog post to explain in detail. For now I'll just rehash the major points to answer your question.
Fundamentally it’s because MLIR is a better way to represent SPIR-V and accommodate its various usage scenarios.
With MLIR we can model and implement SPIR-V specific concepts and mechanisms in a natural, principled way (e.g., proper ops for various builtins, attributes for annotations, an extensible type/op system for SPIR-V extensions, version/extension/capability modelling, a flexible conversion framework to enable CodeGen towards different target environments supporting different versions/capabilities, etc.), rather than trying to shoehorn them into existing LLVM mechanisms (e.g., metadata, using symbol names to carry semantics, name mangling, etc.), which can sometimes be quite fragile.
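To make the version/extension/capability point a bit more concrete, here is a toy Python sketch of the idea: each op declares what it requires, each target environment declares what it offers, and a conversion step can ask whether an op is legal for a given target. This is not the MLIR SPIR-V dialect API; `TargetEnv`, `OpRequirements`, and `is_legal` are hypothetical names made up for illustration, and the requirement sets are simplified (check the SPIR-V spec for the real ones).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetEnv:
    """Hypothetical description of what a target supports."""
    version: tuple                        # e.g. (1, 3) for SPIR-V 1.3
    capabilities: frozenset = frozenset()
    extensions: frozenset = frozenset()


@dataclass(frozen=True)
class OpRequirements:
    """Hypothetical per-op requirements.

    `capabilities` and `extensions` are tuples of frozensets: every group
    must be satisfied, and a group is satisfied by any one of its members.
    """
    min_version: tuple = (1, 0)
    capabilities: tuple = ()
    extensions: tuple = ()


def is_legal(req: OpRequirements, env: TargetEnv) -> bool:
    """Return True if `env` offers everything that `req` asks for."""
    if env.version < req.min_version:
        return False
    caps_ok = all(group & env.capabilities for group in req.capabilities)
    exts_ok = all(group & env.extensions for group in req.extensions)
    return caps_ok and exts_ok


# Illustrative targets: one graphics-flavored, one compute-flavored.
vulkan_like = TargetEnv(version=(1, 3), capabilities=frozenset({"Shader"}))
opencl_like = TargetEnv(version=(1, 2),
                        capabilities=frozenset({"Kernel", "Addresses"}))

# Illustrative ops: an image sampling op and a subgroup op.
image_sample = OpRequirements(capabilities=(frozenset({"Shader"}),))
subgroup_elect = OpRequirements(
    min_version=(1, 3), capabilities=(frozenset({"GroupNonUniform"}),))

print(is_legal(image_sample, vulkan_like))    # True
print(is_legal(image_sample, opencl_like))    # False: no Shader capability
print(is_legal(subgroup_elect, vulkan_like))  # False: capability not declared
```

The point of the sketch is only that this information lives in first-class, queryable data attached to ops and targets, rather than being encoded in metadata or mangled names.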
Also, as you mentioned, the stack is much simpler and cleaner, which is quite useful for domain-specific compilers. ML compilers are one scenario; another is actually graphics shader compilers. It’s great that we can share lots of optimizations within the LLVM stack, but only when those optimizations are what we want. There is always a question of how much optimization we need to do at the SPIR-V level, given that many driver compilers take in SPIR-V, run it through an LLVM stack, and redo lots of optimizations anyway. (In this sense, think of SPIR-V more as a stable cross-vendor mid-level GPU program representation, rather than an IR for optimal code generation.) For ML compilers, the transformations that matter most for performance (tiling, vectorization, etc.) are all done in earlier steps like linalg/vector/etc.; what’s needed at the SPIR-V level is mostly light cleanups and legalization for generating the binary blob (plus some fixes/workarounds for certain platforms). A lighter and cleaner stack helps with toolchain hygiene, compilation time, and also controllability, so that you generate the exact code pattern you’d like in the end. Going through LLVM sometimes means you lose that and are subject to a black box doing whatever it wants.
To expand on the graphics side further: this is one major reason why having SPIR-V in MLIR is much better. If I’m working on compute/ML day to day, I’m fine ignoring the graphics aspect of GPUs; but still, that’s what GPUs were initially for, and it’s still a huge industry to serve. SPIR-V is used by multiple Khronos open APIs, including graphics-focused Vulkan and compute-focused OpenCL. There are many (vast) differences between how graphics (shaders) and compute (kernels) are represented/handled. They use different memory/addressing models (graphics: logical addressing, non-aliasing by default; compute: physical addressing, aliasing by default), have different requirements on control flow (graphics: structured control flow; compute: unstructured control flow), use different binding models (graphics: binding tables + shader global variables; compute: kernel arguments), need different special functionality (graphics: tons of image/sampling related stuff, implicit derivative calculation having implications for control flow uniformity, etc.; compute: more about linear 1-D buffers, not much image sampling, etc.), and so on.
If we are only talking about compute, going through LLVM is easier because of the similarity between LLVM’s assumptions and SPIR-V for compute (i.e., the Kernel capability tree in SPIR-V). (Though concretely we still saw lots of debate about whether it should be there and how to slot it into the LLVM framework, because it’s so different from other normal backends; there is a reason why it wasn’t accepted until this year.) I’m not so sure about going through LLVM for SPIR-V graphics (i.e., the Shader capability tree in SPIR-V). It would mean we need to figure out how to restructure the control flow, how to handle various opaque types (image/sampler/etc.) and builtins through the whole stack, how to make sure various transformations don’t break graphics correctness (control flow uniformity, convergence, etc.), how to handle a different binding model that is so tightly coupled to the runtime, etc. These are all very challenging topics; some of them have been discussed extensively in the community for a long time, and I’m not sure we are in a good state even today. So for graphics I’m not sure going through LLVM is a good idea, but that’s half of what SPIR-V wants to serve. There, having the extra “optimization” passes in LLVM can actually be a problem… Meanwhile, the current MLIR flow to target Vulkan compute shaders (not compute as in OpenCL) for ML has been working pretty well for us thus far. (BTW I remain highly interested to see more support for other graphics shader kinds, and maybe a graphics frontend compiler, either via ClangIR or something else, to directly emit the MLIR SPIR-V dialect!)
In a sense, I guess the question can be extended to ask why all sorts of existing IRs and the like aren’t implemented as LLVM backends. And we know it cannot be the case that every IR out there should be an LLVM backend. LLVM is great, but it’s not designed to address all problems out there, and that’s why we have MLIR: to allow more flexibility and to be tailored to particular use cases. To wrap up, here are some relevant previous discussions that hopefully serve as further explanation:
- “In defense of NIR” talks about why the Mesa driver stack doesn’t just use LLVM and instead develops its own IR. Many of the reasons apply to SPIR-V here too.
- The original proposal to add a SPIR-V backend to LLVM, and my comments there, which touch on a few more points.
- The RFC for adding HLSL and DirectX support to Clang & LLVM. I also have some comments there.
Hope the above helps.