[RFC] Extending MLIR GPU device codegen pipeline

Please note that, as part of the sparsification work, I also consolidated some of the passes required to get GPU code running into a single pipeline. The setup is simple but functional, and it provides a path both for direct CUDA code generation and for conversion into cuSPARSE calls (I linked to the command lines of two end-to-end examples to illustrate both pipeline setups). Here too, however, I often found that identifying the exact lowering passes was a bit brittle, and I would love to see how you further enhance the setup!
