Generating a separate cubin from the gpu dialect

Hi folks! I am learning MLIR and trying to lower the gpu dialect to a cubin.

From my understanding, the gpu dialect contains both host IR and device IR, and the generated cubin is embedded in the host IR as an annotation.

I was wondering whether it is possible to generate a separate cubin from the gpu dialect, packaged as a static library that other C++ programs can call.

Thank you!

This is of course theoretically possible, but as far as I know there is no in-tree support for it right now. You’d have to build it yourself.

Appreciate your prompt reply! Really helpful!

If the source IR contains a gpu.module representing kernels only, lowering that will give you a module with an attribute containing the cubin. From there, it should be straightforward to take the contents of the attribute and write it to a file.
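
The extraction step described above could be sketched roughly as follows, working on the textual form of the lowered module. This is only a sketch under assumptions: the attribute name (`nvvm.cubin` here) depends on the pass and MLIR version you use, the helper names `decode_mlir_string`, `extract_string_attr`, and `write_cubin` are made up for illustration, and it assumes the cubin is printed as a quoted string attribute with backslash escapes (`\\` for a backslash, `\XX` hex pairs for other bytes).

```python
import re

# Single-character escapes accepted in MLIR string literals (assumption:
# anything else after a backslash is a two-digit hex byte).
_SIMPLE_ESCAPES = {"\\": 0x5C, '"': 0x22, "n": 0x0A, "t": 0x09}


def decode_mlir_string(escaped: str) -> bytes:
    """Turn the escaped body of a printed string attribute back into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(escaped):
        ch = escaped[i]
        if ch != "\\":
            out.append(ord(ch))  # plain printable character
            i += 1
        elif escaped[i + 1] in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[escaped[i + 1]])
            i += 2
        else:
            out.append(int(escaped[i + 1:i + 3], 16))  # \XX hex byte
            i += 3
    return bytes(out)


def extract_string_attr(mlir_text: str, attr_name: str = "nvvm.cubin") -> bytes:
    """Find `attr_name = "..."` in the module text and return the decoded bytes.

    The default attribute name is an assumption; check what your pipeline emits.
    """
    pattern = re.escape(attr_name) + r'\s*=\s*"((?:[^"\\]|\\.)*)"'
    match = re.search(pattern, mlir_text, re.DOTALL)
    if match is None:
        raise ValueError(f"attribute {attr_name!r} not found")
    return decode_mlir_string(match.group(1))


def write_cubin(mlir_text: str, out_path: str, attr_name: str = "nvvm.cubin") -> int:
    """Write the extracted blob to `out_path`; returns the number of bytes written."""
    blob = extract_string_attr(mlir_text, attr_name)
    with open(out_path, "wb") as f:
        return f.write(blob)
```

For example, `write_cubin(open("lowered.mlir").read(), "kernels.cubin")` would dump the blob to a file (both file names are placeholders). That file could then be loaded from a C++ program through the CUDA driver API, for instance with `cuModuleLoad`.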
