How does the TensorFlow MLIR infrastructure work to lower TF code to run on a GPU?

Is there a specific dialect that is used to lower TensorFlow to cuDNN/MKL-DNN? Also, is it first lowered to the GPU dialect and then converted to cuDNN/MKL-DNN calls?


We do not have support in the GPU dialect for calling what is essentially an external function. The modelling challenge is that the GPU dialect has no notion of a stream, yet external functions require one once lowered to CUDA, for example. So we would need some metadata annotation that informs the lowering to a stream-based runtime of what the expected arguments of that external function are.
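To make the gap concrete, here is a hypothetical MLIR sketch of what such an annotation might look like. Note that the `!llvm.ptr` stream argument, the `@cudnn_convolution` symbol, and the `cudnn.stream_arg` attribute are all invented for illustration; none of them exists in the upstream GPU dialect, which is exactly the missing piece described above:

```
// Hypothetical: a cuDNN entry point declared as a plain external function.
// The GPU dialect has no stream type, so the lowering would need
// side-channel metadata telling it which argument carries the stream.
func private @cudnn_convolution(!llvm.ptr,            // CUDA stream (hypothetical)
                                memref<?x?xf32>,      // input
                                memref<?x?xf32>,      // filter
                                memref<?x?xf32>)      // output
    attributes { cudnn.stream_arg = 0 : i32 }         // invented annotation
```

The idea is that a lowering to a stream-based runtime could read the (hypothetical) `cudnn.stream_arg` attribute and thread the runtime's stream handle into the right call argument, which the GPU dialect cannot express today.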

This can all be done but has not been tackled yet.

Independently, one can also do this at a different level of abstraction. Instead of going via the GPU dialect to an explicit cuDNN call, one could wrap cuDNN in BEF kernels and lower to those. This is being considered in the context of the async.region proposal and TFRT.