I didn’t understand this part – is this post-PTX? The scenario I was referring to is the MLIR JIT (not AOT). There isn’t a linker step after PTX generation, and you need to know where libdevice is when linking prior to that, AFAIU.
Looks like I described it wrong. This is how one can link it in manually, and it’s not too late
to perform optimizations: https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc;l=283;drc=3381da37560d64c7cb62b53879a0a931ff9036c4
I have a workaround that does exactly this in the MLIR gpu-to-cubin
pass and then runs LLVM passes at the desired opt level after the link (sketch below).
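For concreteness, here’s a minimal sketch of that sequence, assuming a recent LLVM with the new pass manager. The helper names (`linkLibdevice`, `optimizeModule`) are mine, not from the linked XLA code, and using `LinkOnlyNeeded` is an assumption based on how bitcode libraries are usually linked in:

```cpp
// Hypothetical sketch: link libdevice into the kernel module first,
// then run the standard pipeline so libdevice calls can be inlined
// and optimized along with everything else, before PTX emission.
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Linker/Linker.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Support/SourceMgr.h"

// Link libdevice (e.g. libdevice.10.bc) into `module`.
// Returns false on failure.
static bool linkLibdevice(llvm::Module &module,
                          llvm::StringRef libdevicePath) {
  llvm::SMDiagnostic err;
  std::unique_ptr<llvm::Module> libdevice =
      llvm::parseIRFile(libdevicePath, err, module.getContext());
  if (!libdevice)
    return false;
  // LinkOnlyNeeded pulls in just the libdevice functions the kernel
  // actually references. linkModules returns true on error.
  return !llvm::Linker::linkModules(module, std::move(libdevice),
                                    llvm::Linker::Flags::LinkOnlyNeeded);
}

// Run the default pipeline at the requested opt level post-link.
static void optimizeModule(llvm::Module &module,
                           llvm::OptimizationLevel level) {
  llvm::LoopAnalysisManager lam;
  llvm::FunctionAnalysisManager fam;
  llvm::CGSCCAnalysisManager cgam;
  llvm::ModuleAnalysisManager mam;
  llvm::PassBuilder pb;
  pb.registerModuleAnalyses(mam);
  pb.registerCGSCCAnalyses(cgam);
  pb.registerFunctionAnalyses(fam);
  pb.registerLoopAnalyses(lam);
  pb.crossRegisterProxies(lam, fam, cgam, mam);
  llvm::ModulePassManager mpm = pb.buildPerModuleDefaultPipeline(level);
  mpm.run(module, mam);
}
```

The ordering is the key bit: once libdevice is linked in at the LLVM IR level, running the pipeline at -O2/-O3 can inline its function bodies into the kernel, so nothing is left to resolve post-PTX.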