Hi @csigg, thank you!
- There are a couple of ways to specify a different CUDA toolkit path at runtime. For example, the NVVM target queries the following environment variables:
  - `CUDA_ROOT`
  - `CUDA_HOME`
  - `CUDA_PATH`

  If any of those is non-empty, the compilation mechanism uses that path to search for the tools. You can also use `--gpu-module-to-binary=toolkit=/path/to/toolkit` to specify the path; that option always takes precedence.
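In case it clarifies the precedence, here is a small Python sketch of the lookup order described above (the real logic is the NVVM target's C++ implementation; the function name and the exact ordering of the environment variables here are my assumptions):

```python
import os

def resolve_cuda_toolkit(flag_toolkit=None):
    """Sketch of the toolkit path lookup.

    Precedence: the explicit ``toolkit=`` pass option wins, then the
    environment variables are tried in order, first non-empty wins.
    """
    if flag_toolkit:
        # --gpu-module-to-binary=toolkit=... always takes precedence.
        return flag_toolkit
    for var in ("CUDA_ROOT", "CUDA_HOME", "CUDA_PATH"):
        path = os.environ.get(var)
        if path:  # non-empty variable found
            return path
    return None  # fall back to the target's default search
```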
- Ok, there are multiple answers to this question. If you use `--gpu-module-to-binary=format=bin`, then the target and the GPU must be an exact match. If you use `--gpu-module-to-binary=format=fatbin`, then the NVVM target produces a fatbin with the PTX embedded. This option is not available in the nvptx-compiler lib, so the driver should be able to JIT the code if there's an arch mismatch; that's how I got the tests working regardless of the platform. The default behavior is to try to produce the fatbin.
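For reference, the two modes above might look like this on the command line (a sketch; the input file name is a placeholder and your pipeline invocation may differ):

```shell
# Exact-match mode: the embedded cubin only runs on a matching GPU.
mlir-opt input.mlir --gpu-module-to-binary=format=bin

# Default/fatbin mode: the fatbin embeds PTX, so the driver can JIT
# the code when there is an architecture mismatch.
mlir-opt input.mlir --gpu-module-to-binary=format=fatbin
```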
Please let me know if option 1 works for you and whether it solves the needs of your setup.
Yes, I'm working on a fully JIT version; it should be in trunk in the next few days. Also, I was planning to push the deprecation of the old passes today. Do you feel you need more time?