My research project is to optimize CUDA programs via LLVM by utilizing annotated ptr.
It would help if you were a bit more specific about the details of what you’re trying to do and how exactly it does not work. Providing a reproducer on cuda.godbolt.org is extremely helpful if you want someone’s help with the compiler.
This is not sufficient for me to give you any specific suggestion. Details are important: the command-line options, which CUDA version you’re using, which clang version produced the error, whether you used the headers shipped with CUDA itself or the upstream sources, etc.
In general, poking at the upstream sources of NVIDIA’s libcudacxx, it appears that it relies on some compiler builtins that clang has not implemented yet, e.g. __nv_associate_access_property. Those would have to be implemented first. Until then you may need to provide your own implementation as a function with inline asm that does whatever the builtin is supposed to do. Unfortunately NVIDIA does not document it, so I currently have no clue what exactly this builtin is supposed to do. You would need to experiment and check what NVCC produces for that function. It’s possible that __nv_associate_access_property is what they use to give NVCC hints about the intended use of the pointers. If that’s the case, it may be possible to provide no-op implementations early on so you can get the code to compile.
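To make the no-op idea concrete, here is a rough sketch of what such a placeholder could look like. Everything in it is an assumption on my part: the name stub_associate_access_property and the STUB_DEVICE macro are made up for illustration, the parameter list (a pointer plus a 64-bit property value) is a guess at the builtin's shape, and simply returning the pointer unchanged is only valid if the builtin really is nothing more than a hint. Check what NVCC emits before relying on any of this.

```cpp
#include <cassert>

// Hypothetical helper macro: expands to the CUDA qualifiers when compiled as
// CUDA, and to nothing otherwise so the sketch can also be checked on the host.
#if defined(__CUDACC__) || defined(__CUDA__)
#define STUB_DEVICE __host__ __device__
#else
#define STUB_DEVICE
#endif

// Hypothetical no-op stand-in for the undocumented builtin.
// Assumption: the builtin only attaches an access-property hint to the
// pointer, so ignoring the property and returning the same address is enough
// to get code compiling. It does NOT reproduce any caching behavior.
STUB_DEVICE inline void* stub_associate_access_property(const void* ptr,
                                                        unsigned long long /*property*/) {
  return const_cast<void*>(ptr);
}
```

With a placeholder like this the pointer passes through untouched, so the resulting code should behave like code that uses plain pointers, just without whatever hint NVCC would have attached.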
Once the builtins are available, you will need to deal with portability issues in libcudacxx. I’m willing to bet that being compilable with clang was not high on the list of the authors’ priorities. Someone would need to find the issues, fix them in a way that works for both clang and NVCC, and upstream the fixes to libcudacxx.
Once that is done, you will be able to see what clang ends up generating for code that uses the libcudacxx headers.