#ifdef __HIP_DEVICE_COMPILE__
#define HD __attribute__((host)) __attribute__((device))
#else
#define HD
#endif
HD void foo() {
}
__attribute__((global)) void kernel() {
  foo();
}
Clang reports an error about the global function when compiling for the host target. The device code has already been generated at that point, so emitting this error does not make sense.
#include <stdio.h>
#ifdef __CUDA_ARCH__
#define HD __host__ __device__
#else
#define HD
#endif
HD void foo() {
  printf("execute foo\n");
}
__global__ void kernel() {
  foo();
}
int main() {
  kernel<<<1,1>>>();
  cudaDeviceSynchronize();
}
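If you compile the CUDA code using clang, you’ll observe the same results as with HIP. This is because clang builds a complete AST in each of its two compilation passes (host and device) over the same source file. In the host pass `__CUDA_ARCH__` is not defined, so `HD` expands to nothing and `foo` becomes a host-only function that `kernel` is not allowed to call.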
My understanding of nvcc is that it splits the source code before compilation, with the split parts being fed separately to different compilers (for host and device).
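For what it's worth, the usual way around this with clang is to keep the target attributes identical in both passes and move the per-target difference into the function body. Below is a minimal sketch, not a definitive fix. It assumes that `__CUDACC__` (CUDA) or `__HIPCC__` (HIP) is defined in both the host and the device pass, and that `__global__`, `__host__` and `__device__` come from the usual runtime headers. Having `foo` return an `int` is my own addition for illustration and is not in the original snippet.

```cpp
#include <cstdio>

// Assumption: __CUDACC__ / __HIPCC__ are defined in BOTH the host and the
// device pass, unlike __CUDA_ARCH__ / __HIP_DEVICE_COMPILE__.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define HD __host__ __device__
#else
#define HD  // plain C++ build: no target attributes
#endif

// foo is declared for both targets in every pass; only the body varies.
// Returns 1 when the body was compiled for the device, 0 for the host.
HD int foo() {
#if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
  printf("execute foo on device\n");
  return 1;
#else
  printf("execute foo on host\n");
  return 0;
#endif
}

#if defined(__CUDACC__) || defined(__HIPCC__)
// The host pass now sees a __host__ __device__ callee, so the call is legal.
__global__ void kernel() {
  foo();
}
#endif
```

With this layout the host-pass AST contains a `foo` that `kernel` may call, so the diagnostic should not trigger, and the file still builds as ordinary C++ because the kernel is guarded out there.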
We just don’t really have a mechanism for doing so or for identifying such functions. You can’t really skip function bodies reliably, since even finding the matching closing ‘}’ can be challenging, so we don’t attempt it. Two-pass compilation has a problem with this, and it is a known downside of the architecture.