CUDA has some complications today when we want to write to write generic __host__ __device__ code (e.g. __host__ __device__ template functions or __host__ __device__ member functions of template classes) that can be instantiated with parameters that cause the instantiation to use __host__ or __device__ only constructs.
Consider the following generic function:
template <typename Range, typename F>
__host__ __device__
void for_each(Range r, F f)
{
for (auto e : r)
f(e);
}
template <typename T>
__host__ void foo(T);
for_each(data, foo);
// Instantiates a __host__ __device__ template with a __host__ only
// function, so the instantiation is __host__ only
template <typename T>
__device__ void bar(T);
for_each(data, bar);
// Instantiates a __host__ __device__ template with a __device__ only
// function, so the instantiation is __device__
template <typename T>
__host__ __device__ void void foobar(T);
for_each(data, foobar);
// Instantiates a __host__ __device__ template with a __host__ __device__
// function, so the instantiation is __host__ __device__
With NVCC, when a __host__ __device__ template is instantiated with __host__ only or __device__ only entities, it is treated as described above, and an unnecessary warning is emitted.
There is a pragma to suppress this warning:
#pragma nv_exec_check_disable
template <typename Range, typename F>
__host__ __device__
void for_each(Range r, F f)
{
for (auto e : r)
f(e);
}
In Thrust, we decorate many __host__ __device__ template functions and __host__ __device__ member functions of template classes with this pragma.
Basically, NVCC does this:
* Check if the instantiation leads to any __host__ only or __device__ evaluations.
* If both __host__ only and __device__ only evaluations are found, it's a hard compilation error.
* If there are __host__ only evaluations, the instantiated function is __host__ only. Warning emitted.
* If there are __device__ only evaluations, the instantiate function is __device__ only. Warning emitted.
For template classes, the same checks are be performed, but only when a member function of said instantiation is actually used. This follows how template class instantiation works in general; for example, you can instantiate a std::vector with a move-only type, and call emplace_back on it, and everything will compile fine. But if you use a std::vector member that requires copyable types, such as push_back, you'd get a compilation error.
What does Clang CUDA do? Does it emit a warning for instantiations of __host__ __device__ templates with __host__ or __device__ only constructs? Is this warning useful? (I have some examples indicating it is, but I want to hear what others think)