Hi all,
Motivation
For non-offloading, single-phase compilation builds, the behavior of __has_builtin is straightforward: if the builtin can be used on the current target, language, and environment, it returns true.
However, this gets complicated when two-phase compilation comes into play. Two-phase compilation is hidden from the user as an implementation detail, and as a consequence, __has_builtin returns true if the builtin can be used on either the host target or the offloading target.
If we have something like:
void foo() {
#pragma omp target
  {
#if __has_builtin(__builtin_ia32_pause)
    __builtin_ia32_pause();
#endif
  }
}
When compiled with OpenMP offloading, with a GPU as the offloading device and x86 as the host, __has_builtin will always return true and lead to the error below:
error: cannot compile this builtin function yet
This behavior reduces the usefulness of __has_builtin.
Note that GCC also implements __has_builtin, and its behavior is the same as Clang's today. It is unclear whether the GCC behavior is intentional; this is being discussed here.
There are also some relevant PRs: here, here, and here.
Finally, some clients rely on the current behavior of __has_builtin, such as ARM on CUDA and __cpuidex.
Proposal
I propose we deprecate __has_builtin and introduce a new function-like macro, tentatively named __can_use_builtin, that only returns true if the builtin can actually be used on the current target, language, and environment being used for compilation. This means that for offload targets, the macro may return different values for the host and offload target compiles.
In the above example, the host compile will return true, whereas the target compile will return false.
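To make the intended usage concrete, here is a rough sketch of the earlier example rewritten against the proposed macro. Since __can_use_builtin does not exist in any released compiler, the shim at the top is my own assumption so that the snippet builds today: it falls back to __has_builtin, or to 0 where even that is unavailable. The function name spin_once and the used flag are hypothetical and only there to make the result observable.

```c
/* __can_use_builtin is the macro proposed in this RFC; no released compiler
   provides it. As an assumption for this sketch, fall back to __has_builtin
   (or to 0 where that is missing too) so the code still builds today. */
#ifndef __can_use_builtin
#  ifdef __has_builtin
#    define __can_use_builtin(x) __has_builtin(x)
#  else
#    define __can_use_builtin(x) 0
#  endif
#endif

/* Hypothetical helper: returns 1 if the pause builtin was compiled into the
   target region, 0 if the region was compiled without it. */
int spin_once(void) {
  int used = 0;
#pragma omp target map(tofrom: used)
  {
#if __can_use_builtin(__builtin_ia32_pause)
    /* Under the proposal, only a compile whose own target supports the
       builtin would keep this call. */
    __builtin_ia32_pause();
    used = 1;
#endif
  }
  return used;
}
```

With the fallback shim, this behaves exactly like __has_builtin does today, so an x86 host with GPU offloading would still hit the error above. Only a compiler implementing the proposal would evaluate the condition separately for the host and offload target compiles.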
Next Steps
Assuming community consensus, I will first implement __can_use_builtin as per the above design and document it with descriptive and relevant examples (a prototype implementation is available here).
Then, I will deprecate __has_builtin and potentially remove it in a future Clang release.
Thanks,
Nick