The goal of this RFC is to propose a convenience function for defining elementwise operations on vector arguments, without having to add a dedicated elementwise builtin for each scalar builtin that should be vectorized.
Context:
Previously, there was some discussion about whether a proposed _elementwise convenience function should be added. It would simplify adding vectorized operations when no intrinsic exists for the operation. Part of that discussion concerned how IR should be generated: should there be a way to emit a single IR instruction that represents the vectorized operation, or is it acceptable to emit N separate calls to the scalar function? This post discusses that point:
[RFC] Introduce elementwise apply function - IR & Optimizations - LLVM Discussion Forums
Proposal
I propose that the _elementwise attribute be implemented so that it returns a function which emits N calls to the function it was passed, one per vector element. Specifically, _elementwise would receive a function name as its first argument and return a function that takes vector arguments and returns a vector result. For example, instead of writing this to implement an elementwise version of the pow builtin:
__attribute__((clang_builtin_alias(__builtin_elementwise_pow)))
float2 pow(float2, float2);
This could be written instead as:
__attribute__((clang_builtin_alias(_elementwise(__builtin_pow))))
float2 pow(float2, float2);
_elementwise(__builtin_pow) will return a function that accepts two float2 arguments, and that function will be used as the alias target. The two arguments are passed to this generated function, which emits two calls to the pow intrinsic in IR, one per vector element.
The function name passed to _elementwise doesn't need to be a builtin; it could be any user-defined function. The compiler will perform name lookup and use a function definition that can be resolved at that point. If lookup finds multiple candidate definitions, it is an open question what should happen, and I would like comments on this.
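To make the intended behavior concrete, here is a rough hand-written sketch of what the function returned by _elementwise might be equivalent to when given a user-defined scalar function. Everything in it is an assumption for illustration: sum_of_squares, sum_of_squares_float2, and the scalar_calls counter are hypothetical names, float2 is declared with the GCC/Clang vector_size extension rather than whatever definition a real header would use, and the actual lowering is not settled by this RFC.

```c
/* float2 as a 2-lane float vector; vector_size is a GCC/Clang extension.
   This typedef is an assumption made only for this sketch. */
typedef float float2 __attribute__((vector_size(2 * sizeof(float))));

/* Counts scalar invocations, only to make the "N calls" behavior visible. */
static int scalar_calls = 0;

/* Hypothetical user-defined scalar function that would be passed
   to _elementwise. */
static float sum_of_squares(float a, float b) {
    ++scalar_calls;
    return a * a + b * b;
}

/* Hand-written stand-in for what _elementwise(sum_of_squares) might
   expand to for float2 arguments: one scalar call per lane, with the
   results packed back into a vector. */
static float2 sum_of_squares_float2(float2 x, float2 y) {
    float2 r;
    r[0] = sum_of_squares(x[0], y[0]);
    r[1] = sum_of_squares(x[1], y[1]);
    return r;
}
```

Calling sum_of_squares_float2 on the vectors {1, 3} and {2, 4} yields {5, 25} and invokes the scalar function twice, which is the "N separate calls" shape proposed here rather than a single vector instruction.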
The primary purpose is to give users a way to run vectorized operations for which no intrinsic exists. This is a first step toward allowing a user to apply the tan function, for example, to a vector of arguments.