[RFC] _elementwise attribute in clang front end

The overall objective of this RFC is to propose a convenience function for defining elementwise operations on vector arguments, without needing to create a dedicated elementwise builtin for each scalar builtin that should be vectorized.

Context:
There has previously been some discussion on whether a proposed _elementwise convenience function should be added. It would simplify adding vectorized operations when no intrinsic exists for the operation. There was also discussion on changing the way IR is generated: should there be some way to emit a single line of IR that represents a vectorized instruction, or is emitting N separate calls to a specific function acceptable? This post discusses that point:
[RFC] Introduce elementwise apply function - IR & Optimizations - LLVM Discussion Forums

Proposal
I propose that the _elementwise attribute be implemented, and that it return a function that emits N calls to whichever function it was passed. Specifically, the _elementwise attribute would receive a function name as its first argument and return a function that takes a set of vector arguments and returns a vector result. So, as an example, instead of needing to write this to implement an elementwise version of the pow builtin:

__attribute__((clang_builtin_alias(__builtin_elementwise_pow)))
float2 pow(float2, float2);

This could be written instead as:

__attribute__((clang_builtin_alias(_elementwise(__builtin_pow))))
float2 pow(float2, float2);

_elementwise(__builtin_pow) will return a function that accepts 2 float2s, and that function will be used as the alias. The two arguments are then passed to this generated function, which emits 2 calls to the pow intrinsic in IR, one per vector element.
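
To make the intended expansion concrete, here is a hand-written sketch of what the generated alias is described as doing for float2. It is only an illustration: float2 is modeled with the GCC/Clang vector_size extension rather than the HLSL type, and pow_elementwise is a hypothetical name, not something the proposal defines.

```cpp
#include <cmath>

// Stand-in for a 2-lane float vector (assumption: GCC/Clang vector_size
// extension, not the actual HLSL float2 type).
typedef float float2 __attribute__((vector_size(2 * sizeof(float))));

// Hypothetical hand-written equivalent of the proposed alias:
// one scalar pow call per lane, results packed back into a vector.
float2 pow_elementwise(float2 base, float2 exponent) {
  float2 result;
  result[0] = std::pow(base[0], exponent[0]);
  result[1] = std::pow(base[1], exponent[1]);
  return result;
}
```

With the proposed attribute, the body above would not have to be written by hand for each builtin.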

The function name passed to _elementwise doesn’t need to be a builtin; it could be any user-defined function. The compiler will do a name lookup and use a function definition that can be resolved at that point. If multiple function definitions could be resolved, I would like comments on what should be done.

The primary purpose is to give the user a way to run vectorized operations for which no elementwise intrinsic exists. This is the first step toward allowing a user to run the tan function, for example, on a vector of arguments.

CC @erichkeane @arsenm @jcranmer for more opinions

Thank you, I think this is a really good goal!

I’m assuming we’ll have diagnostics for cases where the declaration makes no sense (e.g., there are different vector element sizes like float2, float4, or the function signature and the builtin signature don’t relate to one another, etc)?

Btw, I’m not keen on the name _elementwise because of the leading underscore. Would elementwise perhaps be better?

I think we could either diagnose as an ambiguous lookup or we could try to do overload resolution by looking at the arguments to the attributed function and trying to pick the best candidate from the overload set. My intuition is that we probably should diagnose as ambiguous until we have concrete use cases.

@AaronBallman IIRC there was a person who showed up during your office hours who was working on elementwise stuff for HLSL.

I’m sorry, that might be the author of the RFC himself.

Yeah, that was definitely me!

Yes, to put it in other words, given:

__attribute__((clang_builtin_alias(_elementwise(X))))
<ret val> Y(<args>);

there should be diagnostics if:

  • X is not resolvable to any function definition.
  • The number of arguments to Y does not match any function definition found for X (i.e., function definition resolution is impossible).
  • Multiple function definitions found for X could take the arguments of Y (i.e., more than one valid alias-able resolution).
  • Y has arguments of more than one type.
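
As a rough model of the four checks above, the following sketch classifies a pair of (candidates found for X, signature of Y). Everything here is hypothetical: Signature, Diag, and checkElementwiseAlias are made-up names, types are compared as plain strings, and a candidate counts as viable purely by parameter count. It is not how clang's Sema would implement this.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified signature: parameter type names only.
struct Signature {
  std::vector<std::string> params;
};

enum class Diag { Ok, NotFound, ArityMismatch, Ambiguous, MixedArgTypes };

// candidates: functions found by looking up X.
// y: the attributed declaration Y.
Diag checkElementwiseAlias(const std::vector<Signature>& candidates,
                           const Signature& y) {
  // X does not resolve to anything.
  if (candidates.empty()) return Diag::NotFound;

  // Y must use a single argument type.
  for (std::size_t i = 1; i < y.params.size(); ++i)
    if (y.params[i] != y.params[0]) return Diag::MixedArgTypes;

  // Simplification: a candidate is viable if its parameter count matches Y.
  int viable = 0;
  for (const Signature& c : candidates)
    if (c.params.size() == y.params.size()) ++viable;

  if (viable == 0) return Diag::ArityMismatch;
  if (viable > 1) return Diag::Ambiguous;
  return Diag::Ok;
}
```

A real implementation would also have to relate Y's vector element type to the candidate's scalar parameter types, which this sketch leaves out.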

I wouldn’t be opposed to calling it elementwise instead of _elementwise, though given that it’s a clang feature, like the __builtin_* functions, I would’ve expected leading underscores. Wouldn’t “__elementwise” be appropriate?

Declaration, also in the other points.

Restrictive is good for now, later we might need more nuanced rules, e.g., allow scalar arguments in Y and best match, etc. We can get to that when we need it.

+1 for elementwise w/o _.

If Y is being disallowed when it has more than one type of argument, I think that cuts out some of the functions we might want an elementwise implementation of (like remquo or scalbn) but I think that’s reasonable enough; those can always do things the hard way.
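
For reference, "the hard way" for a mixed-type function such as scalbn might look like the sketch below. The vector types are modeled with the GCC/Clang vector_size extension, and scalbn_elementwise is a hypothetical name used only for illustration.

```cpp
#include <cmath>

// Stand-ins for 2-lane vectors (assumption: GCC/Clang vector_size extension).
typedef float float2 __attribute__((vector_size(2 * sizeof(float))));
typedef int int2 __attribute__((vector_size(2 * sizeof(int))));

// Hypothetical hand-written elementwise scalbn. It takes a float vector and
// an int vector, so it would be rejected by the "one argument type" rule.
float2 scalbn_elementwise(float2 x, int2 n) {
  float2 result;
  result[0] = std::scalbn(x[0], n[0]);
  result[1] = std::scalbn(x[1], n[1]);
  return result;
}
```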

__elementwise would be appropriate if we think users may have a macro named elementwise somewhere that would conflict with the use in the attribute. Because this is an implementation detail that will show up in a lot of header files, I’d be fine with __elementwise so that it’s using a reserved identifier and we don’t have to work around conflicts. Another approach would be to introduce a second attribute clang_elementwise_builtin_alias or some such (we accept attribute names with leading and trailing double underscores specifically so that they’re easier to use in system headers). I don’t currently have a strong opinion on new attribute vs uglier argument name.

I didn’t consider clang_elementwise_builtin_alias; I think that’s a much cleaner approach, easier to read. I’d prefer that over creating another middle man.

Yeah, I usually prefer extending existing attributes to adding new ones, but this feels like a case where we may want a new attribute instead. It’s just different enough from the usual builtin alias attribute that it seems warranted to have a second attribute. CC @erichkeane to see if he has other opinions as attributes code owner.

I am confused. Long term it should be a better solution to add a tan intrinsic and a MathExpandPass for targets that do not support some math functions.

This solution isn’t excluding making a tan intrinsic.
It’s just saying that once a tan intrinsic exists, we can then do this:

__attribute__((clang_elementwise_builtin_alias(__builtin_tan)))
float2 tan(float2);

Instead of having to create an elementwise tan builtin, based on the tan intrinsic.

But when you have a tan intrinsic, you can just add a Clang builtin for tan. There is no need for attributes.

Adding new intrinsics is not a very generic solution. Intrinsics are really not a good choice for library functions. The above works for any function, incl. user functions. If we need IR representation, we can also revisit that, though we already have metadata for the vectorizer to provide vector versions of functions, IIRC.
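
The "works for any function" point can be illustrated at the library level with a small template that applies an arbitrary scalar callable to each lane. This is only an analogy for the proposed attribute, not its implementation: apply_elementwise and halve are hypothetical names, and float2 is again modeled with the GCC/Clang vector_size extension.

```cpp
#include <cmath>
#include <cstddef>

// Stand-in for a 2-lane float vector (assumption: GCC/Clang vector_size
// extension).
typedef float float2 __attribute__((vector_size(2 * sizeof(float))));

// Hypothetical helper: call the scalar function f once per lane.
template <typename F>
float2 apply_elementwise(F f, float2 a) {
  float2 result;
  for (std::size_t i = 0; i < 2; ++i)
    result[i] = f(a[i]);
  return result;
}

// A user-defined scalar function, with no intrinsic behind it.
float halve(float x) { return 0.5f * x; }

// The same helper covers a library function such as tan.
float2 tan_elementwise(float2 a) {
  return apply_elementwise([](float x) { return std::tan(x); }, a);
}
```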

But the builtin wouldn’t be able to take a vector of arguments, it would only take one argument right? The purpose of the attributes is to allow elementwise application of the builtin.