Proposal
Math library functions with output pointers should be represented in LLVM as intrinsics that return structures. These intrinsics could then be emitted by clang when -fno-math-errno is set.
When emitting these intrinsics, clang would insert explicit stores of the results to the output pointers.
The initial candidates for this would be:
void sincos(T val, T* sin_out, T* cos_out)
- Becomes: { T, T } @llvm.sincos.*(T %val)
void sincospi(T val, T* sin_out, T* cos_out)
- Becomes: { T, T } @llvm.sincospi.*(T %val)
T modf(T val, T* int_part_out)
- Becomes: { T, T } @llvm.modf.*(T %val)
Note: The implementation of each of these intrinsics would likely be similar to the recently added llvm.frexp.* intrinsic (patch) and its associated clang builtin.
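As a rough illustration of the lowering described above, here is a C sketch of what clang might conceptually emit for modf with T = double. The names modf_result, modf_struct and modf_lowered are hypothetical stand-ins for the proposed intrinsic and the code around it, and the order of the two struct fields is an assumption; the proposal does not fix it.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for the proposed { T, T } @llvm.modf.*(T %val)
   with T = double. The field order is an assumption, not part of the
   proposal. */
typedef struct {
    double frac_part; /* the value modf normally returns */
    double int_part;  /* the value modf normally writes via its pointer */
} modf_result;

/* Models the struct-returning intrinsic: no pointer operand is visible
   to the caller. */
modf_result modf_struct(double val) {
    modf_result r;
    r.frac_part = modf(val, &r.int_part);
    return r;
}

/* Models what clang could emit for `modf(val, int_part_out)` under
   -fno-math-errno: one struct-returning call plus an explicit store. */
double modf_lowered(double val, double *int_part_out) {
    assert(int_part_out != 0);
    modf_result r = modf_struct(val);
    *int_part_out = r.int_part; /* explicit store, visible to analyses */
    return r.frac_part;
}
```

The point of the split is that the only memory write left is an ordinary store, which existing alias and loop analyses already understand.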
Motivation
Vectorization
Currently, sincos and sincospi can be vectorized (with -fveclib=ArmPL or -fveclib=sleefgnuabi), but this can introduce aliasing bugs: LoopAccessAnalysis does not track the pointer operands of the calls and assumes that vectorizing library calls is safe.
Rather than update LoopAccessAnalysis to track pointer operands, modeling the out pointers with explicit stores in the IR would allow this analysis to work unchanged, solving the aliasing issues.
This would not come for free: the vectorizer would need to be updated to handle widening calls with struct results, but that capability is likely to be useful for more complex types in future.
Note: This would also mean disallowing the vectorization of library calls with in/out pointers, and would likely require libraries to provide _stret variants.
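To make the aliasing hazard concrete, here is a small C sketch that emulates it without any real vector instructions. It uses modf as a stand-in for any library call with an output pointer (the same shape applies to sincos). The function names and the fixed width N are illustrative assumptions, and widened_loop only imitates what a naive widening of the call would do: load all inputs first, then write all results.

```c
#include <assert.h>
#include <math.h>

enum { N = 4 }; /* illustrative "vector width" */

/* Scalar semantics: the store made by call i is visible to the load
   for call i + 1. */
void scalar_loop(double *in, double *int_out, double *frac_out, int n) {
    for (int i = 0; i < n; i++)
        frac_out[i] = modf(in[i], &int_out[i]);
}

/* Emulates a naive widened call: every lane is loaded before any
   result is written back through the output pointer. */
void widened_loop(double *in, double *int_out, double *frac_out, int n) {
    double lanes[N];
    assert(n <= N);
    for (int i = 0; i < n; i++)
        lanes[i] = in[i]; /* "vector load" */
    for (int i = 0; i < n; i++)
        frac_out[i] = modf(lanes[i], &int_out[i]);
}

/* Returns 1 if the two loops disagree when int_out overlaps in. */
int results_differ_under_aliasing(void) {
    double a[N + 1] = {1.5, 2.5, 3.5, 4.5, 0.0};
    double b[N + 1] = {1.5, 2.5, 3.5, 4.5, 0.0};
    double fa[N], fb[N];
    scalar_loop(a, a + 1, fa, N);  /* int_out aliases in + 1 */
    widened_loop(b, b + 1, fb, N); /* same aliasing, widened order */
    return fa[1] != fb[1];
}
```

With the overlapping buffers above, the scalar loop produces fractional parts {0.5, 0, 0, 0} while the emulated widened loop produces {0.5, 0.5, 0.5, 0.5}. If the write through int_out were an explicit store in the IR rather than hidden inside the call, an analysis such as LoopAccessAnalysis would see the dependence between that store and the next load and could reject the loop or guard it with a runtime check.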
New canonicalizations
When safe, calls to the llvm.sin and llvm.cos intrinsics on the same value could be combined into a single call to llvm.sincos. On targets that support sincos, this could provide some performance uplift.
This could also be done for sincospi (though currently there are no sinpi or cospi intrinsics).
Struct returns
Targets that implement struct-returning variants of these functions could lower to those directly, allowing the memory for the results to be elided.
Existing implementations (canonicalizations)
Both the merging of sin + cos into sincos and the lowering to _stret variants (of sincos[pi]) exist today, but both are done within SelectionDAG.
The main difference here is that these transforms could be brought up to the IR level.
Questions
Are struct returns the best path for avoiding the aliasing issues when vectorizing library functions with output pointer parameters?
Should llvm.sincos be the canonical form of llvm.sin + llvm.cos of the same value?