Hello all,

A patch has been posted on Phabricator to demonstrate the idea: D150120 [RFC] Vector math function loop idiom recognition.

This patch extends the LoopIdiomRecognize pass to recognize a loop that calls a scalar math function on each element and to transform that loop into a single vector math function call.

For example, it transforms

```
for (int i = 0; i < 100; i++)
  y[i] = exp(x[i]); // scalar math
```

to

```
vexp(y, x, 100); // vector math
```

A vector math function computes the same mathematical function for each element of an array of operands stored in contiguous memory. By design, its average computation time per element is lower than that of the equivalent scalar math function once the number of elements exceeds a threshold value, which results in better overall performance. Several math libraries provide vector math functions, including the IBM MASS library and the Intel Math Kernel Library. As an example, the vector math functions in the IBM MASS library are approximately 30 times faster per element (geometric mean) than their scalar equivalents on IBM Power10, measured by computing 1000 elements with valid but random input values.

The threshold value depends on the specific math function and library. The values may be generated by a heuristic and then provided to the pass in a table from TargetLibraryInfo.
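To make the idea of a per-function threshold table concrete, here is a minimal standalone sketch. It does not use TargetLibraryInfo or any code from the patch: the names `VectorMathThreshold`, `ThresholdTable`, `lookupThreshold` and `isProfitable` are hypothetical, and the threshold values are made up purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical table entry; not part of the patch or of TargetLibraryInfo.
struct VectorMathThreshold {
  std::string_view ScalarFnName; // scalar math function (e.g. exp)
  std::uint64_t MinElements;     // illustrative value, not a measured one
};

// Made-up thresholds; real values would depend on the function and library.
constexpr VectorMathThreshold ThresholdTable[] = {
    {"exp", 16},
    {"log", 16},
    {"sin", 24},
};

// Returns the threshold for a scalar function, or nullopt if it has no entry.
std::optional<std::uint64_t> lookupThreshold(std::string_view ScalarFnName) {
  assert(!ScalarFnName.empty() && "caller must pass a function name");
  for (const VectorMathThreshold &Entry : ThresholdTable)
    if (Entry.ScalarFnName == ScalarFnName)
      return Entry.MinElements;
  return std::nullopt;
}

// The transformation is considered profitable only when the element count
// is strictly greater than the threshold for that function.
bool isProfitable(std::string_view ScalarFnName, std::uint64_t TripCount) {
  std::optional<std::uint64_t> Threshold = lookupThreshold(ScalarFnName);
  return Threshold.has_value() && TripCount > *Threshold;
}
```

With this table, `isProfitable("exp", 100)` is true, while a function with no entry is never transformed.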

The motivation for this patch is to make this performance benefit available for various math libraries on different targets. Hence, the design transforms the idioms into a new set of LLVM intrinsics for vector math functions, which are designed to be general across math libraries and targets. A new back-end pass then lowers the intrinsics to the actual vector math functions for its target. To demonstrate, this patch adds a new pass to the PowerPC target that lowers the intrinsics to the MASS library vector math functions.

In a more complex loop, preparation passes may be required before this patch can recognize the idiom. For example, when there are data dependencies on the input or output of the scalar math function in the loop, loop distribution may be required to split the dependencies into separate loops.
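As an illustration of the kind of loop this refers to, the sketch below shows a hand-written before/after of loop distribution at the source level. The function names `fused` and `distributed` are hypothetical, and the sketch assumes the three arrays do not overlap. It is not output from any LLVM pass.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Before: the exp call shares a loop with a running sum, which carries a
// dependency from one iteration to the next through acc[i - 1].
void fused(double *y, double *acc, const double *x, std::size_t n) {
  assert(n == 0 || (y && acc && x));
  for (std::size_t i = 1; i < n; i++) {
    y[i] = std::exp(x[i]);
    acc[i] = acc[i - 1] + y[i]; // loop-carried dependency
  }
}

// After distribution: the first loop contains only the scalar math call and
// matches the idiom; the second loop keeps the loop-carried dependency.
void distributed(double *y, double *acc, const double *x, std::size_t n) {
  assert(n == 0 || (y && acc && x));
  for (std::size_t i = 1; i < n; i++)
    y[i] = std::exp(x[i]); // candidate for the vector math idiom
  for (std::size_t i = 1; i < n; i++)
    acc[i] = acc[i - 1] + y[i];
}
```

Both versions compute the same `y` and `acc`, but only the first loop of `distributed` has the simple shape shown in the `exp` example above.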

As a demonstration, this patch currently accepts the profitability threshold from a command-line option and only evaluates it for loops whose trip count is known at compile time. For loops whose trip count is unknown at compile time, one solution is to version the loop and insert a run-time check that compares the trip count against the threshold.
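The loop versioning idea can be sketched at the source level as follows. This is only an illustration of what the versioned code might look like, not what the patch generates. `vexp_standin` is a hypothetical stand-in for a library vector math function (implemented here as a plain loop so the example is self-contained), `kThreshold` is a made-up value, and `vexp_calls` exists only to show which version ran.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

int vexp_calls = 0; // demonstration only: counts calls to the vector version

// Hypothetical stand-in for a library vector math function such as vexp.
void vexp_standin(double *y, const double *x, std::size_t n) {
  assert(n == 0 || (y && x));
  ++vexp_calls;
  for (std::size_t i = 0; i < n; i++)
    y[i] = std::exp(x[i]);
}

constexpr std::size_t kThreshold = 16; // illustrative value only

// Versioned loop: the trip count n is unknown at compile time, so the
// profitability check is evaluated at run time.
void exp_versioned(double *y, const double *x, std::size_t n) {
  if (n > kThreshold) {
    vexp_standin(y, x, n); // vector math version
  } else {
    for (std::size_t i = 0; i < n; i++)
      y[i] = std::exp(x[i]); // original scalar loop
  }
}
```

A call with `n = 8` takes the scalar path, while a call with `n = 32` takes the vector path. The cost is one extra comparison and some code growth from keeping both versions.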

As for how to add the new vector math functions, another idea is to extend the existing VecFuncs.def to include them, instead of adding the new VectorMathFunc.def shown in this patch. In that case, the VecDesc structure in TLI could be extended and changed as follows.

```
struct VecDesc {
  StringRef ScalarFnName;           // scalar math function (e.g. exp)
  StringRef SIMDFnName;             // renamed field: SIMD math function (e.g. expd2)
  ElementCount VectorizationFactor; // vectorization factor for the SIMD math function
  StringRef VectorFnName;           // new field: vector math function (e.g. vexp)
  Intrinsic::ID IntrinID;           // new field: LLVM intrinsic for the vector math function
};
```

Here the name “SIMD” replaces the old name “vector” for math functions that take a vector data type as their parameter and return type, such as expd2. The name “vector” is then used for the new kind of math functions introduced in this patch, such as vexp. The names and definitions of query functions such as isFunctionVectorizable would also need to change accordingly. This approach may help distinguish the two kinds of math functions more clearly in LLVM.

Any comments would be appreciated.