I'm starting to think we should directly implement horizontal operations on vector types.
Inputs from our experience:
Anything that can be used in a reduction operator should have such support, as should the vector of booleans that results from comparing vector values. This includes MIN/MAX, which is currently represented as a compare/select pair in the IR.
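To make the two cases concrete, here is a minimal scalar sketch in C++. The names horizontal_min, lanes_greater, horizontal_any and horizontal_all are hypothetical and are not existing IR operations or LLVM APIs; std::array stands in for a vector type. The MIN reduction is written as the compare/select pair mentioned above, and the any/all reductions operate on the boolean vector produced by a lane-wise compare.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helpers for illustration only; std::array models a vector.

// Horizontal MIN expressed as a chain of compare/select pairs.
template <std::size_t N>
int horizontal_min(const std::array<int, N>& v) {
    int acc = v[0];
    for (std::size_t i = 1; i < N; ++i) {
        bool lt = v[i] < acc;   // compare
        acc = lt ? v[i] : acc;  // select
    }
    return acc;
}

// Lane-wise compare producing a vector of booleans.
template <std::size_t N>
std::array<bool, N> lanes_greater(const std::array<int, N>& a,
                                  const std::array<int, N>& b) {
    std::array<bool, N> mask{};
    for (std::size_t i = 0; i < N; ++i) mask[i] = a[i] > b[i];
    return mask;
}

// Horizontal OR over a boolean vector: true if any lane is set.
template <std::size_t N>
bool horizontal_any(const std::array<bool, N>& mask) {
    bool acc = false;
    for (bool b : mask) acc = acc || b;
    return acc;
}

// Horizontal AND over a boolean vector: true if every lane is set.
template <std::size_t N>
bool horizontal_all(const std::array<bool, N>& mask) {
    bool acc = true;
    for (bool b : mask) acc = acc && b;
    return acc;
}
```

A first-class horizontal operation would let the compiler see each of these loops as a single unit instead of having to pattern-match it back out of the IR.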
In general, the best code sequence for a horizontal operation depends on the target microarchitecture (i.e., the optimal code sequence may differ even within the same ISA). As such, there is merit in keeping horizontal operations intact until the compiler is ready to perform microarchitectural optimization. For targets that do not have such characteristics, generic lowering may be sufficient.
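As a rough illustration of why the choice is worth deferring, the sketch below shows two possible lowerings of the same 8-lane integer horizontal add in portable C++. The function names are hypothetical, and this is not a claim about what any particular target prefers: one form is a linear chain of dependent adds, the other a log-step form that folds the upper half onto the lower half (roughly what a shuffle-and-add sequence does). Which one is faster depends on the latency and throughput of the underlying instructions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Lowering A (hypothetical name): linear chain of dependent adds,
// one add per lane.
std::int32_t hadd_linear(const std::array<std::int32_t, 8>& v) {
    std::int32_t acc = 0;
    for (std::int32_t x : v) acc += x;
    return acc;
}

// Lowering B (hypothetical name): log2(N) steps. Each step adds the
// upper half of the active lanes onto the lower half, halving the
// active width: 4, then 2, then 1.
std::int32_t hadd_tree(std::array<std::int32_t, 8> v) {
    for (std::size_t width = 8 / 2; width >= 1; width /= 2) {
        for (std::size_t i = 0; i < width; ++i) {
            v[i] += v[i + width];
        }
    }
    return v[0];
}
```

For integer addition both forms compute the same value, so the selection can be made purely on cost once the horizontal operation reaches the target-specific part of the compiler.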
It is also good to have an FP-value-safe variant of each horizontal operation, so that it can be
used without the fast-math flag. It would certainly be slower, but enough people consider
bitwise-identical FP results more important than a small speed penalty.
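The difference between the two variants can be sketched in C++ as follows. The function names are hypothetical, and the example assumes IEEE-754 double arithmetic evaluated at double precision with no fast-math style reassociation. The ordered form adds lanes strictly left to right, as the scalar loop would; the reassociated form pairs lanes in a tree, which is only legal when reassociation is allowed.

```cpp
#include <array>
#include <cstddef>

// FP-value-safe variant (hypothetical name): strict left-to-right order,
// matching the result of the original scalar loop.
double hadd_ordered(const std::array<double, 4>& v) {
    double acc = v[0];
    for (std::size_t i = 1; i < v.size(); ++i) acc += v[i];
    return acc;
}

// Reassociated variant (hypothetical name): pairwise tree,
// (v0 + v2) + (v1 + v3). Faster in principle, but may round differently.
double hadd_reassoc(const std::array<double, 4>& v) {
    double lo = v[0] + v[2];
    double hi = v[1] + v[3];
    return lo + hi;
}
```

With the input {1e16, 1.0, -1e16, 1.0}, the ordered form loses the first 1.0 when it is added to 1e16, while the tree form cancels the two large values first, so the two functions return different values for the same lanes. That is the discrepancy the value-safe variant is meant to rule out.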
Intel Compiler and Languages