I was recently looking into the translation of LLVM-IR vector instructions to ARM NEON assembly. Specifically, I want to understand when this is legal to do and when we need to be careful.
I attached a very simple test case:
define <4 x float> @fooP(<4 x float> %A, <4 x float> %B) {
  %C = fmul <4 x float> %A, %B
  ret <4 x float> %C
}
If fooP is compiled with "llc -march=arm -mattr=+vfp3,+neon", LLVM happily uses ARM NEON instructions to implement the vector multiply. This is obviously the fastest code that we can generate, but on the other hand we lose precision compared to non-NEON code, because NEON flushes denormals to zero.
As LLVM now has support for IR-level fast-math flags, I am wondering whether it would make sense to create NEON floating point instructions only if the relevant fast-math flags are set on the IR level.
The reason behind my question is that at the moment the only way to get IEEE 754 floating point operations on ARM is to fully disable NEON. However, NEON can be safely used for integer computations as well as for LLVM-IR instructions with the appropriate fast-math flags. The attached test case contains a floating point operation that requires IEEE 754 compliance, a floating point operation that does not require it, and an integer computation. It is a perfect mixed use case, where we really do not want to globally disable NEON.
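To make the mixed use case concrete, here is a minimal sketch of what such a module could look like. It is not the attached file: the names @fooFast and @fooInt are made up for illustration, and using the "fast" flag as the marker for "NEON is acceptable" is an assumption, since which flags should actually be sufficient is part of the question.

```llvm
; No fast-math flags: IEEE 754 behaviour (including denormals) is expected,
; so under the proposal this fmul would not be lowered to NEON.
define <4 x float> @fooP(<4 x float> %A, <4 x float> %B) {
  %C = fmul <4 x float> %A, %B
  ret <4 x float> %C
}

; Hypothetical relaxed variant: the 'fast' flag on the fmul signals that
; reduced precision is acceptable, so NEON could be used.
define <4 x float> @fooFast(<4 x float> %A, <4 x float> %B) {
  %C = fmul fast <4 x float> %A, %B
  ret <4 x float> %C
}

; Hypothetical integer variant: no floating point semantics are involved,
; so NEON is always safe here.
define <4 x i32> @fooInt(<4 x i32> %A, <4 x i32> %B) {
  %C = mul <4 x i32> %A, %B
  ret <4 x i32> %C
}
```

With the behaviour proposed above, only @fooP would have to avoid NEON, while the other two functions could keep using it.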
I understand that some users do not require IEEE 754-compliant floating point behavior (clang on darwin?), which means they would probably not need this change. However, it should also not hurt them performance-wise, as such users would probably set the relevant global fast-math flags to reduce the precision requirements, so that NEON instructions would be chosen anyway.
I am very interested in opinions on the general topic as well as how to actually implement this in the ARM target.
All the best,
neon-floating-point-precision.ll (1.16 KB)