From a language frontend that generates LLVM IR containing vector math intrinsics, how can I tell LLVM to lower those to specific symbols?
For example, the intrinsic llvm.sqrt.f32 gets lowered to _sqrtf, but the intrinsics llvm.sqrt.v4f32 and llvm.sqrt.v8f32 get lowered to 4 and 8 calls to _sqrtf, respectively.
Instead, I’d like to provide a symbol for llvm.sqrt.v4f32, e.g., _sqrtf4 (it could be a libmvec, SVML, Sleef, or some other symbol), such that LLVM lowers llvm.sqrt.v4f32 to _sqrtf4, and lowers llvm.sqrt.v8f32 to two _sqrtf4 calls if and only if my frontend does not provide a symbol for it.
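To make the rule I have in mind concrete, here is a minimal sketch in Rust of the splitting policy described above. It is only an illustration of the behavior I would like and not an existing LLVM or rustc API: the names plan_sqrt_lowering and Call, and the symbol _sqrtf8 mentioned below, are hypothetical.

```rust
/// One step of a lowering plan: call `symbol` on `lanes` lanes at a time.
/// (Hypothetical type, used only for this illustration.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Call {
    pub symbol: String,
    pub lanes: usize,
}

/// Plan how a `lanes`-wide sqrt would be lowered, given the vector symbols
/// the frontend provides as (lane count, symbol name) pairs.
/// Lanes that no provided vector symbol covers fall back to scalar `_sqrtf`.
pub fn plan_sqrt_lowering(lanes: usize, provided: &[(usize, &str)]) -> Vec<Call> {
    // Keep only true vector widths and try the widest symbol first.
    let mut widths: Vec<(usize, &str)> = provided
        .iter()
        .copied()
        .filter(|&(w, _)| w > 1)
        .collect();
    widths.sort_by(|a, b| b.0.cmp(&a.0));

    let mut plan = Vec::new();
    let mut remaining = lanes;
    for (width, symbol) in widths {
        // Emit as many calls of this width as still fit.
        while remaining >= width {
            plan.push(Call { symbol: symbol.to_string(), lanes: width });
            remaining -= width;
        }
    }
    // Scalar fallback: one `_sqrtf` call per leftover lane.
    for _ in 0..remaining {
        plan.push(Call { symbol: "_sqrtf".to_string(), lanes: 1 });
    }
    plan
}
```

With only (4, "_sqrtf4") provided, an 8-lane sqrt becomes two _sqrtf4 calls. If the frontend also provided a hypothetical (8, "_sqrtf8"), it would become a single _sqrtf8 call. With nothing provided, it degrades to the current behavior of one _sqrtf call per lane.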
For context, the Rust frontend exposes portable packed SIMD vectors that users can manipulate directly, e.g., calling sqrt on a v8f32. We used to lower these to llvm.sqrt.v8f32 directly, but performance was horrible because that lowered to 8 _sqrtf calls. We now work around this by no longer lowering them to llvm.sqrt.x, calling an unknown function instead. This performs quite well, but we lose many LLVM optimizations due to the unknown function call: e.g., while llvm.sqrt.v8f32(<8 x float 0.0>) could be constant-folded to <8 x float 0.0>, that is no longer the case.
I kind of assumed that there must be a way for frontends to hook in symbols to which the intrinsics are lowered, but it appears that all symbol lowering is hardcoded in LowerIntrinsic. For example, if instead of emitting _memcpy for llvm.memcpy I wanted that intrinsic to call a symbol _foo, is that possible?
What would be the simplest way to achieve this? Could I insert my own LowerIntrinsic pass that runs before the LLVM one, doing the lowering that I want?