How to lower LLVM vector math intrinsics to symbol calls?

From a language frontend that generates LLVM IR containing vector math intrinsics, how can I tell LLVM to lower those to specific symbols?

For example, the intrinsic llvm.sqrt.f32 gets lowered to _sqrtf, but the intrinsics llvm.sqrt.v4f32 and llvm.sqrt.v8f32 get lowered to 4 and 8 calls to _sqrtf, respectively.

Instead, I’d like to provide a symbol for llvm.sqrt.v4f32, e.g., _sqrtf4 (but it could be a libmvec, SVML, Sleef, or some other symbol), such that LLVM lowers llvm.sqrt.v4f32 to a single _sqrtf4 call. llvm.sqrt.v8f32 should then be lowered to two _sqrtf4 calls, but only if my frontend does not provide a dedicated symbol for the 8-lane version.
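To make the fallback rule I have in mind concrete, here is a small sketch in plain Rust. It is only a model of the selection logic, not LLVM code: the function lower_sqrt and the symbol table are hypothetical names made up for illustration, and _sqrtf8 is an assumed symbol that a frontend might or might not provide.

```rust
use std::collections::HashMap;

/// Hypothetical model of the desired lowering for llvm.sqrt.vNf32.
/// Returns (number of calls, symbol to call) for a vector with `lanes` lanes.
/// `vector_symbols` maps a lane count to a frontend-provided symbol.
fn lower_sqrt(
    lanes: usize,
    vector_symbols: &HashMap<usize, &'static str>,
) -> (usize, &'static str) {
    // Collect the provided vector widths that evenly divide the request.
    let mut widths: Vec<usize> = vector_symbols
        .keys()
        .copied()
        .filter(|w| *w > 1 && *w <= lanes && lanes % *w == 0)
        .collect();
    // Prefer the widest usable symbol so that the fewest calls are emitted.
    widths.sort_unstable_by(|a, b| b.cmp(a));
    match widths.first() {
        Some(w) => (lanes / w, vector_symbols[w]),
        // No usable vector symbol: scalarize, one _sqrtf call per lane.
        None => (lanes, "_sqrtf"),
    }
}

fn main() {
    // The frontend only provides a 4-lane symbol.
    let mut symbols = HashMap::new();
    symbols.insert(4, "_sqrtf4");
    for lanes in [1, 4, 8] {
        let (count, symbol) = lower_sqrt(lanes, &symbols);
        println!("{} lanes: {} x {}", lanes, count, symbol);
    }

    // If an 8-lane symbol is also provided (assumed name), it wins.
    symbols.insert(8, "_sqrtf8");
    let (count, symbol) = lower_sqrt(8, &symbols);
    println!("8 lanes with _sqrtf8 available: {} x {}", count, symbol);
}
```

With only _sqrtf4 registered, the 8-lane case splits into two _sqrtf4 calls; once an 8-lane symbol is registered, it is used directly.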

For context, the Rust frontend exposes portable packed SIMD vectors that users can manipulate directly, e.g., calling sqrt on a v8f32. We used to lower these to llvm.sqrt.v8f32 directly, but performance was horrible because that lowered to 8 _sqrtf calls. We are now working around this by not lowering them to llvm.sqrt.x anymore, and calling a function that is unknown to LLVM instead. This performs quite well, but we lose many LLVM optimizations because LLVM cannot reason about the unknown call. For example, llvm.sqrt.v8f32(<8 x float 0.0>) could be constant-folded to <8 x float 0.0>, but the equivalent call to the unknown function cannot.

I assumed that there must be a way for frontends to hook in the symbols that intrinsics get lowered to, but it appears that all symbol lowering is hardcoded in LowerIntrinsic. For example, if I wanted llvm.memcpy to call a symbol _foo instead of _memcpy, is that possible?

What would be the simplest way to achieve this? Could I insert my own LowerIntrinsic pass that runs before the LLVM one, doing the lowering that I want?