A large amount of effort has been spent supporting every VP intrinsic in the RISC-V backend. However, nothing in upstream LLVM uses a large portion of them today, and it seems unlikely that they will be used in the future.
This RFC proposes to remove support for these unused VP intrinsics in the RISC-V backend, both to simplify the backend and also to ensure that development time isn’t misspent working on them.
This RFC does not propose removing the intrinsics themselves, but it does open up the possibility of doing so, which is deferred to another RFC.
Background
VP intrinsic support was added early on during the development of RVV support, and my understanding is that it was largely used to control vl from the loop vectorizer via the EVL argument.
There are two main reasons why we want to control vl:
- To prevent trapping on unused lanes for loads/stores/divides and correctly perform permutations like reverses on dynamic vector lengths, reductions etc.
- To improve performance on some microarchitectures by not executing on lanes that aren’t needed, and to avoid vsetvli toggles in general.
So the loop vectorizer used to convert every vectorized instruction to a VP intrinsic on RISC-V[1] for both the correctness and performance reasons above.
However last year the RISCVVLOptimizer pass was upstreamed, which takes care of the performance aspect by optimizing vl to only what’s demanded at the MIR level. Because it operates after instruction selection, it means that regular non-VP instructions also end up having optimized VLs.
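The idea of shrinking vl to what is demanded can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how RISCVVLOptimizer is implemented: the `Inst` class, the `reduce_vls` function, the `VLMAX` value and the instruction names are all hypothetical, and the real pass has to check many more conditions before it may reduce a VL.

```python
from dataclasses import dataclass, field

VLMAX = 16  # hypothetical maximum vector length for this toy model


@dataclass
class Inst:
    name: str                 # assumed unique within a program
    vl: int                   # number of lanes the instruction executes on
    operands: list = field(default_factory=list)


def reduce_vls(insts):
    """Shrink each producer's VL to the largest VL that any of its users reads.

    Instructions are visited in reverse program order so that a user's VL has
    already been reduced by the time its operands are visited.
    """
    users = {}
    for inst in insts:
        for op in inst.operands:
            users.setdefault(op.name, []).append(inst)

    for inst in reversed(insts):
        inst_users = users.get(inst.name, [])
        if not inst_users:
            continue  # e.g. a store: its VL is what the program asked for
        demanded = max(user.vl for user in inst_users)
        inst.vl = min(inst.vl, demanded)
    return insts


# A load and a store that were given an explicit length of 5, with two
# regular (non-VP) arithmetic instructions in between that start at VLMAX.
load = Inst("load", 5)
add = Inst("add", VLMAX, [load, load])
mul = Inst("mul", VLMAX, [add, load])
store = Inst("store", 5, [mul])

program = reduce_vls([load, add, mul, store])
print([(inst.name, inst.vl) for inst in program])
```

In this model the add and mul end up with a VL of 5 even though nothing in the input asked for it explicitly, which is the effect described above for regular non-VP instructions.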
This meant that we could just emit regular instructions instead of VP intrinsics in a lot of cases, which avoided the problem that most of InstCombine/InstSimplify/DAGCombine hasn’t yet been lifted to work on VP intrinsics, and improved codegen significantly.
Proposal
The subset of intrinsics that we were able to swap out for instructions are the ones that are only needed for performance, e.g. llvm.vp.add. These only replace disabled lanes with poison, so it’s correct to replace them with their unpredicated equivalent, e.g. a regular add instruction. These intrinsics are what this RFC refers to as “trivial” VP intrinsics, since they fall into the same category as those in llvm::isTriviallyVectorizable.
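The correctness argument can be checked on a small lane-by-lane model. The following is a minimal sketch, not LLVM code: `POISON`, `vp_add`, `plain_add` and `refines` are hypothetical names, vectors are plain Python lists, and integer wrap-around is ignored.

```python
POISON = object()  # stand-in for an LLVM poison value in this toy model


def vp_add(a, b, mask, evl):
    """Toy llvm.vp.add: lanes that are masked off or at/after evl are poison."""
    return [
        (x + y) if (i < evl and m) else POISON
        for i, (x, y, m) in enumerate(zip(a, b, mask))
    ]


def plain_add(a, b):
    """Toy regular vector add: every lane is computed."""
    return [x + y for x, y in zip(a, b)]


def refines(new, old):
    """True if `new` agrees with `old` on every lane where `old` is not poison."""
    return all(o is POISON or n == o for n, o in zip(new, old))


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
mask = [True, False, True, True]
evl = 3

vp_result = vp_add(a, b, mask, evl)   # lanes 1 and 3 are poison
add_result = plain_add(a, b)          # all four lanes are defined

# Any concrete value is an acceptable replacement for poison, so the
# unpredicated add is a valid replacement for the predicated one.
assert refines(add_result, vp_result)
```

The two results only differ in lanes where the predicated form produced poison, which is why the swap is safe for this class of intrinsics.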
Now that the loop vectorizer no longer emits these, to the best of my knowledge there are no remaining users of these intrinsics within upstream LLVM. So it’s possible to remove codegen support for these in the RISC-V backend without affecting e.g. Clang or Flang.
This means it’s possible to remove a significant amount of VP related code in RISCVISelLowering.cpp, and reduce the number of different ways we have of expressing vector semantics from 3 to 2 (the others being regular LLVM IR and RVV intrinsics).
It also allows us to massively reduce the number of tests in llvm/test/CodeGen/RISCV/rvv. For every scalable and fixed vector test case, we have a corresponding -vp.ll test, 168 in total. A quick wc -l shows that these tests alone are 215k lines, and most of these are for trivial intrinsics.
Aside from the code cleanups, the other major benefit is that it would clarify the general direction of the RISC-V backend, which is to avoid trivial VP intrinsics, and prevent any redundant work on trying to improve support for them. There have been a few PRs in this area in the last year, and the effort is probably better directed elsewhere[2].
Intrinsics considered trivial
This would be every VP intrinsic bar the following:
- llvm.vp.{load,store,gather,scatter,strided.load,strided.store,load.ff}
- llvm.vp.merge: this has special semantics where the lanes past EVL aren’t poison
- llvm.vp.{udiv,sdiv,urem,srem}: these mask off UB in disabled lanes
- llvm.vp.reduce.* and llvm.vp.cttz.elts: disabled lanes affect the computed result
- llvm.vp.{splice,splat,reverse}: these are permutations which aren’t easily expressible with shuffle vectors (splat is probably removable, but it’s not trivial)
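Two of the cases in this list can be illustrated with a toy lane model. This is a sketch only: `POISON`, `vp_udiv`, `plain_udiv` and `vp_reduce_add` are hypothetical names, vectors are plain Python lists, and LLVM's undefined behaviour on division by zero is modelled with Python's `ZeroDivisionError`.

```python
POISON = object()  # stand-in for an LLVM poison value in this toy model


def vp_udiv(a, b, mask, evl):
    """Toy llvm.vp.udiv: only enabled lanes are divided; the rest are poison."""
    return [
        (x // y) if (i < evl and m) else POISON
        for i, (x, y, m) in enumerate(zip(a, b, mask))
    ]


def plain_udiv(a, b):
    """Toy regular udiv: every lane is divided, including lanes with a zero divisor."""
    return [x // y for x, y in zip(a, b)]


def vp_reduce_add(start, v, mask, evl):
    """Toy llvm.vp.reduce.add: only enabled lanes contribute to the sum."""
    return start + sum(x for i, (x, m) in enumerate(zip(v, mask)) if i < evl and m)


a = [8, 9, 10, 11]
b = [2, 3, 0, 0]   # zero divisors only in lanes at or past evl
mask = [True] * 4
evl = 2

safe = vp_udiv(a, b, mask, evl)   # lanes 0 and 1 computed, lanes 2 and 3 poison
try:
    plain_udiv(a, b)
    hit_ub = False
except ZeroDivisionError:
    hit_ub = True                 # the unpredicated form touches the bad lanes

predicated_sum = vp_reduce_add(0, a, mask, evl)  # 8 + 9
full_sum = sum(a)                                # 8 + 9 + 10 + 11
```

Unlike the add case, dropping the predicate here changes observable behaviour: the plain division executes lanes that would be UB, and the plain reduction returns a different value.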
Potential VP intrinsic users
The loop vectorizer and SLP vectorizer continue to use some of the memory and permutation VP intrinsics needed for correctness, but these are not considered trivial intrinsics.
LoopIdiomVectorize.cpp uses a vp.icmp, but that can be replaced with a regular icmp.
The only public out-of-tree user that I’m aware of that uses VP intrinsics is the region vectorizer. But in theory it should be able to follow the loop vectorizer and just emit regular instructions without any detriment to code quality.
Any language frontend that specifically wants to control vl and masking should just use the RVV intrinsics instead. I searched around and couldn’t find anything that was already using VP intrinsics though. And given that the only targets that implement VP intrinsics are RISC-V and VE, it seems unlikely that a frontend would be relying on it in a target agnostic way.
From the MLIR side I’m only aware of this one thread mentioning VP intrinsics, but I’m not sure if anything ever made it upstream.
I’m aware that some downstream LLVM forks use VP intrinsics more heavily. I would like to hear their opinion on this. Hopefully if the RISCVVLOptimizer is also enabled downstream then this change won’t be disruptive.
Alternatives considered
We could just leave the existing codegen support in and not develop it any further. It’s not a major maintenance burden, but the main concern would be that it gives more time for other users to start using and relying on these intrinsics.
Future work
The future of VP in LLVM is currently somewhat uncertain. As of 2025 we are still in the “lift InstCombine” stage of the roadmap, and there hasn’t been any recent progress on this.
Currently RISC-V and VE are the only targets that support VP intrinsics. If this RFC goes ahead, then VE will be the only target to implement the trivial VP intrinsics.
It’s likely that VE could also add something similar to RISC-V’s RISCVVLOptimizer, potentially sharing some of the analysis infrastructure and even extending it to propagate masks[3].
By splitting the concept of predication into what is needed for correctness and what is needed for performance, and delegating the latter to a late MIR pass, we could remove trivial VP intrinsics altogether. This would significantly reduce the scope of VP intrinsics to those that are only needed for correctness.
[1] With EVL tail folding, which at the time wasn’t enabled by default. As of today it is now enabled by default.
[2] 
- [LegalizeTypes][VP] Teach isVPBinaryOp to recognize vp.sadd/saddu/ssub/ssubu.sat by tclin914 (Pull Request #154047)
- https://github.com/llvm/llvm-project/pull/125991
- [RISCV][TTI] Implement cost for vp min/max intrinsics by arcbbb (Pull Request #107567)
- [RISCV] Lower VP_SELECT constant false to use vmerge.vxm/vmerge.vim by ChunyuLiao (Pull Request #144461)
- https://github.com/llvm/llvm-project/pull/133245
- https://github.com/llvm/llvm-project/pull/132345
[3] It’s worth noting that this idea of computing demanded elements and propagating the information backwards has been explored before: https://llvm.org/devmtg/2023-05/slides/Posters/01-Albano-VectorPredictionPoster.pdf