Hi Lou,
Thanks a lot for the insights.
The way I implemented more advanced features (if-conversion, tail-loop-folding, live-out values like reductions, …) that require more analysis (not part of this RFC as those would be much more invasive changes) was mostly by extending the LoopVectorizationLegality (e.g. using DependenceInfo). The checks on memory accesses etc. are all done before I do the first VPlanTransform that modifies the CFG or invalidates IR-based analysis results (like if-conversion), so I just use the analysis results I got directly on the original LLVM-IR.
Oh, this is very interesting because I was looking at the “journey” through VPlan transformations as “views” over the LLVM IR (so we can avoid modifying the LLVM IR until the last moment when we execute the chosen plan).
But your comment makes me realise this approach may not be entirely workable, as transforms might change the plan in a way that makes it difficult or impractical to keep the connection to the original LLVM IR.
It is also good to see you didn’t need that, though, for later transformations. Thanks again for sharing your experience.
I guess by that you mean using the @llvm.vp.* intrinsics, which take a mask and an explicit vector length parameter? I thought about that, and my approach would have been to add an optional mask (and EVL) operand to the VPWidenRecipe (and other similar ones). Then, in the execute() methods of the optionally masked widening recipes, the vectorizer could generate @llvm.vp.fadd instead of fadd and so on. This would be independent of the VPlan-native path; the execute() methods are 100% shared. This is just an idea though, I have no PoC and I did not think about the EVL parameter yet.
Indeed this is the ultimate goal. In our fork we added a bunch of recipes with explicit vector length. Our approach works but it may be a bit too invasive. In that sense I believe @alexey-bataev 's approach in https://reviews.llvm.org/D99750 is more sustainable in the long term.
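To make the quoted idea a bit more concrete, here is a toy sketch of a widening recipe with an optional mask and EVL. It is not LLVM code: ToyWidenRecipe and Predication are hypothetical names, the recipe only handles float vectors, and execute() returns textual IR instead of building instructions. It only shows the shape of the switch between a plain fadd and the @llvm.vp.fadd call.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the optional operands discussed above.
// Both fields are required together, since the vp intrinsics take both.
struct Predication {
  std::string Mask; // name of a <N x i1> value, e.g. "%m"
  std::string EVL;  // name of an i32 value, e.g. "%evl"
};

// Toy model of a widening recipe for a float binary operation.
// This is an illustration only, not LLVM's VPWidenRecipe.
struct ToyWidenRecipe {
  std::string Opcode; // "fadd", "fmul", ...
  unsigned Lanes;     // fixed vector width
  std::string LHS, RHS;
  std::optional<Predication> Pred; // empty means unpredicated

  // Returns the textual IR that this recipe would emit.
  std::string execute() const {
    assert(Lanes > 0 && "vector must have at least one lane");
    const std::string N = std::to_string(Lanes);
    const std::string VecTy = "<" + N + " x float>";
    if (!Pred) {
      // Unpredicated: the ordinary wide instruction.
      return Opcode + " " + VecTy + " " + LHS + ", " + RHS;
    }
    // Predicated: same operands, plus the mask and the explicit vector length.
    return "call " + VecTy + " @llvm.vp." + Opcode + ".v" + N + "f32(" +
           VecTy + " " + LHS + ", " + VecTy + " " + RHS + ", <" + N +
           " x i1> " + Pred->Mask + ", i32 " + Pred->EVL + ")";
  }
};
```

With Pred left empty, a recipe for fadd over 4 lanes yields `fadd <4 x float> %a, %b`; with a mask `%m` and EVL `%evl` it yields a call to `@llvm.vp.fadd.v4f32` carrying the two extra operands. A fork that adds separate EVL recipes would instead duplicate the struct per operation, which is the invasiveness mentioned above.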
Kind regards,
Roger