[VP] D78203: ExpandVectorPredication

Hi all,

This is a status update on LLVM-VP, the effort to bring Vector
Predication to LLVM, and a notification that the next patch for VP is
available on Phabricator (D78203).
VP directly matters to you if you are interested in code generation for
SIMD or vector ISAs with mask registers (e.g. AVX512) and/or an active
vector length (e.g. the RISC-V V extension, NEC SX-Aurora (VE target)).
If your SIMD ISA has neither of these features, this next patch is
particularly relevant to you, as it is concerned with expanding VP
intrinsics for targets that do not support them.

:: VP Status ::
D69891 brought us vector-predicated intrinsics for integer operations in
LLVM IR (llvm.vp.add, etc.).
You can read up on VP in the LangRef.
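
As a quick reminder, a VP integer add on a fixed 8-lane vector looks
like this (signature as in the LangRef; lanes where the mask is false,
or whose index is at or beyond %evl, yield an unspecified value):

```llvm
%r = call <8 x i32> @llvm.vp.add.v8i32(<8 x i32> %a, <8 x i32> %b,
                                       <8 x i1> %mask, i32 %evl)
```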

:: Next patch ::
Work is now turning towards the ExpandVectorPredication pass, with a
work-in-progress patch available on Phabricator (D78203).
The pass folds the EVL parameter into the mask and/or lowers the VP
intrinsics to non-VP IR, as required by the target.
Your early feedback on this is very welcome, to help steer the
development in the right direction.
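
To sketch what the pass does, here is a possible expansion for a
fixed-width 4-lane target without native masking (the value names are
illustrative, not what the pass actually emits):

```llvm
; before: VP intrinsic with mask %m and explicit vector length %evl
%r = call <4 x i32> @llvm.vp.add.v4i32(<4 x i32> %a, <4 x i32> %b,
                                       <4 x i1> %m, i32 %evl)

; after: %evl is folded into the mask via the "i < %evl" comparison ...
%evl.ins   = insertelement <4 x i32> undef, i32 %evl, i32 0
%evl.splat = shufflevector <4 x i32> %evl.ins, <4 x i32> undef,
                           <4 x i32> zeroinitializer
%evl.mask  = icmp ult <4 x i32> <i32 0, i32 1, i32 2, i32 3>, %evl.splat
%mask      = and <4 x i1> %m, %evl.mask
; ... and the (speculatable) add itself becomes a plain IR instruction;
; %mask remains available for users that observe the disabled lanes
%r = add <4 x i32> %a, %b
```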

:: Related Work & Some Comments ::
Matrix intrinsic lowering
([llvm-dev] Execute intrinsic lowering passes on demand).
With the VP and matrix intrinsic passes, we are about to add two new
lowering/expansion passes to code generation.
A quick glimpse at TargetPassConfig.cpp reveals that this brings us to
at least six intrinsic lowering passes in total (LowerConstantIntrinsics,
ScalarizeMaskedMemIntrin, ExpandReductions, ExpandMemCmp + 2). That
means at least six passes, each iterating over all instructions in the
module to put them on an expansion worklist. If this becomes a
compile-time bottleneck, we should rethink how intrinsic expansion is
done, e.g. by adding attributes as Florian suggested, or by having one
intrinsic index (analysis) that speeds up the worklist population for
all of these passes.

Masked reductions
([llvm-dev] RFC: Promoting experimental reduction intrinsics to first class intrinsics).
There is interest in masked reduction operators and in moving the
reduction intrinsics out of the "llvm.experimental" prefix.
I suggest we implement those intrinsics in the llvm.vp.reduce.*
namespace (similar to how it's done in the VP reference patch) and have
them take a mask and an EVL parameter.
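
One possible shape for such an intrinsic, mirroring the existing
reduction intrinsics plus the two VP parameters (the exact signature,
e.g. whether to carry a start operand like the experimental fadd/fmul
reductions do, is part of the discussion):

```llvm
; hypothetical masked integer add reduction with mask and EVL
declare i32 @llvm.vp.reduce.add.v4i32(i32 %start, <4 x i32> %v,
                                      <4 x i1> %mask, i32 %evl)
```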

The EVL parameter needs to be folded into the mask for the targets
ARM MVE/SVE, which do not have an EVL in hardware.
However, I have not found a target-agnostic IR pattern that expresses
"EVLMask[i] = i < %evl" in the scalable setting.
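
For fixed-width vectors the pattern is straightforward, because the lane
indices are available as a constant vector:

```llvm
; EVLMask[i] = i < %evl, for <4 x i32>
%splat.ins = insertelement <4 x i32> undef, i32 %evl, i32 0
%splat     = shufflevector <4 x i32> %splat.ins, <4 x i32> undef,
                           <4 x i32> zeroinitializer
%EVLMask   = icmp ult <4 x i32> <i32 0, i32 1, i32 2, i32 3>, %splat
```

The constant step vector <i32 0, i32 1, i32 2, i32 3> is exactly the
piece that has no counterpart for a <vscale x 4 x i32> operand.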