[RFC] Is a more expressive way to represent reductions useful?

I’m reviewing the pull request “[AArch64] Generate DOT instructions from matching IR” (llvm/llvm-project #69583, by huntergr-arm) and find it bothersome that the new pass needs block/loop analysis just to identify a reduction. For the transformation to be safe, the pass has to identify add operations that can be reassociated across lanes. During vectorisation, however, we already know this is our intent, so I figure we can represent it better at the IR level to maximise expressiveness. This goes beyond the PR’s DOT use case: at least for SVE, there are other instructions that are tricky to use once the reassociativity information is lost.

We could add dedicated intrinsics like:

Ty vector.reassociative.<binop>(Ty X, Ty Y): X and Y are concatenated, and the binop is then performed for each result lane on any two input lanes, with every input lane selected exactly once.

Or:

Ty vector.shuffle.obscure(Ty X): which performs no real work other than to represent an unknown translation of input to output lanes.
{Ty, Ty} vector.shuffle.obscure(Ty X, Ty T):
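To make the intended semantics of the two options concrete, here is a minimal Python model. It is only a sketch of how I read the descriptions above: the function names `reassociative_add` and `shuffle_obscure` are hypothetical stand-ins for the proposed intrinsics, lists stand in for vectors, and a seeded random shuffle stands in for the "unknown" lane mapping that a target would be free to choose. Only the single-operand form of the shuffle is modelled.

```python
import random

def reassociative_add(x, y, rng):
    """Hypothetical model of vector.reassociative.add(X, Y)."""
    assert len(x) == len(y)
    lanes = list(x) + list(y)   # concatenate X and Y
    rng.shuffle(lanes)          # the lane pairing is unspecified
    # Each result lane combines two input lanes; every input lane is used once.
    return [lanes[2 * i] + lanes[2 * i + 1] for i in range(len(x))]

def shuffle_obscure(x, rng):
    """Hypothetical model of vector.shuffle.obscure(X): an unknown permutation."""
    out = list(x)
    rng.shuffle(out)
    return out

rng = random.Random(0)          # any seed models one legal lane mapping
x = [1, 2, 3, 4]
y = [10, 20, 30, 40]

r = reassociative_add(x, y, rng)
s = shuffle_obscure(x, rng)

# Whatever mapping was picked, a later full reduction sees the same value.
print(sum(r) == sum(x) + sum(y))
print(sorted(s) == sorted(x))
```

The point of the model is that individual result lanes are not observable in any meaningful way; only a later reduction over all lanes is. That is the freedom a pass such as the one in the PR currently has to rediscover through analysis.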

I personally prefer the shuffle option because one or two intrinsics likely cover all cases. Alternatively, perhaps this can be represented using instruction flags or metadata, although my gut feeling is that would be dangerous in the face of CSE.

What do others think? Is this a problem worth solving, or are passes like the one in the PR the best solution?

Thinking about this a little more, I guess that for best results we might also need to incorporate some kind of partial reduction property to represent a reduction in element count. This makes me wonder whether a solution exists that doesn’t require adding code generation support for something new, which is something I’ve been trying to avoid.

Update: Given the lack of feedback, we (Arm) will push ahead with trying to add a dedicated intrinsic similar to, but not exactly as, described in this post. The intent of the intrinsic is to represent a partial reduction, freeing targets to implement it as they see fit. The intrinsic will likely be vector in/vector out, with matching element types but potentially differing element counts. We’ll crosslink RFCs/PRs when available.
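As a rough illustration of what "partial reduction" could mean here, the Python sketch below reduces a 16-lane vector to 4 lanes in two different ways. This is an assumption about the shape of such an operation, not the final intrinsic: the helper names are hypothetical, lists stand in for vectors, and I assume the output lane count divides the input lane count.

```python
def partial_reduce_adjacent(x, out_lanes):
    """One possible lowering: each output lane sums a contiguous group."""
    assert len(x) % out_lanes == 0
    group = len(x) // out_lanes
    return [sum(x[i * group:(i + 1) * group]) for i in range(out_lanes)]

def partial_reduce_strided(x, out_lanes):
    """Another possible lowering: each output lane sums every out_lanes-th element."""
    assert len(x) % out_lanes == 0
    return [sum(x[i::out_lanes]) for i in range(out_lanes)]

x = list(range(1, 17))               # 16 input lanes holding 1..16

a = partial_reduce_adjacent(x, 4)    # [10, 26, 42, 58]
b = partial_reduce_strided(x, 4)     # [28, 32, 36, 40]

# The per-lane results differ, but a final reduction agrees.
print(a, b, sum(a) == sum(b) == sum(x))
```

Both lowerings accumulate every input lane exactly once, so either is acceptable as long as the result only feeds a later full reduction. Which grouping is cheapest depends on the target, which is why leaving it unspecified is attractive.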