I’ve been reviewing [AArch64] Generate DOT instructions from matching IR by huntergr-arm · Pull Request #69583 · llvm/llvm-project, and I find the need to perform block/loop analysis in order to identify a reduction bothersome. For the transformation to be safe, the new pass must identify add operations that can be cross-lane reassociated. However, during vectorisation we already know this is our intent, so I figure we can represent it better at the IR level to maximise expressiveness. This goes beyond the PR’s DOT use case: at least for SVE, there are other instructions that are tricky to use once the reassociativity information is lost.
We could add dedicated intrinsics like:
Ty vector.reassociative.<binop>(Ty X, Ty Y)
: whereby X and Y are concatenated and then, for each result lane, the binop is applied to any two input lanes, with every input lane used exactly once across all result lanes.
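To make the first option concrete, here is a sketch of what IR using such an intrinsic might look like. The intrinsic name and signature are hypothetical (nothing like this exists in LLVM today); the point is that the result lanes carry no fixed pairing of input lanes, so a backend is free to pick whichever pairing a pairwise-add or dot instruction wants:

```llvm
; Hypothetical intrinsic: the 8 lanes of %acc and %x are conceptually
; concatenated, and each of the 4 result lanes is the add of some pair
; of those 8 lanes, with every lane consumed exactly once. No particular
; pairing is promised, so the backend may choose one that maps to ADDP,
; UDOT, etc.
declare <4 x i32> @llvm.vector.reassociative.add.v4i32(<4 x i32>, <4 x i32>)

define <4 x i32> @partial_sum(<4 x i32> %acc, <4 x i32> %x) {
  %r = call <4 x i32> @llvm.vector.reassociative.add.v4i32(<4 x i32> %acc,
                                                           <4 x i32> %x)
  ret <4 x i32> %r
}
```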
Or:
Ty vector.shuffle.obscure(Ty X)
: which performs no real work other than to represent an unknown translation of input to output lanes.
{Ty, Ty} vector.shuffle.obscure(Ty X, Ty T)
:
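For the shuffle option, here is a sketch of how an obscuring intrinsic might sit inside a vectorised dot-product-style reduction. Again the intrinsic name is hypothetical; the idea is that obscuring the lane order documents that the final reduction does not depend on it, which is exactly the freedom a DOT lowering needs:

```llvm
; Hypothetical intrinsic: performs no real work, but represents an
; unknown permutation of lanes. Because the lane order of %obs is
; unspecified, a backend may reassociate across lanes (e.g. widen and
; accumulate four i8 products per i32 lane via UDOT) without needing a
; pass to re-prove that the surrounding loop is a reduction.
declare <16 x i32> @llvm.vector.shuffle.obscure.v16i32(<16 x i32>)
declare i32 @llvm.vector.reduce.add.v16i32(<16 x i32>)

define i32 @dot(<16 x i8> %a, <16 x i8> %b) {
  %a32 = zext <16 x i8> %a to <16 x i32>
  %b32 = zext <16 x i8> %b to <16 x i32>
  %mul = mul <16 x i32> %a32, %b32
  ; From here on, the lane order of the products is irrelevant.
  %obs = call <16 x i32> @llvm.vector.shuffle.obscure.v16i32(<16 x i32> %mul)
  %red = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %obs)
  ret i32 %red
}
```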
I personally prefer the shuffle option because one or two intrinsics likely cover all cases. Then again, perhaps this could be represented using instruction flags or metadata instead? My gut feeling is that would be dangerous in the face of CSE.
What do others think? Is this a problem worth solving, or are passes like the one in the PR the best solution?