[RFC] Is a more expressive way to represent reductions useful?

paulwalker-arm · November 14, 2023, 5:23pm

I’m reviewing PR #69583, “[AArch64] Generate DOT instructions from matching IR” by huntergr-arm (llvm/llvm-project on GitHub), and find the need to perform block/loop analysis just to identify a reduction bothersome. For the transformation to be safe, the new pass has to identify add operations that can be reassociated across lanes. During vectorisation, however, we already know this is our intent, so I figure we can represent it better at the IR level to maximise expressiveness. This goes beyond the PR’s DOT use case: at least for SVE, there are other instructions that are tricky to use once the reassociativity information is lost.

We could add dedicated intrinsics like:

Ty vector.reassociative.<binop>(Ty X, Ty Y): X and Y are concatenated and a binop is then performed for each result lane on any two input lanes, with every input lane used exactly once.
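
As a rough illustration of those semantics, here is a small Python model. It is only a sketch: the name `reassociative_binop` and the use of a random pairing to stand in for the unspecified lane selection are assumptions for illustration, not part of the proposal. It shows that for an associative and commutative binop such as integer add, a later full reduction gives the same answer whichever pairing was picked.

```python
import random
from operator import add

def reassociative_binop(binop, x, y, rng):
    """Hypothetical model: concatenate x and y, then combine lanes pairwise.

    Every input lane is used exactly once. Which two lanes feed each
    result lane is unspecified, modelled here as a random pairing.
    """
    assert len(x) == len(y)
    lanes = list(x) + list(y)
    rng.shuffle(lanes)  # stands in for the unknown lane selection
    return [binop(lanes[2 * i], lanes[2 * i + 1]) for i in range(len(x))]

x = [1, 2, 3, 4]
y = [10, 20, 30, 40]
total = sum(x) + sum(y)

# Try several pairings; reducing the result always gives the same total.
results = [reassociative_binop(add, x, y, random.Random(seed)) for seed in range(8)]
all_totals_match = all(sum(r) == total for r in results)
```

The individual result lanes differ between pairings; only properties that hold for every pairing (here, the final sum) can be relied on.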

Or:

Ty vector.shuffle.obscure(Ty X): which performs no real work other than to represent an unknown translation of input to output lanes.
{Ty, Ty} vector.shuffle.obscure(Ty X, Ty T):
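
The single-operand form can be modelled in the same spirit. This sketch assumes the "unknown translation" is a permutation of lanes (the post does not say so explicitly), and the helper names are hypothetical. The two-operand form is not modelled because its semantics are not spelled out above.

```python
import random

def shuffle_obscure(x, rng):
    """Hypothetical model: return the lanes of x in some unknown order.

    Assumes the unknown lane translation is a permutation.
    """
    lanes = list(x)
    rng.shuffle(lanes)
    return lanes

def reduce_add(v):
    """Full add reduction of a vector."""
    return sum(v)

acc = [3, 1, 4, 1, 5, 9, 2, 6]
obscured = shuffle_obscure(acc, random.Random(1))

# A consumer may only rely on properties that hold for every permutation.
same_multiset = sorted(obscured) == sorted(acc)
same_reduction = reduce_add(obscured) == reduce_add(acc)
```

Because the result's lane order carries no meaning, a target would be free to pick whatever lane arrangement suits its instructions, as long as order-independent consumers such as the final reduction are unaffected.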

I personally prefer the shuffle option because one or two intrinsics would likely cover all cases. Then again, perhaps this could be represented using instruction flags or metadata? My gut feeling is that would be dangerous in the face of CSE.

What do others think? Is this a problem worth solving, or are passes like the one in the PR the best solution?

paulwalker-arm · November 15, 2023, 10:40am

Thinking about this a little more, I guess for best results we might also need to incorporate some kind of partial reduction property to represent a reduction in element count. This makes me wonder whether a solution exists that doesn’t require adding code generation support for something new, which is something I’ve been trying to avoid.

paulwalker-arm · February 20, 2024, 10:55am

Update: Given the lack of feedback, we (Arm) will push ahead with adding a dedicated intrinsic similar to, but not exactly as, described above. The intent of the intrinsic is to represent a partial reduction, freeing targets to implement it as they see fit. The intrinsic will likely be vector in/vector out, with matching element types but possibly differing element counts. We’ll crosslink RFCs/PRs when available.
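
A minimal Python sketch of what "partial reduction" could mean for add, under stated assumptions: the name `partial_reduce_add`, the even split of input lanes, and the random grouping are all illustrative choices, not the definition of the intrinsic. The point is only that the input and output share an element type, the output has fewer lanes, and the overall sum is preserved however lanes are grouped.

```python
import random

def partial_reduce_add(x, out_lanes, rng):
    """Hypothetical model: add-reduce len(x) lanes down to out_lanes lanes.

    Which input lanes accumulate into which output lane is left to the
    target; here it is modelled as a random, evenly sized grouping.
    """
    assert len(x) % out_lanes == 0
    lanes = list(x)
    rng.shuffle(lanes)  # stands in for the target-chosen grouping
    group = len(lanes) // out_lanes
    return [sum(lanes[i * group:(i + 1) * group]) for i in range(out_lanes)]

wide = list(range(1, 17))  # 16 input lanes
narrow = partial_reduce_add(wide, 4, random.Random(7))  # 4 output lanes

# Finishing the reduction on the narrow vector matches reducing the wide one.
sums_agree = sum(narrow) == sum(wide)
```

A target with a dot-product style instruction could pick the grouping that instruction naturally produces, while a target without one could use any other grouping with the same total.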

paulwalker-arm · June 18, 2024, 10:09am

https://github.com/llvm/llvm-project/pull/94499 is an implementation of the current proposal.
