[RFC] Introduce cross-lane function attribute to prevent merging calls to cross-lane functions

While investigating an issue reported for HIP (Question: Do warp cross-lane functions work in branching code at all? · Issue #2474 · ROCm-Developer-Tools/HIP · GitHub), we found a problem with how SimplifyCFG handles cross-lane functions.

SimplifyCFG assumes that if(x) y=f(a) else y=f(b) is equivalent to y=f(x?a:b) and performs this transformation whenever it finds such a pattern. The assumption generally holds for functions that are not cross-lane, but it does not hold for cross-lane functions.

On a GPU, multiple threads (lanes) execute in lock-step as a wavefront. Most functions do not depend on values from other lanes; cross-lane functions do. E.g., __any(x) returns true if x is true on any active lane.

Consider if(x) y=f(a) else y=f(b). Let’s assume that before this statement executes, all lanes are active and x has different values on different lanes. Some lanes will execute y=f(a) and the other lanes will execute y=f(b). In either case, f is executed with only part of the lanes active.

If the statement is transformed to ‘y=f(x?a:b)’, f is executed once with all lanes active, so its result can differ from the result obtained when f is executed with only part of the lanes active.
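To make the difference concrete, here is a minimal host-side sketch in plain C++ that models a wavefront as one vector element per lane. It is not HIP code and not what the compiler emits: any_active is a hypothetical stand-in for a cross-lane operation like __any, and branched and merged are illustrative names for the two forms of the statement.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of a wavefront: one element per lane.
using Lanes = std::vector<bool>;

// Hypothetical stand-in for a cross-lane "any": every active lane receives
// true if arg is true on at least one active lane. Inactive lanes keep y.
void any_active(const Lanes& active, const Lanes& arg, Lanes& y) {
    assert(active.size() == arg.size() && arg.size() == y.size());
    bool result = false;
    for (std::size_t i = 0; i < arg.size(); ++i)
        if (active[i] && arg[i]) result = true;
    for (std::size_t i = 0; i < y.size(); ++i)
        if (active[i]) y[i] = result;
}

// Models: if (x) y = f(a); else y = f(b);
// f runs twice, each time with only a subset of the lanes active.
Lanes branched(const Lanes& x, const Lanes& a, const Lanes& b) {
    Lanes y(x.size(), false), not_x(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) not_x[i] = !x[i];
    any_active(x, a, y);      // lanes where x is true
    any_active(not_x, b, y);  // lanes where x is false
    return y;
}

// Models: y = f(x ? a : b);
// f runs once with all lanes active.
Lanes merged(const Lanes& x, const Lanes& a, const Lanes& b) {
    Lanes all(x.size(), true), sel(x.size()), y(x.size(), false);
    for (std::size_t i = 0; i < x.size(); ++i) sel[i] = x[i] ? a[i] : b[i];
    any_active(all, sel, y);
    return y;
}
```

With four lanes, x = {1,1,0,0}, a = {1,0,0,0} and b = {0,0,0,0}, branched yields {1,1,0,0} while merged yields {1,1,1,1}: lanes 2 and 3 now observe a value that came from lane 0, which took the other branch.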

To fix this issue, I suggest introducing a function attribute ‘cross-lane’ to mark cross-lane functions and intrinsics, and preventing SimplifyCFG from merging calls to such functions.

Any comments are welcome. Thanks.

How is this problem different from the problems addressed by the convergent attribute?

It seems like there is a lot of prior art here that is a lot further along, and it would be better to pursue those instead of adding new features:
https://reviews.llvm.org/D68994

https://lists.llvm.org/pipermail/llvm-dev/2019-October/135929.html

https://reviews.llvm.org/D69498

I agree with Reid: this is exactly the problem that convergent should address, and SimplifyCFG should not apply that transform when a call is convergent.

There’s a technical issue in that the current definition of convergent in LangRef does allow the SimplifyCFG transform. It’s high time to revive that proposal from two and a half years ago.