A convergent operation involves inter-thread communication or synchronization that occurs outside of the memory model, where the set of threads which participate in communication is implicitly affected by control flow.
In structured programming languages, there is often an intuitive and unambiguous way of determining the threads that are expected to communicate. However, this is not always the case even in structured programming languages, and the intuition breaks down entirely in unstructured control flow. This RFC introduces a formal semantics in LLVM to determine the set of communicating threads for convergent operations.
This is a replacement for the existing convergent attribute in LLVM, which is unable to clearly express the semantics of convergent operations.
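To make "implicitly affected by control flow" concrete, here is a minimal toy model in Python. It is not LLVM IR and not the semantics proposed in the RFC; `subgroup_add`, the four-lane layout, and the branch condition are all hypothetical names chosen for illustration. It only shows that the same cross-lane operation can yield different results depending on which threads reach it.

```python
def subgroup_add(values, active):
    """Toy cross-lane sum: each active lane receives the sum over the active lanes.

    Hypothetical stand-in for a convergent operation; not a real LLVM intrinsic.
    """
    total = sum(values[lane] for lane in active)
    return {lane: total for lane in active}


values = [1, 2, 3, 4]        # one value per lane (thread)
all_lanes = [0, 1, 2, 3]

# Assumed branch condition: only lanes holding an even value take the branch.
taken = [lane for lane in all_lanes if values[lane] % 2 == 0]

# Operation placed inside the branch: only lanes 1 and 3 communicate.
inside_branch = subgroup_add(values, taken)

# Same operation placed after the branch rejoins: all four lanes communicate.
after_branch = subgroup_add(values, all_lanes)

print(inside_branch)  # {1: 6, 3: 6}
print(after_branch)   # {0: 10, 1: 10, 2: 10, 3: 10}
```

In this model the set of communicating lanes is never passed around by the program itself; it falls out of where the call sits relative to the branch, which is why a transformation that moves such a call across control flow can change its result.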
Thank you for taking on this work! It’s been a long time coming.
I do believe this is the right way forward for GPU cross-lane operations where the set of communicating lanes is implicit. Unsurprising since you took what I started, but I think you made some important improvements along the way.
Since this is a topic that is mostly interesting to GPU folks: there’s a GPU Working Group meeting scheduled for next Friday, perhaps you can attend and put it on the agenda to allow for an overview / sort of Q&A, depending on what folks are interested in?
Yeah, we should certainly discuss this in the working group meeting. Which “next Friday” is this? I am not available on 31st March or 7th April. I will definitely want to present at the next available meeting.
The RFC was updated and simplified in response to some questions by @jdoerfert and @jsilvanus. We believe it’s in good shape now, and would like to submit it on Monday, June 26th, unless there are more comments by then.
And just in case, "questions by @jdoerfert" refers to the last GPU working group meeting, where we presented the RFC. Johannes was interested in meaty motivational examples, because the loop examples we gave could be explained without tokens too. So the spec is now updated to bring out the motivation for having explicit tokens.
I am a little worried that nobody who was not involved in the design (looking at @nhaehnle) has accepted it yet. We can rubber stamp it now, but I would hope people would find the time in the next 2 weeks to actually look, or look again. I will try, whatever that means.
Thanks! I am sure we can wait two more weeks with the hope that things will move forward. FWIW, the original RFC in D85603 had received a lot of discussion. This new RFC only clarifies and simplifies some things, so most of the original discussion is still relevant. The impression created back then was that people understood what is being introduced here and why, but that RFC never reached a conclusion.
The first change, D147116, is now submitted. There is always room for discussion, and we are eager to receive feedback about the proposed experimental intrinsics and their semantics.