[RFC] Introduce convergence control intrinsics

D147116 is a reboot of an earlier RFC for introducing explicit convergence control in LLVM IR: D85603. This was originally discussed in llvm-dev: [RFC] Introducing convergence control bundles and intrinsics

Here’s a quick (re)introduction:

A convergent operation involves inter-thread communication or synchronization that occurs outside of the memory model, where the set of threads which participate in communication is implicitly affected by control flow.

In structured programming languages, there is often an intuitive and unambiguous way of determining the threads that are expected to communicate. However, this is not always the case even in structured programming languages, and the intuition breaks down entirely in unstructured control flow. This RFC introduces a formal semantics in LLVM to determine the set of communicating threads for convergent operations.

This is a replacement for the existing convergent attribute in LLVM, which is unable to clearly express the semantics of convergent operations.
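
To make this concrete, here is a minimal sketch of what a controlled convergent operation inside a loop might look like, assuming the intrinsic names and the `convergencectrl` operand bundle as proposed in D147116. The functions `@kernel` and `@subgroup_op` are hypothetical placeholders, not part of the proposal, and the review remains the authority on exact spelling and rules.

```llvm
; Convergence control intrinsics as proposed in D147116.
declare token @llvm.experimental.convergence.entry()
declare token @llvm.experimental.convergence.loop()

; Hypothetical convergent operation, e.g. a cross-lane reduction.
declare void @subgroup_op() convergent

define void @kernel(i32 %n) convergent {
entry:
  ; Token representing the threads that entered this function together.
  %outer = call token @llvm.experimental.convergence.entry()
  br label %loop

loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  ; Loop heart: yields a fresh token on each iteration, derived from %outer.
  %inner = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ]
  ; The set of communicating threads is determined by the token %inner.
  call void @subgroup_op() [ "convergencectrl"(token %inner) ]
  %i.next = add i32 %i, 1
  %cont = icmp slt i32 %i.next, %n
  br i1 %cont, label %loop, label %exit

exit:
  ret void
}
```

In this sketch, the intent is that threads communicate in `@subgroup_op` only with threads that entered the function together and are executing the same iteration of the loop. The IR states this explicitly through the tokens rather than leaving it to be inferred from the shape of the control flow.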

Changes relative to ⚙ D85603 IR: Add convergence control operand bundle and intrinsics

  1. Clean up the definitions of a “convergent operation”, a convergent call, and a convergent function.
  2. Clean up the relationship between dynamic instances, sets of threads and convergence tokens.
  3. Redistribute the formal rules into the definitions of the convergence intrinsics.
  4. Expand on the semantics of entering a function from outside LLVM, and the environment-defined outcome of the entry intrinsic.
  5. Replace the term “cycle” with “closed path”. The static rules are defined in terms of closed paths, and then a relation is established with cycles.
  6. Specify that if a function contains a controlled convergent operation, then all convergent operations in that function must be controlled.
  7. Describe an optional procedure to infer tokens for uncontrolled convergent operations.
  8. Introduce the controlled maximal convergence-before relation and the controlled m-converged property as updates to the original properties in UniformityAnalysis.

Thank you for taking on this work! It’s been a long time coming :slight_smile:

I do believe this is the right way forward for GPU cross-lane operations where the set of communicating lanes is implicit. Unsurprising since you took what I started, but I think you made some important improvements along the way.

Since this is a topic that is mostly interesting to GPU folks: there’s a GPU Working Group meeting scheduled for next Friday, perhaps you can attend and put it on the agenda to allow for an overview / sort of Q&A, depending on what folks are interested in?

Great idea. @ssahasra does that work for you? (tag @jhuber6).

Yeah we should certainly discuss this in the working group meeting. Which “next Friday” is this? I am not available on 31st March or 7th April. I will definitely want to present in the next available meeting.

It was supposed to be 7th, I guess we can move it by one week to the 14th.

Sounds good to me!

The RFC was updated and simplified in response to some questions by @jdoerfert and @jsilvanus. We believe it’s in good shape now, and would like to submit it on Monday, June 26th, unless there are more comments by then.

Apologies, this left my mental cache. What do you mean by that?

This is about committing ⚙ D147116 [RFC] Introduce convergence control intrinsics.

And just in case, "questions by @jdoerfert" refers to the last GPU working group meeting, where we presented the RFC. Johannes was interested in meaty motivational examples, because the loop examples we gave could be explained without tokens too. So the spec is now updated to bring out the motivation for having explicit tokens.

FWIW, I agree we need this.

I am a little worried that nobody who was not involved in the design (looking at @nhaehnle) has accepted it yet. We can rubber stamp it now, but I would hope people would find the time in the next 2 weeks to actually look, or look again. I will try, whatever that means.

(Tag: @efriedma-quic @jhuber6 @shiltian @Artem-B @bader @AnastasiaStulova …)

Thanks! I am sure we can wait two more weeks with the hope that things will move forward. FWIW, the original RFC in D85603 had received a lot of discussion. This new RFC only clarifies and simplifies some things, so most of the original discussion is still relevant. The impression created back then was that people understood what is being introduced here and why, but that RFC never reached a conclusion.

Yep, that’s what I was hoping to correct. We got this wrong at least once, maybe this is the time we don’t :slight_smile:

Added another step in the stack of reviews. So far we have:

  1. ⚙ D147116 [RFC] Introduce convergence control intrinsics
  2. ⚙ D152431 [Inliner] Handle convergence control when inlining a call
  3. ⚙ D153744 [LoopUnroll] adjust for new `convergent` semantics

@jdoerfert are you okay with rubber-stamping D147116? I can wait till next Friday, 7th July before I submit.

The first change, D147116, is now submitted. There is always room for discussion, and we are eager to receive feedback about the proposed experimental intrinsics and their semantics.