convergent operations in OpenMP

Resending to the correct list …

Hi all,

I’ve been working on making the “convergent” attribute be the default on
all functions and calls, along with changes to optimizations so that
there is no performance regression on CPU targets.

The overall idea is as follows:

  1. Convergent operations are only affected by divergent branches.
  2. Control flow optimizations should care about convergent operations
    only if they occur in a context where the optimization affects
    divergent branches.
  3. CPU targets are single-threaded, so there are trivially no
    divergent branches. This is sufficient to ensure that optimizations
    are never affected on CPU targets if they follow #2.
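To make points 1 to 3 concrete, here is a toy model. It is only an illustration under stated assumptions and is not LLVM code. A "wave" of lanes executes in lockstep, and `activeLaneCount()` is a hypothetical stand-in for a convergent operation whose result depends on which lanes execute it. The names `Mask`, `Cond`, `thenMask` and `isDivergent` are invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy SIMT model, assumed for illustration only (not LLVM code): a "wave"
// is a group of lanes that execute in lockstep; a Mask marks active lanes.
using Mask = std::vector<bool>;
using Cond = bool (*)(std::size_t lane);

// Stand-in for a convergent operation: its result depends on which lanes
// are active when it executes.
int activeLaneCount(const Mask& active) {
  int n = 0;
  for (bool a : active) n += a ? 1 : 0;
  return n;
}

// Lanes that enter the "then" block of `if (cond(lane))`.
Mask thenMask(const Mask& active, Cond cond) {
  Mask taken(active.size(), false);
  for (std::size_t i = 0; i < active.size(); ++i)
    taken[i] = active[i] && cond(i);
  return taken;
}

// A branch is divergent when the active lanes disagree on its condition.
// If the branch is uniform, a copy of activeLaneCount() sunk into the
// "then" block either sees the same lanes as before or does not run at
// all; only a divergent branch changes its result.
bool isDivergent(const Mask& active, Cond cond) {
  int taken = activeLaneCount(thenMask(active, cond));
  return taken != 0 && taken != activeLaneCount(active);
}
```

For a four-lane wave and the condition `lane < 2`, the operation counts 4 lanes before the branch but only 2 when sunk into the `then` block, and `isDivergent` reports true. With a single lane (the CPU case in point 3), `isDivergent` is false for every condition, so a rule that only worries about divergent branches never fires.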

See the following review for a discussion that sent us down this path:

https://reviews.llvm.org/D69498

My first attempt was to modify the Sink optimization, which currently
refuses to sink an operation if it is convergent. The change additionally
checks whether divergent control flow is present, and prevents sinking
only in that case:

https://reviews.llvm.org/D106859
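Reduced to the convergence part of the legality check, the difference between the current Sink behaviour and the one the review aims for can be sketched as below. `SinkCandidate` and both predicates are hypothetical names for this sketch. They are not the API of LLVM's Sink pass, and all other legality conditions (memory effects, uses, and so on) are deliberately left out.

```cpp
#include <cassert>

// Hypothetical, simplified description of one sinking opportunity; these
// names are illustrative and are not the API of LLVM's Sink pass. Only the
// convergence-related part of the legality check is modeled.
struct SinkCandidate {
  bool isConvergent;       // the instruction is marked convergent
  bool crossesDivergence;  // the move crosses a divergent branch
};

// Current behaviour: never sink a convergent operation.
bool maySinkCurrent(const SinkCandidate& c) { return !c.isConvergent; }

// Behaviour the change aims for: a convergent operation only blocks
// sinking when divergent control flow is involved. On a target without
// branch divergence, crossesDivergence would always be false.
bool maySinkProposed(const SinkCandidate& c) {
  return !c.isConvergent || !c.crossesDivergence;
}
```

The design point is that convergence stops being a blanket veto and becomes a veto only in combination with divergence, which is what lets a default "convergent" attribute cost nothing on CPU targets.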

But this broke OpenMP lit tests like this one:

openmp/tools/archer/tests/barrier/barrier.c

The problem is that OpenMP builtins currently rely on the “convergent”
attribute to convey barrier semantics, even on CPU targets. I am
guessing that the actual implementation on a CPU will use other native
primitives like atomics. But it seems reasonable to say “convergent” and
expect it to mean exactly what it says, without relying on the
underlying implementation.
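For readers unfamiliar with the OpenMP side, the source-level contract behind those barrier builtins looks roughly like the sketch below. It is an assumed example and is not taken from barrier.c. The function name and the team size of 4 are arbitrary. It builds with or without `-fopenmp`, and without OpenMP the pragmas are ignored and it runs on a single thread. The key point is that every thread of the team must reach the same barrier. A thread-dependent branch around it would break the program even on a CPU, and that cross-thread control-flow requirement is what the "convergent" attribute is being used to express here.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Each thread publishes a value, waits at the barrier, then reads every
// other thread's value. Correctness relies on *all* threads of the team
// reaching the same barrier. Without -fopenmp the pragmas are ignored and
// the function runs with a single thread.
bool allThreadsSeeEveryWrite() {
  const int kThreads = 4;  // requested team size (arbitrary for the sketch)
  std::vector<int> published(kThreads, 0);
  bool ok = true;
#pragma omp parallel num_threads(kThreads) reduction(&& : ok)
  {
    int tid = 0, team = 1;
#ifdef _OPENMP
    tid = omp_get_thread_num();
    team = omp_get_num_threads();
#endif
    published[tid] = tid + 1;  // phase 1: each thread writes its own slot
    // Every thread must execute this barrier. Moving it under a
    // thread-dependent branch such as `if (tid == 0)` would be
    // non-conforming OpenMP and would typically hang the team.
#pragma omp barrier
    for (int t = 0; t < team; ++t)  // phase 2: read every slot
      ok = ok && published[t] == t + 1;
  }
  return ok;
}
```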

But this means that the “convergent” property is environment-defined
rather than target-defined. If we go down that path, what would be the
correct way to say this in an LLVM module? Should “OpenMP” be an
environment in the target triple? Or something more general, that
conveys that this module depends on “convergent communication outside of
the memory model”?

Sameer.

Out of curiosity, since I’m not familiar with the whole OpenMP setup: what does the IR currently look like at the point where the test gets broken?

Cheers,
Nicolai

Nicolai Hähnle writes:

Out of curiosity, since I'm not familiar with the whole OpenMP setup: what
does the IR currently look like at the point where the test gets broken?

Good question! I am equally unfamiliar with the OpenMP flow. The IR that
I get from "clang -emit-llvm ..." contains calls to runtime functions,
and I don't know how to link those into the IR. For now, I am going only
by the fact that my review request shows failures in a bunch of
"libarcher" tests.

But now it seems I am getting failures locally with or without my
change. So, false alarm, maybe?

Sameer.

Libarcher tests have been failing for all Phab reviews for a while. That
has nothing to do with your patch.

As far as I can tell, there is no use of `convergent` on the CPU,
or at least it would not mean much.

~ Johannes

It was said to have been fixed a couple of days ago; I’d try to rebase and retry!
Also check whether you can reproduce it locally.

Cheers,