[RFC] Progress towards OpenMP support

Hi all,

I made some progress on implementing Hal's proposal [1] for
implementing OpenMP support in LLVM. The patch I've attached just
barely compiles, but I'd like to get some input on the design early on
to prevent trouble later. I'd especially like some input on the
following points:

* Metadata is never mutated or dropped

I think it is better to have an analysis pass that simply provides a
consistent "view" of the current !parallel nodes instead of one that
mutates the IR in order to make it consistent. By addRequired<>'ing
it (currently called ParallelizationMetadata) in the lowering pass and
only indirectly accessing the metadata through the analysis pass, we
are assured that we don't parallelize regions with inconsistent
metadata.
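Here's a toy sketch of what I mean by a consistent "view" (Python, with made-up record shapes rather than the real !parallel schema; the actual pass is C++):

```python
# Toy model of the ParallelizationMetadata idea (made-up shapes, not
# the real !parallel schema): consumers never read the raw records;
# they only see the consistent "view" this function builds.

def build_view(raw_regions):
    """Keep only regions whose child references all resolve.

    raw_regions maps a region id to the ids of its child regions.
    A dangling child reference (e.g. left behind by a pass that
    dropped a node) disqualifies the region and all its ancestors.
    """
    view = {rid: kids for rid, kids in raw_regions.items()
            if all(k in raw_regions for k in kids)}
    changed = True
    while changed:  # propagate drops upward until stable
        changed = False
        for rid in list(view):
            if any(k not in view for k in view[rid]):
                del view[rid]
                changed = True
    return view

# "inner" points at a region some pass deleted, so both "inner" and
# the "outer" region containing it are hidden from consumers.
raw = {"outer": ["inner"], "inner": ["missing"], "ok": []}
assert build_view(raw) == {"ok": []}
```

The raw records are never touched; inconsistency is simply invisible to anything downstream of the view.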

* No information is optional

It simplifies the implementation greatly if we change the spec to
make every field (task affinity etc.) mandatory rather than optional.
I don't think this restriction shifts any significant complexity to
the frontends.

* Loops don't have lists of special handling regions

Since loops are always nested inside parallel regions, can't we
maintain this list in the parent region itself?

* Tripping assertions vs. silently ignoring metadata

I'm not very clear on when it is "right" to silently ignore metadata and
when to assert. Right now I assert only when a !parallel MDNode has
an incorrect number of children (this is one thing simplified by
making all fields compulsory). The verifier probably needs to know
about this constraint, something I haven't addressed yet. Should our
assertions be stricter, or should we not assert at all?

(The code is also up on Github [2]).

Thanks!

[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-August/052472.html
[2] https://github.com/sanjoy/llvm/tree/parallel-md

0001-Create-a-class-structure-represent-the-various-kinds.patch (24.6 KB)

Sanjoy,

Thanks for working on this! Comments below...

> Hi all,
>
> I made some progress on implementing Hal's proposal [1] for
> implementing OpenMP support in LLVM. The patch I've attached just
> barely compiles, but I'd like to get some input on the design early on
> to prevent trouble later. I'd especially like some input on the
> following points:
>
> * Metadata is never mutated or dropped
>
> I think it is better to have an analysis pass that simply provides a
> consistent "view" of the current !parallel nodes instead of one that
> mutates the IR in order to make it consistent. By addRequired<>'ing
> it (currently called ParallelizationMetadata) in the lowering pass and
> only indirectly accessing the metadata through the analysis pass, we
> are assured that we don't parallelize regions with inconsistent
> metadata.

My rationale for proposing the self-consistent metadata solution was
that it seemed to be the safest option. If we simply insist that all
relevant passes use the ParallelizationMetadata pass, without any
verification, then we could end up with parallelization-unaware passes
silently miscompiling code when parallelization is enabled. Do you have
a way of avoiding that?

> * No information is optional
>
> It simplifies the implementation greatly if we change the spec to
> make every field (task affinity etc.) mandatory rather than optional.
> I don't think this restriction shifts any significant complexity to
> the frontends.

Okay.

> * Loops don't have lists of special handling regions
>
> Since loops are always nested inside parallel regions, can't we
> maintain this list in the parent region itself?

I'm not sure. We do need to make sure, for example, that an 'ordered'
region stays properly nested within its parent loop.

> * Tripping assertions vs. silently ignoring metadata
>
> I'm not very clear on when it is "right" to silently ignore metadata
> and when to assert. Right now I assert only when a !parallel MDNode
> has an incorrect number of children (this is one thing simplified by
> making all fields compulsory). The verifier probably needs to know
> about this constraint, something I haven't addressed yet. Should our
> assertions be stricter, or should we not assert at all?

We should assert when an individual metadata entry is malformed; we
should drop non-self-consistent sets of metadata entries (as these
might have resulted from the actions of non-parallelization-aware
transformation passes).
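In toy form, that policy might look something like this (a Python sketch with a made-up record shape, not the actual metadata schema):

```python
def check_entry(entry):
    # A structurally malformed individual record is a bug in the
    # producer (frontend or a parallelization-aware pass): assert.
    assert len(entry) == 2, "each record must be (id, parent-or-None)"

def consistent_subset(entries):
    """entries: list of (region_id, parent_id_or_None) tuples."""
    for e in entries:
        check_entry(e)
    ids = {rid for rid, _ in entries}
    # A record whose parent has vanished (e.g. dropped by a
    # parallelization-unaware transform) makes the *set*
    # inconsistent: discard it silently instead of asserting.
    return [(rid, p) for rid, p in entries if p is None or p in ids]

assert consistent_subset([("r0", None), ("r1", "r0")]) == \
       [("r0", None), ("r1", "r0")]
assert consistent_subset([("r1", "gone")]) == []  # silently dropped
```

The dividing line is who could have caused the problem: only a buggy producer can emit a malformed entry, while any well-behaved but unaware transform can break cross-entry consistency.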

-Hal

Hi Hal,

> My rationale for proposing the self-consistent metadata solution was
> that it seemed to be the safest option. If we simply insist that all
> relevant passes use the ParallelizationMetadata pass, without any
> verification, then we could end up with parallelization-unaware passes
> silently miscompiling code when parallelization is enabled. Do you have
> a way of avoiding that?

I'm still not very clear about this. Either a pass understands (and
looks at) the parallelization metadata or it doesn't. If it doesn't,
it doesn't matter what the metadata looks like anyway. If it does, we
have the ParallelizationMetadata pass to present a consistent view of
the metadata, so why does it matter what the metadata actually looks
like? I think it would be helpful if you could illustrate your point
with an example.

To be clear, I'm not saying that dropping metadata is unreasonable;
I am just trying to figure out what the core issues are. :)

> I'm not sure. We do need to make sure, for example, that an 'ordered'
> region stays properly nested within its parent loop.

The ordered region mentions the parent loop anyway. Since the
grandparent region refs the ordered region, we can ensure that the
ordered region isn't dropped (leading to races).

> We should assert when an individual metadata entry is malformed; we
> should drop non-self-consistent sets of metadata entries (as these
> might have resulted from the actions of non-parallelization-aware
> transformation passes).

Makes sense.

Thanks!

> Hi Hal,

> > My rationale for proposing the self-consistent metadata solution was
> > that it seemed to be the safest option. If we simply insist that all
> > relevant passes use the ParallelizationMetadata pass, without any
> > verification, then we could end up with parallelization-unaware
> > passes silently miscompiling code when parallelization is enabled.
> > Do you have a way of avoiding that?

> I'm still not very clear about this. Either a pass understands (and
> looks at) the parallelization metadata or it doesn't. If it doesn't,
> it doesn't matter what the metadata looks like anyway. If it does, we
> have the ParallelizationMetadata pass to present a consistent view of
> the metadata, so why does it matter what the metadata actually looks
> like? I think it would be helpful if you could illustrate your point
> with an example.

My largest concern is, for example, having a parallel region with a
serial subregion, and having some transformation pass do something
which drops the metadata marking the serial subregion while leaving the
parent parallel region. This would lead to miscompiled code. As
transformation passes are allowed to drop metadata arbitrarily,
and people have out-of-tree passes that do IPO (which we can't fix
ourselves), I think that we need to assume that any combination of the
metadata can be dropped at any time.
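As a toy illustration of that failure mode (Python; the region markers here are made up, not the actual schema):

```python
def is_parallel(enclosing, markers):
    """Decide whether a block executes in parallel.

    enclosing lists the block's enclosing regions, outermost first;
    markers maps region id -> "parallel" or "serial". The innermost
    marked region wins.
    """
    kind = "serial"
    for region in enclosing:
        if region in markers:
            kind = markers[region]
    return kind == "parallel"

block = ["R0", "R1"]  # R1 is a serial subregion of parallel R0
full = {"R0": "parallel", "R1": "serial"}
assert not is_parallel(block, full)  # correctly kept serial

# A transform drops only the subregion's marker, keeping the parent:
assert is_parallel(block, {"R0": "parallel"})  # wrongly parallel now
```

Dropping just the inner marker flips the answer from serial to parallel, which is exactly the silent miscompile being described.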

An alternative approach, which you seem to be proposing, is that the
parallelization analysis reads the metadata up front (and then drops
it all). If
a pass does not preserve that analysis, then it will be invalidated, and
the parallelization will be lost. This, indeed, might be a better
approach, but would at least require touching a lot more code (if
nothing else, we'd need to audit and then add the analysis preservation
calls to a lot of otherwise-unrelated code).

-Hal

I can now generate parallel loops from OpenCL multi-WI work groups in
pocl kernel compilation. Please let me know when this work is in a
state where I can test it with OpenCL. I need the ability to mark the
parallel loops and some pass that actually does something with the
parallelism info.

Thanks!

Hi Hal,

> An alternative approach, which you seem to be proposing, is that the
> parallelization analysis reads the metadata up front (and then drops
> it all). If a pass does not preserve that analysis, then it will be
> invalidated, and the parallelization will be lost. This, indeed, might
> be a better approach, but would at least require touching a lot more
> code (if nothing else, we'd need to audit and then add the analysis
> preservation calls to a lot of otherwise-unrelated code).

I don't drop any metadata. Let me put it this way: say we have a pass
MakeConsistent which drops inconsistent MD nodes. We will also need
some code in LLVM that parses the IR and converts it to a form that is
more easily processed. My proposal is to squash these two pieces of
code (MakeConsistent and the "parser") into a single one. The
resulting code then doesn't need to mutate the metadata at all since
it consumes the modified, normalized metadata as soon as it is
"produced". It could also make the MD consistent if it wanted to, but
that would be redundant, since it is the only piece of code that
directly reads the MD.

https://github.com/sanjoy/llvm/blob/parallel-md/lib/Analysis/ParallelizationMetadata.cpp
is basically an implementation of this "squashed" pass. It provides an
interface to the MD and this interface normalizes the MD before
exposing it.

Thanks!

> Hi Hal,

> > An alternative approach, which you seem to be proposing, is that
> > the parallelization analysis reads the metadata up front (and then
> > drops it all). If a pass does not preserve that analysis, then it
> > will be invalidated, and the parallelization will be lost. This,
> > indeed, might be a better approach, but would at least require
> > touching a lot more code (if nothing else, we'd need to audit and
> > then add the analysis preservation calls to a lot of
> > otherwise-unrelated code).

> I don't drop any metadata.

Yes, I'm saying that, using your approach, you *should* drop the
metadata. Essentially, once the parallelization information has been
moved from the metadata into the state held by the analysis pass, then
the original metadata should be dropped. This will prevent
stale and possibly-invalid metadata from being dumped by any later
stage in the pipeline.

Doing this should be easy: Create a module transformation pass to drop
the parallelization metadata. It should require and preserve the
parallelization analysis.
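A toy model of that arrangement (Python; in-tree this would be a ModulePass whose getAnalysisUsage does addRequired<> and addPreserved<> on the analysis):

```python
class ParallelizationInfo:
    """Stand-in for the analysis: snapshots the metadata once."""
    def __init__(self, module):
        self.regions = dict(module.metadata)

class Module:
    """Stand-in for an IR module carrying parallelization metadata."""
    def __init__(self, metadata):
        self.metadata = metadata

def strip_parallel_metadata(module, info):
    """Requires an already-computed ParallelizationInfo ("addRequired")
    and leaves it intact ("addPreserved"); only the IR-side metadata
    is erased, so later dumps can never show stale info."""
    module.metadata.clear()
    return info

m = Module({"R0": "parallel"})
info = ParallelizationInfo(m)
strip_parallel_metadata(m, info)
assert m.metadata == {}                    # IR no longer carries it
assert info.regions == {"R0": "parallel"}  # analysis still does
```

After this pass runs, the only holder of parallelization information is the analysis itself, so invalidating it loses the parallelization rather than miscompiling.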

> Let me put it this way: say we have a pass
> MakeConsistent which drops inconsistent MD nodes. We will also need
> some code in LLVM that parses the IR and converts it to a form that is
> more easily processed. My proposal is to squash these two pieces of
> code (MakeConsistent and the "parser") into a single one. The
> resulting code then doesn't need to mutate the metadata at all since
> it consumes the modified, normalized metadata as soon as it is
> "produced". It could also make the MD consistent if it wanted to, but
> that would be redundant, since it is the only piece of code that
> directly reads the MD.

I understand; I'm just trying to make sure that at no stage do we end
up with inconsistent metadata (even if this is in debug dumps in
between passes).

Thanks again,
Hal