[RFC] OpenMP Support in Clang

Chandler,
Sorry, my bad. See the attached document.

Best regards,
Alexey Bataev

OpenMP_design.rst (4.59 KB)

Some comments on the design document…

General approach of parsing these directives is to use standard
pragma parsing harness. This invlolves implementation of PragmaHandler-based

Typo “invlolves”.

class for all OpenMP pragmas or several PragmaHandlers for each particular
pragma. Then each pragma, generally processed by Preprocessor, should be turned
into some particular statement-like construct (using
Preprocessor::EnterTokenStream), which then should be parsed in Parser's
context.

Okay, that’s the right approach. PragmaNamespace makes it fairly easy to have separate handlers for the various omp pragmas.
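As a rough illustration of the per-directive dispatch idea (one handler per omp pragma, looked up by name inside an "omp" namespace), here is a self-contained toy model. It does not use Clang's actual PragmaHandler or PragmaNamespace classes; ToyPragmaNamespace, makeOmpNamespace and the "annot_..." strings are hypothetical names invented for this sketch, and whitespace splitting stands in for real lexing.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a pragma namespace: maps a directive name to its handler.
// This models the dispatch idea only; it is NOT Clang's PragmaNamespace API.
class ToyPragmaNamespace {
public:
  using Handler = std::function<std::string(const std::vector<std::string> &)>;

  void addHandler(const std::string &Name, Handler H) {
    Handlers[Name] = std::move(H);
  }

  // Splits "#pragma omp <directive> <clauses...>" on whitespace and forwards
  // the remaining tokens to the handler registered for <directive>.
  std::string handle(const std::string &Line) const {
    std::istringstream In(Line);
    std::string Hash, Space, Directive;
    In >> Hash >> Space >> Directive;
    if (Hash != "#pragma" || Space != "omp")
      return "not-omp";
    std::vector<std::string> Rest;
    for (std::string Tok; In >> Tok;)
      Rest.push_back(Tok);
    std::map<std::string, Handler>::const_iterator It = Handlers.find(Directive);
    if (It == Handlers.end())
      return "unknown-directive";
    return It->second(Rest);
  }

private:
  std::map<std::string, Handler> Handlers;
};

// Registers two hypothetical handlers. Each returns a string standing in for
// the annotation token that a real handler would feed back to the parser.
ToyPragmaNamespace makeOmpNamespace() {
  ToyPragmaNamespace NS;
  NS.addHandler("parallel", [](const std::vector<std::string> &Clauses) {
    return "annot_omp_parallel(" + std::to_string(Clauses.size()) + ")";
  });
  NS.addHandler("threadprivate", [](const std::vector<std::string> &Args) {
    return "annot_omp_threadprivate(" + std::to_string(Args.size()) + ")";
  });
  return NS;
}
```

In this model, adding support for another directive only means registering another handler; the dispatch code stays unchanged.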

This construct may represent some custom construct or could be
transformed into some standard language construct. Olaf Krzikalla made a proposal <http://clang-developers.42468.n3.nabble.com/OpenMP-support-in-CLANG-A-proposal-td4027258.html>
to handle OpenMP constructs as attributes. So, we can try to translate OpenMP
pragmas into some form of GCC or C++11 attributes and use standard approach to
parse these attributes. However, it requires at least two parsing passes,
with first one transforming pragmas into attributes and second one translating
attributes and verifying their semantic correctness. Thus, it seems that a
better solution would be to introduce new annotation token for OpenMP pragmas
and custom parsing procedures for translating OpenMP pragmas into some special
new Stmts.

I agree that attributes are not the best way to go. The parser can parse the OpenMP pragma into a reasonable AST for that pragma. That AST goes to the normal statement parsing (e.g., an OpenMP parallel directive gets parsed and passed to the “for” statement parser), and gets carried along to semantic analysis. As part of type-checking the for loop, we’ll check the OpenMP constraints.

I think separate statements make sense, but it depends somewhat on how different they’ll actually be and how many there are. Do you have a good sense of which statements we’ll need?

This approach allows separation of subsequent CodeGen stage for
OpenMP Stmts from CodeGen stage for standard statements, while handling OpenMP
pragmas as attributes requires rework of CodeGen procedures for almost all
statements – in order to handle these new OpenMP specific attributes.

Perhaps. I think it depends a bit on how function outlining is handled. In particular, I could certainly imagine that Sema is involved in outlining, e.g., by keeping track of which variables are captured (which it knows how to do for lambdas/blocks) and having an explicit outlined statement that specifies its captures. If that’s the case, an OpenMP parallel for (for example), could be a ForStmt whose body is an outlined statement and that has the OpenMP parallel for AST added to it. Anyway, I’d love to hear more thoughts on how you want to tackle outlining.

#. Implement support of -fopenmp and -fno-openmp compiler options.

We have the -cc1 -fopenmp now; the driver-level options can wait until the feature is complete.

#. Develop AST representation for each OpenMP directive and clause representing
OpenMP constructions as statements.

Okay. Let’s take a sample one.

#. Implement parsing and semantic analysis for each directive using the standard
pragma processing approach (this implies implementing a single PragmaHandler for
all directives, parsing procedures for directives, clauses and their
arguments, and semantic checking of all OpenMP constructs).

Sure, let’s take this in stages: pick an OpenMP directive and write a parser for it w/ tests. Then add some semantic analysis, then ASTs, and we’ll see how things shape up.

#. Implement CodeGen stage (to be discussed later) for each OpenMP construct.

Some of this will tie into the runtime. I’m most interested in your approach to handling outlining, because that’s going to affect the ASTs.

Some OpenMP directives could still be represented as attributes, for
example the omp threadprivate directive. This directive specifies that variables
from a list following the directive are “replicated, with each thread having its
own copy”. This is a declarative directive, so it would be better to represent
it as an attribute of a variable, not a statement.

That seems reasonable.

I think this is a good start. Looking forward to more patches!

  • Doug

Hello Doug,
Thank you for the review. I'll prepare a patch with the AST representation of
pragmas and clauses, and update the design document to fix bugs and to add more
info about function outlining.

Hi Doug and clang-dev,

We think this could answer Doug's question about "How function outlining is handled?" http://lists.cs.uiuc.edu/pipermail/cfe-dev/2013-January/027311.html

Thanks!

Ben Langmuir, Wei Pan and Andy Zhang
Intel of Canada, Waterloo

*BEGIN*

[RFC] Captured Statements

We are proposing a feature that we have called 'Captured Statements', to
support outlining statements into functions in Clang's AST. This
includes capturing references to variables - similar to C++11 lambdas'
automatic capture. However, the feature is more "primitive" than lambdas and
so has less complexity and baggage, and so can be used for implementing other
features not related to lambdas.

We used Captured Statements to support the Cilk Plus C/C++ extension in Clang.
However, we believe that Captured Statements will be useful to others, and are
seeking feedback on the proposed design. In particular, Captured Statements
should be useful in the implementation of OpenMP parallel regions. They may
also be useful in implementing some of the new features (e.g. in-kernel spawning)
being considered for OpenCL 2.0, and for nested functions as in GCC.

There is a set of requirements for function outlining:

(1) Must work for both C and C++ programs
(2) Should be nestable
(3) Should be able to capture most types of variables, including arrays and 'this'
(4) Should be able to customize the capturing and codegen behavior

The primary use case is to support outlining parallel regions into functions
so that they may be passed to a runtime library. Both OpenMP and Cilk Plus
require this kind of outlining to run parallel regions on multiple threads
using a runtime library.

E.g.

#pragma omp parallel
{
  ... // parallel region is outlined, some variable references are captured
}

cilk_spawn foo(a, b, c); // call to foo is outlined into a helper function and
                         // references to a, b, and c are captured

There are two existing AST constructs closely related to Captured Statements:
Objective-C/C++ blocks and C++11 lambda expressions.

The code generation of "block" calls contains quite a few Objective-C specific
runtime calls. There are also constraints for blocks that do not apply to
Captured Statements, e.g. arrays cannot be captured in blocks.
C++11 lambda expressions work for C++ only, where the context is captured in a
CXXRecordDecl.

As far as we know, neither construct can satisfy all the above requirements,
and a new Captured Statement seems necessary. The proposed AST changes are based
on the AST for blocks, but the codegen is closer to lambdas.

Most existing routines for variable capturing will be shared among blocks,
lambdas and Captured Statements. We still need to extend the current clang
implementation. For example, an OpenMP 'threadprivate' variable should also be
captured, although it may be a static variable or static class member.

AST

We propose adding a new abstract AST class CapturedStmt to represent a Captured Statement:

- CapturedStmt derives from Stmt
- CapturedStmt is an abstract class and each kind of outlining
(eg, for OpenMP, Cilk Plus, etc) will create a separate AST class that
derives from CapturedStmt

This part surprises me a little bit. I would have expected that CapturedStmt would be the same across the various consumers of outlining, and that it's the consumers that would differ. An OpenMP parallel for loop would store a CapturedStmt, as might a Cilk spawn expression.

- CapturedStmt will contain "captures", a list of variables referenced within
the Captured Statement that are declared outside the scope of the statement
- The CapturedStmt node will hold a Stmt that is the statement to be outlined.

We have prototyped Captured Statements and created an example for its use.
In our prototype, the "#pragma captured" directive is used to mark a compound statement
as a Captured Statement, which will be outlined into a separate function and the
compound statement will be replaced by a call to the outlined function immediately.

Please put this under "#pragma clang __debug captured", and we'll remove it as soon as we get our first "real" client in-tree.

Take the following example,

int foo(int x) {
  int y = 7;
  #pragma captured
  { y *= x; }

  return y;
}

This is equivalent to

int foo(int x) {
  __block int y = 7;
  ^{ y *= x; }();

  return y;
}

using a block or

int foo(int x) {
  int y = 7;
  [&](){ y *= x; }();

  return y;
}

using a lambda expression. With the Captured Statement, its AST looks like

(FunctionDecl 0x5b272e0 <captured.c:3:1, line:12:1> foo 'int (int)'
   (ParmVarDecl 0x5b27220 <line:3:9, col:13> x 'int')
   (CompoundStmt 0x5b52d10 <col:16, line:12:1>
     (DeclStmt 0x5b27418 <line:4:3, col:12>
       (VarDecl 0x5b273a0 <col:3, col:11> y 'int'
         (IntegerLiteral 0x5b273f8 <col:11> 'int' 7)))
     (CapturedStmt 0x5b52c60 <line:7:3, line:9:3>
       (Capture (Var 0x5b273a0 'y' 'int'))
       (Capture (ParmVar 0x5b27220 'x' 'int'))
       (CompoundStmt 0x5b52c40 <line:7:3, line:9:3>
         (CompoundAssignOperator 0x5b52c08 <line:8:5, col:10> 'int' '*=' ComputeLHSTy='int' ComputeResultTy='int'
           (DeclRefExpr 0x5b52a88 <col:5> 'int' lvalue Var 0x5b273a0 'y' 'int')
           (ImplicitCastExpr 0x5b52bf0 <col:10> 'int' <LValueToRValue>
             (DeclRefExpr 0x5b52b40 <col:10> 'int' lvalue ParmVar 0x5b27220 'x' 'int')))))
     (ReturnStmt 0x5b52cf0 <line:11:3, col:10>
       (ImplicitCastExpr 0x5b52cd8 <col:10> 'int' <LValueToRValue>
         (DeclRefExpr 0x5b52cb0 <col:10> 'int' lvalue Var 0x5b273a0 'y' 'int'))))))

which is almost the same as the AST for the block example above.

An implicit RecordDecl (not CXXRecordDecl) will be created to hold all the capture fields,
and the capture type is by reference by default. The statement to be captured will
be the body of an implicit FunctionDecl.

Just FWIW, you'll almost certainly need to build a CXXRecordDecl in C++ mode, but that shouldn't make what you're doing any harder.

Semantic analysis

There are a number of common constraints on statements to be captured, and this needs
to be elaborated further. A general rule is to treat a Captured Statement as
a function body. For example, the use of jump statements into and out of the
statement is limited.

Some refactoring may be required to accommodate needs for derived Captured Statements.
For example, one Captured Statement may allow throw expressions but another may not.

I guess that's one reason to have different Captured Statement subclasses, but even that's just contextual information that we can easily encode in the single Captured Statement node.

Code generation

The Captured Statement AST is close to blocks, but the code generation is completely
different. In fact, for straight Captured Statements (those without additional
language extension runtime calls inserted), both emission of the outlined function
and its invocation are much closer to lambdas. The only difference is that the
capture context is explicitly passed as the first argument.

The code emitted for the outlined function looks like:

%struct.capture = type { i32*, i32* }

define internal void @__captured_stmt_helper(%struct.capture* %this) nounwind {
entry:
  %this.addr = alloca %struct.capture*, align 8
  store %struct.capture* %this, %struct.capture** %this.addr, align 8
  %0 = load %struct.capture** %this.addr
  %1 = getelementptr inbounds %struct.capture* %0, i32 0, i32 1
  %ref = load i32** %1, align 8
  %2 = load i32* %ref, align 4
  %3 = getelementptr inbounds %struct.capture* %0, i32 0, i32 0
  %ref1 = load i32** %3, align 8
  %4 = load i32* %ref1, align 4
  %mul = mul nsw i32 %4, %2
  store i32 %mul, i32* %ref1, align 4
  ret void
}

*END*
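For readers who prefer source over IR, the outlined helper above corresponds roughly to the following hand-written C++. This is only a sketch of the lowering under the assumptions visible in the IR (both variables captured by reference, field 0 holding 'y' and field 1 holding 'x', context passed as the sole argument); the names CaptureCtx and captured_stmt_helper are illustrative, not what the compiler emits.

```cpp
// Hand-written approximation of the capture record: one pointer per captured
// variable, since captures are by reference by default.
struct CaptureCtx {
  int *y; // field 0 in %struct.capture
  int *x; // field 1 in %struct.capture
};

// Outlined body of "{ y *= x; }". The capture context is the explicit first
// (and only) argument, which is the main difference from a lambda call.
static void captured_stmt_helper(CaptureCtx *ctx) {
  *ctx->y *= *ctx->x;
}

int foo(int x) {
  int y = 7;
  CaptureCtx ctx = {&y, &x};  // build the context at the point of the pragma
  captured_stmt_helper(&ctx); // the compound statement becomes this call
  return y;
}
```

A runtime-based client such as an OpenMP parallel region would, under this model, hand the helper and the context to the runtime instead of calling the helper directly.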

Looks reasonable. I think this is a great approach, and I look forward to seeing the patches.

  - Doug

Hi Doug,

Thanks for the feedback! All suggestions make good sense to us. We will address them in our future patches.

Before sending out patches for commit, we are attaching an *incomplete* implementation of captured statements. By applying this patch, clang should compile the following function:

void foo(int &x) {
  #pragma captured
  {
    int y = 100;
    x += y;
    
    #pragma captured
    {
      y++;
      x -= y;
    }
  }
}
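To make the nesting concrete, the function above could be outlined by hand roughly as follows. This is a sketch, not the output of the attached patch: it assumes by-reference capture throughout and assumes the inner region reaches 'x' through the pointer already stored in the outer context. OuterCtx, InnerCtx and the helper names are hypothetical.

```cpp
// Context for the outer region: only 'x' comes from outside it.
struct OuterCtx {
  int *x;
};

// Context for the inner region: it captures 'y' (a local of the outer region)
// and 'x' (forwarded from the outer context).
struct InnerCtx {
  int *y;
  int *x;
};

static void inner_helper(InnerCtx *ctx) {
  ++*ctx->y;
  *ctx->x -= *ctx->y;
}

static void outer_helper(OuterCtx *ctx) {
  int y = 100; // declared inside the outer region, so it is not a capture there
  *ctx->x += y;
  InnerCtx inner = {&y, ctx->x};
  inner_helper(&inner);
}

void foo(int &x) {
  OuterCtx outer = {&x};
  outer_helper(&outer);
}
```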

There are a number of missing features like template support that we will be working on.

We would welcome any suggestions or feedback.

Thanks!

Ben Langmuir, Wei Pan and Andy Zhang

captured_stmt.patch (55.4 KB)

Hi Wei,

This is not a full review.

+ CXCursor_LastStmt = CXCursor_ExampleCapturedStmt,

Please bump CINDEX_VERSION_MINOR.

+// \brief the base class for capturing a statement into a separate function

Please use three slashes, a capital letter at the beginning and a full
stop at the end.

+void CapturedStmt::setCaptures(ASTContext &Context,
+ const Capture *begin,
+ const Capture *end) {
...
+ void *buffer = Context.Allocate(allocationSize, /*alignment*/sizeof(void*));

(1) Use llvm::alignOf<>.
(2) Tail-allocate instead of doing an additional allocation. See
CXXTryStmt::Create as an example.
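For context, "tail allocation" means placing the variable-length capture array in the same memory block as the node, directly after it, so only one allocation is needed. The sketch below shows the pattern in standalone C++ with malloc standing in for ASTContext::Allocate; TailCapture and TailAllocatedStmt are hypothetical types, not the classes from the patch.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical capture record; not Clang's CapturedStmt::Capture.
struct TailCapture {
  int VarId;
  bool ByRef;
};

class TailAllocatedStmt {
  unsigned NumCaptures;

  explicit TailAllocatedStmt(unsigned N) : NumCaptures(N) {}

  // The capture array lives immediately after the object itself.
  TailCapture *storage() {
    return reinterpret_cast<TailCapture *>(this + 1);
  }
  const TailCapture *storage() const {
    return reinterpret_cast<const TailCapture *>(this + 1);
  }

public:
  // One allocation covers both the node and its trailing captures.
  static TailAllocatedStmt *Create(const TailCapture *Begin,
                                   const TailCapture *End) {
    static_assert(sizeof(TailAllocatedStmt) % alignof(TailCapture) == 0,
                  "trailing array would be misaligned");
    const unsigned N = static_cast<unsigned>(End - Begin);
    void *Mem =
        std::malloc(sizeof(TailAllocatedStmt) + N * sizeof(TailCapture));
    if (!Mem)
      throw std::bad_alloc();
    TailAllocatedStmt *S = new (Mem) TailAllocatedStmt(N);
    for (unsigned I = 0; I != N; ++I)
      new (S->storage() + I) TailCapture(Begin[I]);
    return S;
  }

  static void Destroy(TailAllocatedStmt *S) {
    S->~TailAllocatedStmt();
    std::free(S);
  }

  unsigned size() const { return NumCaptures; }
  const TailCapture &operator[](unsigned I) const { return storage()[I]; }
};
```

The static_assert plays the role of the alignment check that the review asks for; in the real patch the alignment would come from the allocator call rather than from malloc.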

+++ b/test/CodeGen/captured-statements.c
@@ -0,0 +1,68 @@
+// RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s -check-prefix=CHECK-GLOBALS
+// RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s -check-prefix=CHECK-1
+// RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s -check-prefix=CHECK-2

Please use a temporary file.

+++ b/test/CodeGenCXX/captured-statements.cpp
@@ -0,0 +1,79 @@
+// RUN: %clang_cc1 -std=c++11 -emit-llvm %s -o - | FileCheck %s -check-prefix=CHECK-1

This test uses CHECK-2 and CHECK-3, too.

+ if (CurCGCapturedStmtInfo)
+ delete CurCGCapturedStmtInfo;

It is safe to delete a null pointer.

+ Token *Toks = (Token*) PP.getPreprocessorAllocator().Allocate(
+ sizeof(Token) * 1, llvm::alignOf<Token>());
+ new (Toks) Token();

Allocate<Token>() ?

+ if (Tok.isNot(tok::l_brace)) {
+ PP.Diag(Tok, diag::err_expected_lbrace) << "after #pragma captured";

Please don't put prose into %x. Define a separate diagnostic if needed.

+SmallVector<CapturedStmt::Capture, 4>
+Sema::buildCapturedStmtCaptureList(SmallVector<CapturingScopeInfo::Capture, 4>
+ &Candidates) {

Candidates should be an ArrayRef, and return value should be an
out-parameter of type SmallVectorImpl<>.

Dmitri

This would violate minor version semantics (at least for most
versioning schemes; idk which one we officially use). This change is
not backwards compatible e.g. if a header declares

extern int arr[CXCursor_LastStmt];

then code compiled against this version will read off the end of the
array which was compiled with a previous minor version.

-- Sean Silva

+ CXCursor_LastStmt = CXCursor_ExampleCapturedStmt,

Could we not add that? We should try to avoid exposing internal AST nodes via libclang's interface unless there is a compelling reason to do so.

Please bump CINDEX_VERSION_MINOR.

The presence of CXCursor_Last* enumerators is unfortunate; IMO it'd be better if they were not exposed for the reasons Sean mentioned.
That said, they have been there forever, without version numbers, and it did not stop us from adding more enumerators.

I'd suggest that we document that these are likely to change between minor versions and avoid using them as much as possible.

Hello everybody,
I'd like to discuss the representation of OpenMP directives and clauses in the AST.
I'm trying to solve this problem by introducing new Stmt-based classes for
each executable directive and clause. However, there is a class AttributedStmt
with a notable comment:
/// Represents an attribute applied to a statement. For example:
/// [[omp::for(...)]] for (...) { ... }

So, what is the preferred solution? Should I introduce new Stmt classes for directives
and clauses, or would it be better to represent them as attributes and
use the existing AttributedStmt class?
If we take the second approach (attributes), then I think I have to
represent each directive and clause as a separate attribute, so the
statement will look like:
[[omp::parallel]] [[omp::if(expr)]] [[omp::private(var1, var2, ...)]]...
stmt

I think this was discussed before, and the consensus was to create new
AST nodes because:

* translating pragmas to attributes might be hard;
* preserving enough source information in attributes that really
represent pragmas will definitely be hard and non-intuitive.

The [[omp::for...]] example is there because OpenMP pragmas were a
motivation for C++11 attributes: the idea was that C++11 attributes would
enable OpenMP to define a new syntax without pragmas. But this has not
happened yet.

Dmitri

Dmitri, thank you very much for your answer.
Yes, I remember it was discussed. But I could parse pragmas as pragmas and still represent them as attributes, so I just wanted to be sure that the new classes are still OK.

Best regards,
Alexey Bataev