OpenCL support

Hi everybody,

I am a student at Politecnico di Milano.
For my Master thesis I am working on extending CLang in order to support OpenCL.

I am working on CLang 2.8 refering to the version 1.1 of the OpenCL standard.
I just finished adding the main OpenCL features to the Parser:

  • function qualifier: __kernel
  • type qualifiers: __constant, __global, __local, __private
  • built-in vector data types are added by the preprocessor during the start up.

The attached patch contains also a set of test cases, placed in the
test/OpenCL subdirectory.

opencl_parser_patch.zip (23.7 KB)

Hi,

26.11.2010 11:49, Alberto Magni kirjoitti:

Now I plan to add the support for code generation.

We have worked on this at TUT and URJC and have LLVM passes
that add the required work item loops + optionally do work
item chaining/merging to produce more ILP or vectorizing
opportunities for the final code generation. It doesn't yet
handle all the barrier cases but could be a good starting point.
I think the loops are the main thing needed before the kernel
code can be executed on a target that is not SPMD.

Our plan has been to release those passes but we haven't
had time for that.

The passes are described in this paper:
http://tce.cs.tut.fi/papers/StandaloneOpenCL.pdf

Currently our compilation flow assumes __attribute__((reqd_work_group_size(X, Y, Z)))
in the kernel and the passes detect the kernels in the bitcode
by a function naming convention and everything is compiled offline
together with the OpenCL host program to produce parallelizable
code for ASIPs.

So far we have used a hackish custom python preprocessor to
process the OpenCL kernel code to C++ and we would like to get
rid of this by, for example, using Clang.

The passes should be converted to use the metadata you mentioned and
possibly a parameter to the LLVM pass for setting the work group
dimensions so they become usable for executing the OpenCL kernel
compiler from the OpenCL main program. The passes need to know
how many times they should/can merge the work items etc.

You can ask Carlos Sánchez de La Lama for more info and
the code if you are interested as he is the main author.
He should be subscribed to this list.

BR,

Hi Alberto,

This is great. I'm working on a project that uses Clang's OpenCL
frontend, and this will be very useful to me. I read your patch and
have a few comments below:

- Your patch does not apply cleanly against the current Clang sources
  in subversion. This is a must if you would like your changes to
  be applied upstream.

- Please make sure your patch conforms with the LLVM coding standards
  (http://llvm.org/docs/CodingStandards.html). In particular indentation
  is inconsistent in some areas.

- Similarly, your patch reformats parts of the Clang source code in a
  way that is inconsistent with the coding standards. Please remove
  these changes.

- For such a large change as this, it would be really nice (and
  easier to review) if your patch were broken up into a series of
  separate logical changes. This is easy to do using git; I saw
  you're using hg so this should also be possible.

- You added an OpenCLMask field to Qualifiers for storing the OpenCL
  address space. This doubles the size of the Qualifiers object.
  Is there any reason why you did not use the existing AddrSpace
  subfield? (The way I imagine it could work is to define fixed
  address space constants for the constant, global, local and private
  address spaces). You also don't need to use a bitmask (at least
  at the AST level) since the address spaces are mutually exclusive.

- You added a __kernel field to FunctionDecl. A better way
  of handling this IMO would be to add a new attribute (Attr subclass).
  See the include/clang/Basic/Attr.td file.

I also have a few comments on code generation below:

adding metadata for __kernel functions

The metadata for kernel functions is target-specific. For our PTX
backend, for example, kernel functions are marked using a specific
calling convention. Have a look at how I added PTX support for CUDA
kernel functions here: [1]

tagging with metadata the alloca instructions for __local variables

The way I planned to handle this was to give __local variables with
function scope a static storage-class, so they would be codegen'd into
global variables rather than alloca instructions. This makes the
implementation easier on the LLVM side as we don't yet have support
for address space attributes on alloca instructions (remember that
pointers to __local variables must also be __local). I implemented
a patch [2] for this, but decided to wait until the OpenCL semantic
support in Clang was more mature.

Thanks,

Hi Pekka,
thank you for the reply and for the paper, I have never heard of SAP before,
very interesting field.

Hi Peter,

the suggestion you gave my are what I am looking for, thank you very much.

The problem was the usage of tabs instead of spaces, fixed.

  • Similarly, your patch reformats parts of the Clang source code in a
    way that is inconsistent with the coding standards. Please remove
    these changes.

  • For such a large change as this, it would be really nice (and
    easier to review) if your patch were broken up into a series of
    separate logical changes. This is easy to do using git; I saw
    you’re using hg so this should also be possible.

Ok, I’ll look for it in Mercurial.

  • You added an OpenCLMask field to Qualifiers for storing the OpenCL
    address space. This doubles the size of the Qualifiers object.
    Is there any reason why you did not use the existing AddrSpace
    subfield? (The way I imagine it could work is to define fixed
    address space constants for the constant, global, local and private
    address spaces).

I added a new mask for the sake of simplicity, but I think I’ll follow your advice.

You also don’t need to use a bitmask (at least
at the AST level) since the address spaces are mutually exclusive.

This is a problem on which I thought a lot. Since this qualifiers are mutually
exclusive I could have used only two bits to save one of the possible four cases.
But let’s take into consideration this test case:

typedef __global int global_typedef;
__local global_typedef* t4; //expected-error {{multiple named address spaces}}

The compilation error is due to the presence of a different OpenCL qualifier both
in the typdef both in the variable definition. Now, what is the canonical type of “t4” ?
“__global __local int”.
In the method Sema::HandleDeclarator there is a call to:

TypeSourceInfo *TInfo = GetTypeForDeclarator(D, S);

GetTypeForDeclarator will build the canonical type of the declarator passed
as argument (D).
The actual stack trace is:

HandleDeclarator
GetTypeForDeclarator
BuildPointerType
getPointerType
getCanonicalType

The call to getCanonicalType will result in the building of a new “Qualifiers” object,
starting from “__local”, the local OpenCL qualifier the “__global” qualifier
will also be added.
If I add an assert in the Qualifier class to the method that adds the new qualifier (__global):

assert (there are no previous OpenCL qualifiers here)

it will for sure signal an error, since the Qualifiers object contains already an OpenCL
qualifiers: (__local).

To overcome this I allowed the possibility to have an inconsistent state in the Qualifiers
object, ie 2 bits in the OpenCL qualifiers mask, with the assurance that this inconsistency
would be checked in the method Sema::ActOnVariableDeclaration, also called by
HandleDeclarator.

One solution would be to add a check for duplicated OpenCL qualifiers in
HandleDeclarator before the call to GetTypeForDeclarator, but I think that this would be
misplaced since all the semantic checks on a variable declaration should be performed in
ActOnVariableDeclarator.

Right now I have no idea how to solve this issue.

Any suggestion is greatly appreciated.

  • You added a __kernel field to FunctionDecl. A better way
    of handling this IMO would be to add a new attribute (Attr subclass).
    See the include/clang/Basic/Attr.td file.

Sorry for the stupid question, but what is the advantage of using an
attribute instead of a function specifier ?

Thank you very much again.

Bye bye.

Alberto

Hi Alberto,

You also don't need to use a bitmask (at least
> at the AST level) since the address spaces are mutually exclusive.
>

This is a problem on which I thought a lot. Since this qualifiers are
mutually
exclusive I could have used only two bits to save one of the possible four
cases.
But let's take into consideration this test case:

typedef __global int global_typedef;
__local global_typedef* t4; //expected-error {{multiple named address
spaces}}

The compilation error is due to the presence of a different OpenCL qualifier
both
in the typdef both in the variable definition. Now, what is the canonical
type of "t4" ?
"__global __local int".
In the method Sema::HandleDeclarator there is a call to:

TypeSourceInfo *TInfo = GetTypeForDeclarator(D, S);

GetTypeForDeclarator will build the canonical type of the declarator passed
as argument (D).
The actual stack trace is:

HandleDeclarator
GetTypeForDeclarator
BuildPointerType
getPointerType
getCanonicalType

The call to getCanonicalType will result in the building of a new
"Qualifiers" object,
starting from "__local", the local OpenCL qualifier the "__global" qualifier
will also be added.
If I add an assert in the Qualifier class to the method that adds the new
qualifier (__global):

assert (there are no previous OpenCL qualifiers here)

it will for sure signal an error, since the Qualifiers object contains
already an OpenCL
qualifiers: (__local).

To overcome this I allowed the possibility to have an inconsistent state in
the Qualifiers
object, ie 2 bits in the OpenCL qualifiers mask, with the assurance that
this inconsistency
would be checked in the method Sema::ActOnVariableDeclaration, also called
by
HandleDeclarator.

One solution would be to add a check for duplicated OpenCL qualifiers in
HandleDeclarator before the call to GetTypeForDeclarator, but I think that
this would be
misplaced since all the semantic checks on a variable declaration should be
performed in
ActOnVariableDeclarator.

Actually, this is a semantic check on declarators, not variable
declarations. Remember that the multiple address space diagnostic
may arise anywhere a declarator may occur (which includes variable
declarations but may also include things like cast and sizeof
expressions).

Right now I have no idea how to solve this issue.

Any suggestion is greatly appreciated.

The best place to perform the check is when building the declarator's
type in GetTypeForDeclarator. At that point you have the unqualified
type and can check its address space. In fact this is how the existing
support for address space qualifiers works -- have a look at the
HandleAddressSpaceTypeAttribute function in SemaType.cpp. Specifically
at the top of that function we check that the unqualified type does
not have an address space.

This brings me to my second point, which is that the address
space qualifiers should be handled internally using AttributeLists.
This makes the change relatively lightweight, as we can build on the
existing support for address space qualifiers. I would recommend
parsing the OpenCL address space qualifiers into a synthesised
AttributeList, which can be handled like an address space attribute
in Sema. r112939 shows how this was done for the __pascal calling
convention (which can be written __pascal or __attribute__((pascal))).

> - You added a __kernel field to FunctionDecl. A better way
> of handling this IMO would be to add a new attribute (Attr subclass).
> See the include/clang/Basic/Attr.td file.
>

Sorry for the stupid question, but what is the advantage of using an
attribute instead of a function specifier ?

We should only add a new field if we expect it to be used often.
Fields are quick to access but they increase the size of every
FunctionDecl object (we're using only 12 bits at the moment but may
wish to use more in future). The __kernel qualifier is a relatively
uncommon qualifier and does not need to be accessed often so it should
be added as an attribute.

Thanks,

I agree with this point as well.

Also, please resend your patch without the unrelated formatting changes and break it up into incremental patches. That will make it much easier to review and get in.

-Tanya

Thank you very much for your help Peter and Tanya.

I will resend all the patches as soon as I finished applying Peter’s suggestions.

Alberto

2010/11/30 Tanya Lattner <lattner@apple.com>