Clang and OpenCL address spaces

Hi Peter,

> From my understanding of the OpenCL specification, setting the address
> space on an automatic variable is supposed to not only change the
> address space but also make the variable static (as the term is defined
> in standard C). Adding 'static' to the qualifier macro definitions
> won't work too well because the qualifiers can also appear in other
> places.
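
As an illustration of the macro problem described above, here is a minimal
sketch in plain C. It is not code from Clang: DEMO_LOCAL and DEMO_LOCAL_VAR
are hypothetical names, and the address space number mentioned in the comment
is an arbitrary assumption. The point is that the same qualifier has to be
legal both on a variable declaration and on the pointee type of a parameter,
so a storage class cannot simply be folded into it.

```c
#include <assert.h>

/* Hypothetical stand-ins for the OpenCL qualifier macros. On the host the
   qualifier expands to nothing; a Clang-based OpenCL front end might expand
   it to something like __attribute__((address_space(2))) instead (the number
   is an assumption). */
#define DEMO_LOCAL
#define DEMO_LOCAL_VAR static DEMO_LOCAL

/* The qualifier on a pointee type in a parameter list. If DEMO_LOCAL itself
   contained 'static', this would read 'static int *buf', which is not a
   valid parameter declaration in C. */
int demo_sum(DEMO_LOCAL int *buf, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += buf[i];
    return total;
}

/* The qualifier on a function-scope variable, where static storage duration
   is the intended effect. */
int demo_fill_and_sum(int n)
{
    DEMO_LOCAL_VAR int scratch[4];
    assert(n >= 0 && n <= 4);
    for (int i = 0; i < n; i++)
        scratch[i] = i + 1;   /* written before it is read */
    return demo_sum(scratch, n);
}
```

Splitting the macro in two, as done here, is only one way around the
problem; handling the storage duration inside the compiler is another.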

First, I'm not sure whether 'static' is completely appropriate here: the
lifetime of __local variables is limited to the lifetime of a single
workgroup (unlike 'static' variables in C, which have the program's
lifetime). Second, I suspect that using the address space attribute is
insufficient, but I need to investigate a bit more. Third, the OpenCL spec is
imprecise on a number of points concerning address space qualifiers, which I
would like to clarify with Khronos. For starters, can you declare __global
variables in functions?

Best, Anton.

Hi Anton,

> Hi Peter,
>
> > From my understanding of the OpenCL specification, setting the address
> > space on an automatic variable is supposed to not only change the
> > address space but also make the variable static (as the term is defined
> > in standard C). Adding 'static' to the qualifier macro definitions
> > won't work too well because the qualifiers can also appear in other
> > places.
>
> First, I'm not sure whether 'static' is completely appropriate here: the
> lifetime of __local variables is limited to the lifetime of a single
> workgroup (unlike 'static' variables in C, which have the program's
> lifetime).

To a certain extent I agree. But then again the storage class seems
to be a Clang-level implementation detail (storage classes other than
'typedef' are not supported in OpenCL). If one accesses a variable
outside its lifetime then that is undefined behaviour, so any lifetime
restrictions seem like a lower bound rather than an upper bound.
The spec also forbids __local variables from being initialised,
as well as recursion, so the semantics seem to be the same.

I suppose the real differences arise at the codegen level where the
storage class makes the difference between a global variable and
an 'alloca' instruction being emitted, and I don't think 'alloca'
supports address spaces (maybe it should?).
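
For concreteness, the codegen difference can be sketched in plain C (the
function names are hypothetical). For a function-scope 'static' variable
Clang emits an LLVM global, whereas an automatic variable becomes an
'alloca'; the IR shown in the comments is abbreviated and approximate. Both
functions write every element before reading it, which is all a __local
variable could rely on, given that it cannot have an initialiser.

```c
#include <assert.h>

#define DEMO_N 4

/* Static storage duration. At -O0 Clang emits roughly
     @sum_squares_static.scratch = internal global [4 x i32] zeroinitializer
   and a global of this kind can carry an addrspace(N) marker. */
int sum_squares_static(const int *in)
{
    static int scratch[DEMO_N];
    int total = 0;
    for (int i = 0; i < DEMO_N; i++)
        scratch[i] = in[i] * in[i];   /* every element written first */
    for (int i = 0; i < DEMO_N; i++)
        total += scratch[i];
    return total;
}

/* Automatic storage duration. At -O0 Clang emits roughly
     %scratch = alloca [4 x i32]
   i.e. stack memory that is private to each call. */
int sum_squares_auto(const int *in)
{
    int scratch[DEMO_N];
    int total = 0;
    for (int i = 0; i < DEMO_N; i++)
        scratch[i] = in[i] * in[i];
    for (int i = 0; i < DEMO_N; i++)
        total += scratch[i];
    return total;
}
```

Without recursion and without reliance on an initial value, the two
functions are indistinguishable to the caller; only the emitted storage
differs.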

> Third, the OpenCL spec is imprecise on a number of points concerning
> address space qualifiers, which I would like to clarify with Khronos.
> For starters, can you declare __global variables in functions?

Indeed. Much of my argument here seems to be based on inferences rather
than what is actually spelt out in the spec.

Also, I'm not entirely sure, following a strict reading of the spec,
whether __constant variables are intended to be supported in functions,
as the spec seems to contradict itself. I raised a bug report with
Khronos here:
http://www.khronos.org/bugzilla/show_bug.cgi?id=366

Thanks,

The call stack is always in the generic address space, so I don't know
how 'alloca' could possibly support address spaces.

I guess the $64,000 question is how you're planning on emitting these
in IR. If you're going to emit them as globals and then rewrite all the
references in a later pass, then 'static' seems fine.

John.

Hi John,

> > To a certain extent I agree. But then again the storage class seems
> > to be a Clang-level implementation detail (storage classes other than
> > 'typedef' are not supported in OpenCL). If one accesses a variable
> > outside its lifetime then that is undefined behaviour, so any lifetime
> > restrictions seem like a lower bound rather than an upper bound.
> > The spec also forbids __local variables from being initialised,
> > as well as recursion, so the semantics seem to be the same.
> >
> > I suppose the real differences arise at the codegen level where the
> > storage class makes the difference between a global variable and
> > an 'alloca' instruction being emitted, and I don't think 'alloca'
> > supports address spaces (maybe it should?).
>
> The call stack is always in the generic address space, so I don't know
> how 'alloca' could possibly support address spaces.

OpenCL disallows recursion and function pointers, so the call
graph would be known at compile time. I can think of a couple of
plausible backend implementations of nonzero address space 'alloca'
under these constraints.
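
One scheme of this kind can be sketched as follows. This is an assumption
for illustration; the message does not spell out which implementations were
meant. Because the call graph is acyclic and fully known, each function's
nonzero address space storage can be given a fixed offset in one statically
sized region, placed after the storage of its deepest caller. The struct and
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FUNCS 8

/* Hypothetical description of one function: the bytes of nonzero address
   space 'alloca' storage it needs, and the indices of the functions it
   calls. */
struct func_info {
    size_t frame_size;
    int    callees[MAX_FUNCS];
    int    num_callees;
};

/* Assigns each function a fixed offset into one statically sized region.
   Functions must be listed so that every caller appears before its callees,
   which is possible only because the call graph has no cycles. Returns the
   total size of the region. */
size_t assign_static_frames(const struct func_info *funcs, int n,
                            size_t *offsets)
{
    size_t total = 0;
    for (int i = 0; i < n; i++)
        offsets[i] = 0;
    for (int i = 0; i < n; i++) {
        /* offsets[i] is final here: all callers of i were handled earlier. */
        size_t end = offsets[i] + funcs[i].frame_size;
        if (end > total)
            total = end;
        for (int k = 0; k < funcs[i].num_callees; k++) {
            int callee = funcs[i].callees[k];
            assert(callee > i && callee < n);  /* caller-first ordering */
            if (offsets[callee] < end)
                offsets[callee] = end;         /* start after this caller */
        }
    }
    return total;
}
```

Functions that are never live at the same time (two callees of the same
kernel, say) may end up sharing an offset, so the region is generally
smaller than the sum of all frame sizes.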

> I guess the $64,000 question is how you're planning on emitting these
> in IR. If you're going to emit them as globals and then rewrite all the
> references in a later pass, then 'static' seems fine.

Globals would work for my purposes. Hopefully Anton will be able to
chime in later regarding 'real' GPU implementations.

Thanks,