Hi Jeff,
It’s not really meaningful to talk about threads being created in the
context of an OpenCL kernel. The other threads are always present.
Semantically, I'd view this as a collection of threads being spawned
at the entry to the kernel function, and joined at the end, with
release/acquire edges at each barrier. But yes, the threads aren't
literally created or destroyed.
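void kernel(int * restrict array, int * restrict array2) {
    // get_thread_id() stands for the work-item's ID (e.g. get_local_id(0)),
    // and barrier() for a work-group barrier with a global memory fence.
    int value = array[0] + get_thread_id() + 1;
    barrier();
    array[get_thread_id()] = value;   // thread 0 overwrites array[0] here
    barrier();
    array2[get_thread_id()] = array[0];   // must observe the new array[0]
}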
In this example, the kernel is well synchronized; there are no data
races on any element of either array. However, the result differs if
we CSE (common-subexpression eliminate) the later read of array[0] with
the earlier one. Executed as written, the final value of array2[0]
(indeed of every element of array2) is the original value of array[0]
plus 1, because thread 0 overwrites array[0] between the two barriers.
If we perform the CSE, the result is just the original array[0].
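To make the two outcomes concrete, here is a minimal C sketch (plain C, not OpenCL) that simulates a work-group by running each barrier-delimited phase for every thread in turn. Because the kernel has no data races, this schedule should give the same result as any real interleaving. The work-group size NTHREADS, the function run_kernel, and its cse flag are hypothetical names introduced for the sketch.

```c
#include <assert.h>

/* Hypothetical work-group size chosen for the sketch. */
#define NTHREADS 4

/* Simulates the kernel for NTHREADS work-items. Each loop over t is one
   barrier-delimited phase: every thread finishes a phase before any
   thread starts the next, which is what barrier() guarantees.
   If cse is nonzero, the final read of array[0] is replaced by each
   thread's cached first read, i.e. the transformation in question. */
static void run_kernel(int *array, int *array2, int cse)
{
    int value[NTHREADS];
    int first_read[NTHREADS];

    /* The kernel's restrict qualifiers imply the buffers don't overlap;
       the sketch only checks that they are distinct. */
    assert(array != array2);

    /* Phase 1: int value = array[0] + get_thread_id() + 1; */
    for (int t = 0; t < NTHREADS; t++) {
        first_read[t] = array[0];
        value[t] = first_read[t] + t + 1;
    }

    /* barrier(); */

    /* Phase 2: array[get_thread_id()] = value; */
    for (int t = 0; t < NTHREADS; t++)
        array[t] = value[t];

    /* barrier(); */

    /* Phase 3: array2[get_thread_id()] = array[0]; */
    for (int t = 0; t < NTHREADS; t++)
        array2[t] = cse ? first_read[t] : array[0];
}
```

For example, starting from array = {10, 0, 0, 0}, the as-written schedule leaves every element of array2 equal to 11, while the CSE'd version leaves them equal to 10.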
I think the kernel follows all of the 'restrict' rules:
6.7.3.1 Formal definition of restrict
1 Let D be a declaration of an ordinary identifier that provides a
means of designating an object P as a restrict-qualified pointer to
type T.
2 If D appears inside a block and does not have storage class extern,
let B denote the block. If D appears in the list of parameter
declarations of a function definition, let B denote the associated
block. Otherwise, let B denote the block of main (or the block of
whatever function is called at program startup in a freestanding
environment).
3 In what follows, a pointer expression E is said to be based on
object P if (at some sequence point in the execution of B prior to the
evaluation of E) modifying P to point to a copy of the array object
into which it formerly pointed would change the value of E.[117] Note
that "based" is defined only for expressions with pointer types.
4 During each execution of B, let L be any lvalue that has &L based on
P. If L is used to access the value of the object X that it
designates, and X is also modified (by any means), then the following
requirements apply: T shall not be const-qualified. Every other lvalue
used to access the value of X shall also have its address based on P.
Every access that modifies X shall be considered also to modify P, for
the purposes of this subclause. If P is assigned the value of a
pointer expression E that is based on another restricted pointer
object P2, associated with block B2, then either the execution of B2
shall begin before the execution of B, or the execution of B2 shall
end prior to the assignment. If these requirements are not met, then
the behavior is undefined.
5 Here an execution of B means that portion of the execution of the
program that would correspond to the lifetime of an object with scalar
type and automatic storage duration associated with B.
D is a parameter declaration in 'kernel'. P is 'array' (or 'array2',
but I'll just look at 'array' for now). E is an expression such as
"&array[get_thread_id()]", which is the address of an object X. The
initial call to the kernel (from a single thread) sets the value of
'array' (P2). Each of the other threads involved in running the kernel
has its own variable 'array' (P), which is assigned from P2. B2 (the
initial call) begins before the execution of B (another thread's
execution of 'kernel'). All lvalues used to access X have addresses
based on 'array'. (This is the "many threads ... depend on the value
of the restrict pointer" case.)
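A single-threaded C analogue may help show the P/P2 structure of this mapping. In this sketch, launch (a hypothetical name) plays the initial call, so its parameter outer is P2 and its body is B2; each loop iteration's block stands in for one thread's execution of 'kernel' and declares its own restrict pointer array (P), assigned from P2. Since B2 begins before each B, paragraph 4 permits that outer-to-inner assignment. The sketch cannot show what makes the OpenCL case interesting: here the executions of B run one after another, whereas the OpenCL threads' executions of B overlap.

```c
#include <assert.h>

/* Hypothetical analogue: 'outer' is P2 and the body of launch is B2. */
void launch(int *restrict outer, int n)
{
    assert(n > 0);
    int first = outer[0];   /* read once, before any "thread" writes */

    for (int t = 0; t < n; t++) {
        /* One execution of this block models one thread's execution of
           'kernel': its own P, assigned from P2. B2 began before this B,
           so 6.7.3.1p4 allows the assignment. */
        int *restrict array = outer;
        array[t] = first + t + 1;   /* lvalue based on P (and hence on P2) */
    }
}
```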
So it's an LLVM bug to assume that array[0] can't alias
array[get_thread_id()], even when the latter access runs in another
OpenCL thread. I don't suppose we have the same bug if the value of a
restrict pointer is stored to a global variable and a function that
uses the global is then called? Or if the value of a restrict pointer
is written to a concurrent queue to send it to a non-OpenCL thread?
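The global-variable case can be written as ordinary sequential C. The names g, use_global, and write_via_global are hypothetical. By my reading of the definition of "based" quoted above, &g[1] is based on p (had p been redirected to a copy of the array before the store g = p, the value of g would change), so the write through g is permitted and the final reload of p[1] has to observe it. This only illustrates the pattern; it is not a statement of what LLVM does with it.

```c
#include <assert.h>

/* Hypothetical global that receives a copy of a restrict pointer. */
static int *g;

/* Writes through the global; nothing in write_via_global's own body
   shows that this touches p[1]. */
static void use_global(void)
{
    g[1] = 42;
}

/* The restrict pointer escapes into g before the call. Because &g[1] is
   based on p, the access in use_global is allowed, and a compiler must
   not forward the earlier store of 1 to the return value. */
int write_via_global(int *restrict p)
{
    assert(p);
    g = p;
    p[1] = 1;
    use_global();
    return p[1];   /* expected to be 42, not 1 */
}
```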