# ignore assumed dependences.
for (i = 0; i < 4; i++) {
  tmp1 = A[3*i+1];
  tmp2 = A[3*i+2];
  tmp3 = tmp1 + tmp2;
  A[3*i] = tmp3;
}
Now, for whatever reason, I apply a partial reg2mem transformation.
float tmp3[1];
# ignore assumed dependences. // Still valid?
for (i = 0; i < 4; i++) {
  tmp1 = A[3*i+1];
  tmp2 = A[3*i+2];
  tmp3[0] = tmp1 + tmp2;
  A[3*i] = tmp3[0];
}
The transformation that you described is illegal because it changes the
behavior of the loop. In the first version only A is modified, and in
the second version of the loop both A and tmp3 are modified. Can you
think of another example that demonstrates why the per-instruction
attribute is needed?
Hi Nadav,
I cannot quite follow why this transformation would be illegal by itself. Introducing stack memory and performing calculations there is what -reg2mem does, and that should be legal in the context of sequential LLVM-IR. Did I miss something?
I think the transformation I describe is only 'illegal' in the sense that it makes the llvm.loop.parallel metadata incorrect. This is exactly what I wanted to point out. Until now, metadata has always been optional: a transformation that does not understand a piece of metadata would never transform code in a way that makes the metadata incorrect. Instead, transformations either know the metadata and update it accordingly, or the metadata is automatically removed as soon as instructions are touched. My impression here comes, e.g., from the blog post describing LLVM metadata [1]: "A subtle point that was touched on above is that we don't want the optimizers to have to know about metadata."
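To make the concern concrete, here is a minimal C sketch of the reg2mem'd loop above. It assumes a hypothetical consumer of the metadata that executes the loop in lockstep, i.e. each statement for all four iterations before the next statement starts, which is roughly what a vectorizer trusting the annotation could do. The function names and the lockstep model are illustrative assumptions, not part of LLVM.

```c
#include <assert.h>

#define N 4

/* Sequential semantics of the loop after the partial reg2mem step. */
void run_sequential(float *A) {
    float tmp3[1] = {0.0f};
    for (int i = 0; i < N; i++) {
        float tmp1 = A[3 * i + 1];
        float tmp2 = A[3 * i + 2];
        tmp3[0] = tmp1 + tmp2;
        A[3 * i] = tmp3[0];
    }
}

/* Hypothetical lockstep execution: every statement runs for all
 * iterations before the next statement starts. tmp1 and tmp2 are
 * registers, so each iteration gets a private copy; tmp3[0] is memory,
 * so all iterations share the same location. */
void run_lockstep(float *A) {
    float tmp1[N], tmp2[N];
    float tmp3[1] = {0.0f};
    for (int i = 0; i < N; i++) tmp1[i] = A[3 * i + 1];
    for (int i = 0; i < N; i++) tmp2[i] = A[3 * i + 2];
    for (int i = 0; i < N; i++) tmp3[0] = tmp1[i] + tmp2[i]; /* last store wins */
    for (int i = 0; i < N; i++) A[3 * i] = tmp3[0];
}

/* Self-check with A[k] = k as input. */
void check_reg2mem_example(void) {
    float seq[3 * N], par[3 * N];
    for (int k = 0; k < 3 * N; k++) {
        seq[k] = (float)k;
        par[k] = (float)k;
    }
    run_sequential(seq);
    run_lockstep(par);
    /* Sequential: A[3i] = A[3i+1] + A[3i+2]. */
    assert(seq[0] == 3.0f && seq[3] == 9.0f && seq[6] == 15.0f && seq[9] == 21.0f);
    /* Lockstep: every A[3i] receives the sum of the last iteration. */
    assert(par[0] == 21.0f && par[3] == 21.0f && par[6] == 21.0f && par[9] == 21.0f);
}
```

Calling check_reg2mem_example() from a main function passes both assertions: the sequential loop is unchanged by the reg2mem step, but under the lockstep model the shared tmp3[0] location introduces a dependence between iterations that the annotation claims does not exist.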
You asked for another example. I had the feeling clang should generate this metadata automatically given certain user-defined pragmas, right?
Here is a simple C example:
void foo(float *A) {
  #pragma vectorize
  for (long i = 0; i < 4; i++) {
    float tmp3 = A[i];
    A[i + 4] = tmp3;
  }
}
Do you agree that this is code we would want to execute in parallel? Looking at the LLVM-IR that 'clang -O0 -S' generates from it, we actually get the following:
> define void @foo(float* %A) nounwind uwtable {
> entry:
> %A.addr = alloca float*, align 8
> %i = alloca i64, align 8
> %tmp3 = alloca float, align 4
> store float* %A, float** %A.addr, align 8
> store i64 0, i64* %i, align 8
> br label %for.cond
>
> for.cond: ; preds = %for.inc, %entry
> %0 = load i64* %i, align 8
> %cmp = icmp slt i64 %0, 4
> br i1 %cmp, label %for.body, label %for.end
>
> for.body: ; preds = %for.cond
> %1 = load i64* %i, align 8
> %2 = load float** %A.addr, align 8
> %arrayidx = getelementptr inbounds float* %2, i64 %1
> %3 = load float* %arrayidx, align 4
> store float %3, float* %tmp3, align 4
By default, clang produces a lot of temporary stack slots. This loop is not vectorizable before -mem2reg is executed. Attaching the loop.parallel metadata would either be incorrect, or we would need to define precisely which memory references need to be moved to registers before the parallelism declared by the metadata is actually there.
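The same effect can be sketched in C for foo. This is only an illustration under the same assumed lockstep execution model as before, and it models only the tmp3 slot (at -O0 the counter i lives in a stack slot as well). The function names are hypothetical.

```c
#include <assert.h>

#define N 4

/* Reference: sequential semantics of foo. */
void foo_sequential(float *A) {
    for (long i = 0; i < N; i++) {
        float tmp3 = A[i];
        A[i + 4] = tmp3;
    }
}

/* Lockstep execution with tmp3 kept in a single stack slot, mirroring
 * the '%tmp3 = alloca float' in the -O0 IR. All iterations store to and
 * load from the same address. */
void foo_lockstep_stack_slot(float *A) {
    float tmp3_slot = 0.0f;
    for (long i = 0; i < N; i++) tmp3_slot = A[i];    /* store to %tmp3 */
    for (long i = 0; i < N; i++) A[i + 4] = tmp3_slot; /* load from %tmp3 */
}

/* Lockstep execution after tmp3 has been promoted to a register, as
 * -mem2reg would do: each iteration has its own private value. */
void foo_lockstep_registers(float *A) {
    float tmp3[N]; /* one private value per iteration */
    for (long i = 0; i < N; i++) tmp3[i] = A[i];
    for (long i = 0; i < N; i++) A[i + 4] = tmp3[i];
}

/* Self-check on the input {1, 2, 3, 4, 0, 0, 0, 0}. */
void check_foo_example(void) {
    float a[8] = {1, 2, 3, 4, 0, 0, 0, 0};
    float b[8] = {1, 2, 3, 4, 0, 0, 0, 0};
    float c[8] = {1, 2, 3, 4, 0, 0, 0, 0};
    foo_sequential(a);
    foo_lockstep_stack_slot(b);
    foo_lockstep_registers(c);
    for (int i = 0; i < N; i++) {
        assert(a[i + 4] == a[i]);  /* expected copy */
        assert(c[i + 4] == a[i + 4]); /* register form matches */
        assert(b[i + 4] == 4.0f);  /* shared slot: last value everywhere */
    }
}
```

With tmp3 in registers the lockstep result matches the sequential one; with the shared stack slot it does not, which is why the annotation would only be correct once such references have been promoted.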
I am afraid that so many different LLVM transformations will have to be
modified to preserve parallelism. This is not something that I want to
slip in. If we want to add new parallelism semantics to LLVM, then we
need to discuss the bigger picture. We need to plan a mechanism that
will allow us to implement support for a number of different models
(vectorizers, SPMD languages such as GL and CL, parallel threads such as
OpenMP, etc.).
I am not proposing to change the types of parallelism the proposed metadata should cover. I just want to make sure the semantics of the proposed metadata are well defined.
Cheers
Tobi
[1] Extensible Metadata in LLVM IR - The LLVM Project Blog