# Loop vectorizer behaviour for 2D arrays and parallel annotation

Hello,

I am trying to vectorize the following loop but the vectorizer says:
"Found a possible write-write reorder" and does not vectorize.

Why?

```c
for (j = 0; j < 8; j++)
{
    jj = j << 3;   /* jj = 8 * j: start of row j in diff */
    m2[j][0] = diff[jj]   + diff[jj+4];
    m2[j][1] = diff[jj+1] + diff[jj+5];
    m2[j][2] = diff[jj+2] + diff[jj+6];
    m2[j][3] = diff[jj+3] + diff[jj+7];
    m2[j][4] = diff[jj]   - diff[jj+4];
    m2[j][5] = diff[jj+1] - diff[jj+5];
    m2[j][6] = diff[jj+2] - diff[jj+6];
    m2[j][7] = diff[jj+3] - diff[jj+7];
}
```

Another question concerns the isAnnotatedParallel() check. Is
there a way to make clang (or any other frontend) generate
parallel-annotated IR?

Best,


To my knowledge, the dependence analysis in the loop vectorizer is not yet able to prove the absence of dependences here.
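Informally, the iterations are independent: iteration j writes only row m2[j] and reads only diff[8*j] through diff[8*j+7]. Provided m2 and diff do not overlap (and, if m2 were an array of row pointers, provided its rows do not overlap either), no iteration touches an element that another iteration writes, so the iterations could run in any order. The vectorizer would have to prove this statically. The sketch below only checks the consequence empirically, for one input, by running the iterations forward and in reverse; the declarations int m2[8][8] and int diff[64] and all function names are assumptions, since the original post does not show them.

```c
#include <string.h>

/* Assumed shapes: the post does not show how m2 and diff are declared. */
enum { ROWS = 8, COLS = 8 };

/* One iteration of the posted loop body: reads diff[8j..8j+7] and
 * writes row m2[j][0..7]. */
static void loop_iteration(int j, int m2[ROWS][COLS],
                           const int diff[ROWS * COLS])
{
    int jj = j << 3;
    m2[j][0] = diff[jj]     + diff[jj + 4];
    m2[j][1] = diff[jj + 1] + diff[jj + 5];
    m2[j][2] = diff[jj + 2] + diff[jj + 6];
    m2[j][3] = diff[jj + 3] + diff[jj + 7];
    m2[j][4] = diff[jj]     - diff[jj + 4];
    m2[j][5] = diff[jj + 1] - diff[jj + 5];
    m2[j][6] = diff[jj + 2] - diff[jj + 6];
    m2[j][7] = diff[jj + 3] - diff[jj + 7];
}

/* Executes all iterations in the original order (j = 0..7) or reversed
 * (j = 7..0). */
static void run_loop(int reversed, int m2[ROWS][COLS],
                     const int diff[ROWS * COLS])
{
    for (int n = 0; n < ROWS; n++)
        loop_iteration(reversed ? ROWS - 1 - n : n, m2, diff);
}

/* Returns 1 if both iteration orders produce the same m2. Every element
 * of both result arrays is written, so memcmp reads no uninitialized
 * memory. */
int same_result_in_both_orders(const int diff[ROWS * COLS])
{
    int forward[ROWS][COLS];
    int backward[ROWS][COLS];
    run_loop(0, forward, diff);
    run_loop(1, backward, diff);
    return memcmp(forward, backward, sizeof forward) == 0;
}
```

A passing check is consistent with the loop being parallel but is not a proof; in particular it says nothing about callers that pass overlapping storage.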


Another question concerns the isAnnotatedParallel() check. Is
there a way to make clang (or any other frontend) generate
parallel-annotated IR?

Did you try putting '#pragma ivdep' before the loop?
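A sketch of that suggestion, assuming the same hypothetical declarations int m2[8][8] and int diff[64]: the pragma goes immediately before the for statement it applies to, and it is the programmer's promise, not something the compiler verifies, that the loop has no dependences the compiler must preserve. Pragma support is compiler-specific. The Intel compiler documents #pragma ivdep, GCC spells its equivalent #pragma GCC ivdep, and the C standard requires an implementation to ignore a pragma it does not recognize, so elsewhere this compiles with at most an "unknown pragma" warning.

```c
/* Assumed declarations: int m2[8][8] and int diff[64]. */
void transform_ivdep(int m2[8][8], const int diff[64])
{
    /* Applies to the loop that immediately follows. If the promise is
     * false (real dependences exist), the result may be wrong. */
#pragma ivdep
    for (int j = 0; j < 8; j++) {
        int jj = j << 3;
        m2[j][0] = diff[jj]     + diff[jj + 4];
        m2[j][1] = diff[jj + 1] + diff[jj + 5];
        m2[j][2] = diff[jj + 2] + diff[jj + 6];
        m2[j][3] = diff[jj + 3] + diff[jj + 7];
        m2[j][4] = diff[jj]     - diff[jj + 4];
        m2[j][5] = diff[jj + 1] - diff[jj + 5];
        m2[j][6] = diff[jj + 2] - diff[jj + 6];
        m2[j][7] = diff[jj + 3] - diff[jj + 7];
    }
}
```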

Tobias

P.S.: Please attach a full C file as a test case. The way the different data structures are declared may influence the analysis.
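To illustrate the point about declarations, here is the same loop embedded in three hypothetical ways (the macro and function names are made up for this sketch). What differs is what the compiler may assume about overlap between m2 and diff: distinct global arrays are separate objects, plain pointer parameters may point into the same storage, and restrict-qualified parameters carry the programmer's promise that they do not. Which of these variants a given LLVM version actually vectorizes is not claimed here.

```c
/* The posted loop, written once as a macro so the same body can be placed
 * in functions whose arrays are declared differently. */
#define M2_LOOP(m2, diff)                                   \
    for (int j = 0; j < 8; j++) {                           \
        int jj = j << 3;                                    \
        (m2)[j][0] = (diff)[jj]     + (diff)[jj + 4];       \
        (m2)[j][1] = (diff)[jj + 1] + (diff)[jj + 5];       \
        (m2)[j][2] = (diff)[jj + 2] + (diff)[jj + 6];       \
        (m2)[j][3] = (diff)[jj + 3] + (diff)[jj + 7];       \
        (m2)[j][4] = (diff)[jj]     - (diff)[jj + 4];       \
        (m2)[j][5] = (diff)[jj + 1] - (diff)[jj + 5];       \
        (m2)[j][6] = (diff)[jj + 2] - (diff)[jj + 6];       \
        (m2)[j][7] = (diff)[jj + 3] - (diff)[jj + 7];       \
    }

/* Variant 1: two distinct global arrays. Separate objects cannot overlap,
 * so an alias analysis can see that stores to g_m2 never modify g_diff. */
int g_m2[8][8];
int g_diff[64];

void variant_globals(void)
{
    M2_LOOP(g_m2, g_diff);
}

/* Variant 2: plain pointer parameters. A caller could pass overlapping
 * storage, so the compiler cannot assume m2 and diff are disjoint. */
void variant_pointers(int (*m2)[8], const int *diff)
{
    M2_LOOP(m2, diff);
}

/* Variant 3: restrict-qualified parameters. The programmer promises that
 * objects modified through one pointer are not accessed through the other
 * during the call. */
void variant_restrict(int (*restrict m2)[8], const int *restrict diff)
{
    M2_LOOP(m2, diff);
}
```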

Another question concerns the isAnnotatedParallel() check. Is
there a way to make clang (or any other frontend) generate
parallel-annotated IR?

Paul Redmond was adding support for "#pragma ivdep" that would use the
parallel metadata, but I haven't been able to follow its progress lately.

That is, if your loop body was an OpenCL kernel with each work-item
executing a single iteration, it *might* get "horizontally vectorized"
using the loop vectorizer if you use pocl's 'loopvec' work group method and
if the memory access pattern is suitable. This is quite fresh code which
I'm still optimizing, but I've already managed to autovectorize some work groups using it.
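The idea behind a loop-based work-group method can be sketched in plain C: the work-items of one work-group become the iterations of a loop. OpenCL gives no ordering guarantees between work-items that do not synchronize through barriers or otherwise communicate, so for such a kernel those iterations are parallel by construction, which is exactly what a parallel-loop annotation tells the loop vectorizer. The names kernel_body and run_work_group are hypothetical; this illustrates the concept, not pocl's implementation.

```c
#include <stddef.h>

/* Hypothetical kernel body: the work one OpenCL work-item performs for
 * its local id. Each work-item reads and writes only its own element. */
static void kernel_body(size_t local_id, float *out, const float *in)
{
    out[local_id] = 2.0f * in[local_id] + 1.0f;
}

/* Executes one work-group as a loop over its work-items. Because the
 * kernel has no barriers and work-items do not communicate, any order of
 * these iterations (including several at once in SIMD lanes) gives the
 * same result, so the loop could be annotated as parallel. */
void run_work_group(size_t wg_size, float *out, const float *in)
{
    for (size_t local_id = 0; local_id < wg_size; local_id++)
        kernel_body(local_id, out, in);
}
```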

BR,

I'm still working on it, just slowly. I'm hoping to have some more patches in the next week or two.

paul
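For readers on a newer clang than the one discussed in this thread: current clang documents #pragma clang loop vectorize(assume_safety), which asks the vectorizer to vectorize the loop and to assume its memory accesses carry no loop dependences. As far as I know, clang implements this by attaching the parallel-loop access metadata that isAnnotatedParallel() checks; the exact metadata names differ between LLVM versions. add_arrays below is a hypothetical example function.

```c
#include <stddef.h>

/* Hypothetical example function. Nothing in the signature tells the
 * compiler that dst does not overlap a or b. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    /* Clang-specific hint: vectorize this loop and assume its iterations
     * carry no memory dependences. Other compilers ignore the unknown
     * pragma, possibly with a warning. The promise is unchecked: if the
     * arrays overlap in a way that creates a real dependence, the
     * vectorized loop may compute wrong results. */
#pragma clang loop vectorize(assume_safety)
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```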


To my knowledge, the dependence analysis in the loop vectorizer is not yet
able to prove the absence of dependences here.

While that is true, the debug message printed by the vectorizer is


Did you try putting '#pragma ivdep' before the loop?

Thanks for the suggestion; it worked using the latest LLVM from svn.
Thanks, Pekka and Paul, for your input.

Tobias

P.S.: Please attach a full C file as a test case. The way the different data
structures are declared may influence the analysis.

Please find the example attached.

-Best,