Invoke loop vectorizer

Hi there,

I compile my code with clang-cl /Qvec test.c, but the LoopVectorizer pass is never invoked.

Is /Qvec alone sufficient to enable the auto-vectorizer?

Thanks,
Xiaochu

Hi Xiaochu,

Clang defaults to -O0, which doesn’t run any optimizations. Try supplying -O1 or higher.

Yours,
Andrey

Hi Andrey,

Thanks. I found that even with the loop vectorizer and the SLP vectorizer enabled, my simple test still does not get vectorized. I also tried a clang pragma in the test to force vectorization. What do you think the problem is?

Test:

#define SIZE 8

void bar(int *A, int *B, int K) {
#pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
  for (int i = 0; i < SIZE; ++i)
    A[i] += B[i] + K;
}

Thanks,
Xiaochu

It’s not possible to know that A and B don’t alias in this example. It’s almost certainly not profitable to add a runtime check given the size of the loop.

Try:

#define SIZE 8

void bar(int *restrict A, int *restrict B, int K) {
#pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
  for (int i = 0; i < SIZE; ++i)
    A[i] += B[i] + K;
}
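To make the aliasing concern concrete, here is a minimal sketch. It is not compiler output: bar_width2 is a hand-written stand-in, assumed for illustration, for what a width-2 vector loop does (load both lanes, then store both lanes), and the helper names are hypothetical. When A and B overlap, it computes something different from the scalar loop, which is why the compiler needs either restrict or a runtime overlap check.

```c
#include <assert.h>
#include <string.h>

#define SIZE 8

/* Scalar loop from the thread: one element at a time, in order. */
static void bar_scalar(int *A, int *B, int K) {
    for (int i = 0; i < SIZE; ++i)
        A[i] += B[i] + K;
}

/* Hand-written stand-in for a width-2 vectorized loop:
   both lanes are loaded before either lane is stored. */
static void bar_width2(int *A, int *B, int K) {
    for (int i = 0; i < SIZE; i += 2) {
        int a0 = A[i], a1 = A[i + 1];
        int b0 = B[i], b1 = B[i + 1];
        A[i]     = a0 + b0 + K;
        A[i + 1] = a1 + b1 + K;
    }
}

/* Returns 1 if both versions agree when A = buf + 1 overlaps B = buf. */
int overlapping_results_match(void) {
    int x[SIZE + 1], y[SIZE + 1];
    for (int i = 0; i <= SIZE; ++i)
        x[i] = y[i] = 1;
    bar_scalar(x + 1, x, 1);  /* each iteration reads the previous store */
    bar_width2(y + 1, y, 1);  /* lane 1 reads a stale value of B[i + 1] */
    return memcmp(x, y, sizeof x) == 0;
}

/* Returns 1 if both versions agree when A and B are separate arrays. */
int disjoint_results_match(void) {
    int a1[SIZE], a2[SIZE], b[SIZE];
    for (int i = 0; i < SIZE; ++i) {
        a1[i] = a2[i] = i;
        b[i] = 2 * i;
    }
    bar_scalar(a1, b, 1);
    bar_width2(a2, b, 1);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```

With disjoint arrays the two versions agree; with the overlapping call they diverge from the second element onward. Marking the parameters restrict is a promise that the overlapping call never happens.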

Hi Daniel,

I increased the size in your test to 128, but -stats still shows no loops optimized…

Xiaochu

cat > test.c

#define SIZE 128

void bar(int *restrict A, int *restrict B, int K) {
#pragma clang loop vectorize(enable) vectorize_width(2) unroll_count(8)
  for (int i = 0; i < SIZE; ++i)
    A[i] += B[i] + K;
}

[dannyb@dannyb-macbookpro3 11:37:20] ~ :) $ clang -O3 test.c -c -save-temps

[dannyb@dannyb-macbookpro3 11:38:28] ~ :) $ pcregrep -i "^\s*p" test.s|less
pushq %rbp

pshufd $68, %xmm0, %xmm0 ## xmm0 = xmm0[0,1,0,1]
pslldq $8, %xmm1 ## xmm1 = zero,zero,zero,zero,zero,zero,zero,zero,xmm1[0,1,2,3,4,5,6,7]
pshufd $68, %xmm3, %xmm3 ## xmm3 = xmm3[0,1,0,1]
paddq %xmm1, %xmm3
pshufd $78, %xmm3, %xmm4 ## xmm4 = xmm3[2,3,0,1]
punpckldq %xmm5, %xmm4 ## xmm4 = xmm4[0],xmm5[0],xmm4[1],xmm5[1]
pshufd $212, %xmm4, %xmm4 ## xmm4 = xmm4[0,1,1,3]

Note:
It also vectorizes at SIZE=8.

Not sure what the exact translation of options from clang-cl to clang is.
Maybe try adding /O3?

I’m not compiling for x86. Shouldn’t the loop vectorizer be independent of the target? If so, shouldn’t the vectorized code show up at the IR level?

The loop vectorizer is not independent of the target, since it queries the target for cost estimates to make the vectorization profitability decision.

Your code has a pragma explicitly requesting vectorization, so profitability should not come into play, but there may be other target-related issues. One example I can think of is that we will never vectorize if the target has no vector registers.
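As a rough illustration of the decision order described above, here is a toy model in C. It is not LLVM code: the struct, its fields, the cost numbers, and the decide function are all hypothetical, and the real vectorizer's checks are more involved. It only shows why a pragma can bypass the profitability estimate but cannot help a target that reports no vector registers.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for what a target reports. */
struct target_info {
    unsigned vector_registers; /* 0 for a back end that reports none */
    unsigned scalar_cost;      /* made-up cost of one scalar iteration */
    unsigned vector_cost;      /* made-up cost of one vector iteration */
};

enum decision { SKIP_NO_VECTOR_REGS, SKIP_NOT_PROFITABLE, VECTORIZE };

/* Toy decision procedure: the target check comes first, then the
   pragma override, then the cost comparison. */
enum decision decide(const struct target_info *t, unsigned width,
                     bool forced_by_pragma) {
    if (t->vector_registers == 0)
        return SKIP_NO_VECTOR_REGS;   /* the pragma cannot override this */
    if (forced_by_pragma)
        return VECTORIZE;             /* profitability is not consulted */
    if (t->vector_cost < t->scalar_cost * width)
        return VECTORIZE;             /* one vector step beats width scalar steps */
    return SKIP_NOT_PROFITABLE;
}
```

In this model, a target with vector_registers set to 0 is skipped even when forced_by_pragma is true, which matches the symptom of a pragma-annotated loop that is never vectorized on an incomplete back end.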

Errr, so you are using clang-cl but not on x86 or x86-64?
That’s probably “not well tested”

Right, and if you are not running it on the target, it’s also not going to detect the target features correctly, I believe?

Thanks, guys!

I found that my target is missing the getNumberOfRegisters function. The loop vectorizer is invoked, but no loop is examined…

My back end is still under construction… Sorry about that.

Thanks,
Xiaochu