Hello.
Mikhail, with a more recent version of the LoopVectorize.cpp code (retrieved at the beginning of July 2016), I compiled the following piece of C code:
void foo(long *A, long *B, long *C, long N) {
  for (long i = 0; i < N; ++i) {
    C[i] = A[i] + B[i];
  }
}
The vectorized LLVM IR I obtain contains 2 vector.body blocks, for example one named "vector.body" and the other "vector.body34". The code seems correct: the first "vector.body" block performs the vector add over a number of elements that is a multiple of VF * UF. There are also 2 epilogues, which makes things a bit strange; I am still trying to understand the code.
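To make the "multiple of VF * UF" remark concrete, here is a minimal scalar model in C of the usual shape: one main body that consumes VF * UF elements per iteration, followed by one scalar epilogue for the leftovers. This is only a hand-written sketch, not vectorizer output; the values VF = 8 and UF = 2 and the function name foo_model are assumptions chosen for illustration.

```c
#include <assert.h>

/* Hypothetical values, for illustration only. */
enum { VF = 8, UF = 2 };

/* Reference: the original scalar loop. */
static void foo_scalar(const long *A, const long *B, long *C, long N) {
    for (long i = 0; i < N; ++i)
        C[i] = A[i] + B[i];
}

/* Scalar model of the expected shape: one main body plus one epilogue. */
static void foo_model(const long *A, const long *B, long *C, long N) {
    const long step = VF * UF;
    /* Largest multiple of VF * UF that does not exceed N. */
    const long n_vec = N > 0 ? N - (N % step) : 0;
    long i = 0;

    /* Plays the role of "vector.body": VF * UF elements per iteration.
       The inner loop stands in for UF vector adds of VF lanes each. */
    for (; i < n_vec; i += step)
        for (long lane = 0; lane < step; ++lane)
            C[i + lane] = A[i + lane] + B[i + lane];

    /* Scalar epilogue: the remaining N - n_vec elements. */
    for (; i < N; ++i)
        C[i] = A[i] + B[i];
}
```

With this shape there is exactly one main body and one epilogue, which is what I would expect for the program above.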
Could you explain where in LoopVectorize.cpp the 2 vector.body blocks are created? I know that InnerLoopVectorizer::vectorize() calls InnerLoopVectorizer::createEmptyLoop(), which creates the blocks required for vectorization, but I have difficulty following the class instantiations.
I ask because I would prefer to have only one "vector.body" block for the above C program, as was the case with the LoopVectorize.cpp version of Nov 2015.
Thank you very much,
Alex
Hi Alex,
How do you compile this program? I compile it as follows, and don’t see extra vector-bodies:
bin/clang -O3 vec.c -S -o - | grep "##"
_foo: ## @foo
BB#0: ## %entry
BB#1: ## %for.body.preheader
BB#8: ## %min.iters.checked
BB#9: ## %vector.memcheck
BB#10: ## %vector.memcheck
BB#11: ## %vector.body.preheader
BB#12: ## %vector.body.prol
LBB0_13: ## %vector.body.prol.loopexit
BB#14: ## %vector.body.preheader.new
LBB0_15: ## %vector.body
=>This Inner Loop Header: Depth=1
LBB0_16: ## %middle.block
LBB0_2: ## %for.body.preheader27
BB#3: ## %for.body.prol.preheader
LBB0_4: ## %for.body.prol
=>This Inner Loop Header: Depth=1
LBB0_5: ## %for.body.prol.loopexit
BB#6: ## %for.body.preheader27.new
LBB0_7: ## %for.body
=>This Inner Loop Header: Depth=1
LBB0_17: ## %for.cond.cleanup
Best regards,
Michael
Hello.
Michael, thank you for your answer - indeed, your command generates only 1 vector.body.
I use the following commands to compile:
$(LLVM_PATH)/clang -fvectorize -mllvm -force-vector-width=8 src.c -S -emit-llvm
$(LLVM_PATH)/opt -debug -O3 -loop-vectorize -force-vector-width=8 src.ll -S >3better_after_opt.ll
$(LLVM_PATH)/llc -print-after-all -debug -march=connex -O0 -asm-show-inst -asm-verbose src_after_opt.ll
I'd like to mention that I am using the version of LoopVectorize.cpp from the beginning of July 2016.
Best regards,
Alex
Hi Alex,
I assume you run these three commands to model clang's -O3 behavior using opt? If so, it's better to do it the following way:
- Generate IR with clang before optimizations kick in:
  clang -O3 -mllvm -disable-llvm-optzns -S -emit-llvm src.c -o src_noopt.ll
- Run opt on it:
  opt -O3 src_noopt.ll -S -o src_after_opt.ll
  You can also pass your custom flags here, like "-force-vector-width=8". There is no need to pass -loop-vectorize, as it is already present in the O3 pipeline. I guess passing it along with -O3 might be the reason you see two vector bodies (e.g. the remainder loop might have been vectorized by the second invocation of the vectorizer).
- Run llc if you need an asm file:
  llc src_after_opt.ll -o src.s -march=connex -asm-show-inst -asm-verbose
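To illustrate the guess about the second vectorizer invocation, here is a rough scalar model in C of what a twice-vectorized loop could look like: a first wide body, then the remainder loop vectorized again, then a final scalar loop. This is only a sketch of the hypothesis, not output from your build; the values VF = 8 and UF = 4, the absence of interleaving in the second body, and the function name are all assumptions.

```c
#include <assert.h>

/* Hypothetical values, for illustration only. */
enum { VF = 8, UF = 4 };

/* Scalar model of a loop that was vectorized twice (assumed shape). */
static void foo_twice_vectorized_model(const long *A, const long *B,
                                       long *C, long N) {
    const long wide = VF * UF;
    long i = 0;

    /* First vector body (like "vector.body"): VF * UF elements per step. */
    for (; i + wide <= N; i += wide)
        for (long lane = 0; lane < wide; ++lane)
            C[i + lane] = A[i + lane] + B[i + lane];

    /* Second vector body (like "vector.body34"): the former remainder
       loop, vectorized again. Assumed here to take VF elements per step. */
    for (; i + VF <= N; i += VF)
        for (long lane = 0; lane < VF; ++lane)
            C[i + lane] = A[i + lane] + B[i + lane];

    /* Final scalar remainder: fewer than VF elements are left. */
    for (; i < N; ++i)
        C[i] = A[i] + B[i];
}
```

If this is what happens, the result is still correct, but you get an extra vector body and an extra remainder loop compared with a single run of the vectorizer.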
Michael