LLVM Loop vectorizer - 2 vector.body blocks appear

Hello.
     Mikhail, using a more recent version of LoopVectorize.cpp (retrieved at the beginning of July 2016), I compiled the following piece of C code:
     void foo(long *A, long *B, long *C, long N) {
       for (long i = 0; i < N; ++i) {
         C[i] = A[i] + B[i];
       }
     }

     The vectorized LLVM program I obtain contains 2 vector.body blocks, for example one named "vector.body" and another named "vector.body34". The code seems correct: the first "vector.body" block performs the vector add on a number of elements that is a multiple of VF * UF. There are also 2 epilogues, which makes things a bit strange; I am still trying to understand the code.

     Could you explain where in LoopVectorize.cpp the 2 vector.body blocks are created? I know that InnerLoopVectorizer::vectorize() calls InnerLoopVectorizer::createEmptyLoop(), which creates the blocks required for vectorization, but I have difficulty following the class instantiations.
     I ask because I would prefer to have only one "vector.body" block for the above C program, as was the case with the November 2015 version of LoopVectorize.cpp.

   Thank you very much,
     Alex

Hi Alex,

How do you compile this program? I compile it as follows and don’t see extra vector bodies:

bin/clang -O3 vec.c -S -o - | grep "##"

_foo: ## @foo
BB#0: ## %entry
BB#1: ## %for.body.preheader
BB#8: ## %min.iters.checked
BB#9: ## %vector.memcheck
BB#10: ## %vector.memcheck
BB#11: ## %vector.body.preheader
BB#12: ## %vector.body.prol
LBB0_13: ## %vector.body.prol.loopexit
BB#14: ## %vector.body.preheader.new
LBB0_15: ## %vector.body
## =>This Inner Loop Header: Depth=1
LBB0_16: ## %middle.block
LBB0_2: ## %for.body.preheader27
BB#3: ## %for.body.prol.preheader
LBB0_4: ## %for.body.prol
## =>This Inner Loop Header: Depth=1
LBB0_5: ## %for.body.prol.loopexit
BB#6: ## %for.body.preheader27.new
LBB0_7: ## %for.body
## =>This Inner Loop Header: Depth=1
LBB0_17: ## %for.cond.cleanup

Best regards,
Michael

Hello.
     Michael, thank you for your answer - indeed, your command generates only 1 vector.body.

     I use the following commands to compile:
         $(LLVM_PATH)/clang -fvectorize -mllvm -force-vector-width=8 src.c -S -emit-llvm
         $(LLVM_PATH)/opt -debug -O3 -loop-vectorize -force-vector-width=8 src.ll -S >3better_after_opt.ll
         $(LLVM_PATH)/llc -print-after-all -debug -march=connex -O0 -asm-show-inst -asm-verbose src_after_opt.ll

    I'd like to mention that I am using the version of LoopVectorize.cpp from the beginning of July 2016.

   Best regards,
     Alex

Hi Alex,

I assume you run these three commands to model clang’s -O3 behavior using opt? If so, it’s better to do it the following way:

  1. Generate IR with clang before optimizations kick in:
    clang -O3 -mllvm -disable-llvm-optzns -S -emit-llvm src.c -o src_noopt.ll

  2. Run opt on it:
    opt -O3 src_noopt.ll -S -o src_after_opt.ll
    You can also pass your custom flags here, like "-force-vector-width=8". There is no need to pass -loop-vectorize, as it’s already present in the O3 pipeline. I guess passing it along with -O3 might be the reason you see two vector bodies (e.g. the remainder loop might have been vectorized by the second invocation of the vectorizer).

  3. Run llc if you need an asm file:
    llc src_after_opt.ll -o src.s -march=connex -asm-show-inst -asm-verbose

Michael