How to generate IR so that the loop vectorizer can vectorize it

Hi,
I also posted this question on Stack Overflow, but I'll repeat it here:
I have generated LLVM IR code using an IRBuilder, and tried to run a LoopVectorizer on it, but it does not vectorize the code.

Here is my LLVM IR after the ModulePassManager has run on it:

; Function Attrs: argmemonly nofree norecurse nosync nounwind
define void @batch_function_0(ptr nocapture readonly %parameters, ptr nocapture readnone %series, ptr nocapture %state_vars, ptr nocapture readnone %solver_workspace, ptr nocapture readnone %date_time, double %fractional_step) local_unnamed_addr #1 {
entry:
  br label %loop

loop:                                             ; preds = %loop, %entry
  %index = phi i64 [ 0, %entry ], [ %next_iter, %loop ]
  %subtemp = add nuw nsw i64 %index, -100
  %var_ptr = getelementptr double, ptr %state_vars, i64 %subtemp
  %var = load double, ptr %var_ptr, align 8
  %addtemp = add nuw nsw i64 %index, 2
  %par_ptr = getelementptr double, ptr %parameters, i64 %addtemp
  %par = load double, ptr %par_ptr, align 8
  %fmultemp = fmul fast double %par, %var
  %var_ptr1 = getelementptr double, ptr %state_vars, i64 %index
  store double %fmultemp, ptr %var_ptr1, align 8
  %next_iter = add nuw nsw i64 %index, 1
  %loopcond.not = icmp eq i64 %next_iter, 100
  br i1 %loopcond.not, label %afterloop, label %loop

afterloop:                                        ; preds = %loop
  ret void
}

attributes #1 = { argmemonly nofree norecurse nosync nounwind }

Here is how I set up the optimization passes:

llvm::LoopAnalysisManager     lam;
llvm::FunctionAnalysisManager fam;
llvm::CGSCCAnalysisManager    cgam;
llvm::ModuleAnalysisManager   mam;

llvm::PassBuilder pb;

pb.registerModuleAnalyses(mam);
pb.registerCGSCCAnalyses(cgam);
pb.registerFunctionAnalyses(fam);
pb.registerLoopAnalyses(lam);
pb.crossRegisterProxies(lam, fam, cgam, mam);

llvm::ModulePassManager mpm = pb.buildPerModuleDefaultPipeline(llvm::OptimizationLevel::O2);

mpm.addPass(llvm::createModuleToFunctionPassAdaptor(llvm::LoopVectorizePass()));

mpm.run(*data->module, mam);

What else do I need to do in order to get it to vectorize the loop?

For reference, the equivalent C++ of what I am trying to represent in the IR would look something like this:

void
testfun(double *state_var, double *par) {
    for(int i = 0; i < 100; ++i) {
        state_var[i] = state_var[i-100]*par[i+2];
    }
}

This loop is vectorized by clang if I compile it with -O2.
(Note that this is only included as a comparison, I am not generating my original IR from C++)

I have also tried putting the noalias attribute on the pointer arguments of the function in the IR, which is what clang does if you put the __restrict qualifier on them in C++, but this still does not solve my problem.

As a followup, if I run

opt -loop-vectorize -force-vector-width=2 -S -o testiropt.txt testir.txt

I am able to get it to vectorize, but I am not sure how I would set the force-vector-width option when I generate the IR programmatically. I am writing a JIT compiler, so I can't use the command-line tools. Does anybody know?

I assume that without the forcing it doesn't vectorize because the cost model decides it isn't profitable (even though clang considers it profitable for the equivalent C++), but I would still like to be able to test it for myself.
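A per-loop alternative to a global flag is loop metadata, which is what clang emits for #pragma clang loop vectorize_width(2). A sketch of what the latch branch and metadata could look like in the IR above (the metadata node numbers are arbitrary):

  br i1 %loopcond.not, label %afterloop, label %loop, !llvm.loop !0

!0 = distinct !{!0, !1, !2}
!1 = !{!"llvm.loop.vectorize.enable", i1 true}
!2 = !{!"llvm.loop.vectorize.width", i32 2}

When generating with an IRBuilder, the same nodes can be built with MDNode::getDistinct/MDNode::get and MDString::get (note the distinct node's first operand is the node itself), and attached to the branch instruction with setMetadata(llvm::LLVMContext::MD_loop, node).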

You probably need to set a target triple for the target-specific cost model to kick in.

I called setTargetTriple on the Module, but that did not change anything.
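Setting the triple string on the module is probably not enough on its own: the vectorizer's cost model comes from TargetTransformInfo, and the PassBuilder only hands out a target-specific TTI if it was constructed with a TargetMachine. A sketch of what that could look like (error handling omitted; the exact createTargetMachine signature varies a bit between LLVM versions):

llvm::InitializeNativeTarget();

std::string error;
std::string triple = llvm::sys::getProcessTriple();
const llvm::Target *target = llvm::TargetRegistry::lookupTarget(triple, error);
llvm::TargetMachine *tm = target->createTargetMachine(
    triple, llvm::sys::getHostCPUName(), /*Features=*/"",
    llvm::TargetOptions(), std::nullopt);

data->module->setTargetTriple(triple);
data->module->setDataLayout(tm->createDataLayout());

llvm::PassBuilder pb(tm); // TM-aware: TargetIRAnalysis now uses tm's cost model

The rest of the analysis registration would stay as in the snippet above.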

You may need to add -aa-pipeline=basic-aa

How do I do that from within the C++ API? I am not using a command line tool.

Sorry, my bad. I misread your posts and thought it was the opt invocation that was not working. Please disregard.

I found that I could force the vectorization by setting
llvm::VectorizerParams::VectorizationFactor = 2;

In this simple test case the forced vectorization does not produce faster code on average. In a more complicated case it actually produced slower code, so I guess the cost model is right, but it was nice to be able to test it.
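For completeness: since force-vector-width is a cl::opt inside the LoopVectorize pass, it can also be set from inside a JIT by feeding a synthetic argv to LLVM's command-line parser early at startup (a sketch; cl::ParseCommandLineOptions should normally only be called once per process):

const char *args[] = {"myjit", "-force-vector-width=2"};
llvm::cl::ParseCommandLineOptions(2, args);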