Hi,
I also posted this question on Stack Overflow, but I will repeat it here:
I have generated LLVM IR using an IRBuilder and tried to run the LoopVectorizePass on it, but the loop does not get vectorized.
Here is my LLVM IR after the ModulePassManager has run on it:
; Function Attrs: argmemonly nofree norecurse nosync nounwind
define void @batch_function_0(ptr nocapture readonly %parameters, ptr nocapture readnone %series, ptr nocapture %state_vars, ptr nocapture readnone %solver_workspace, ptr nocapture readnone %date_time, double %fractional_step) local_unnamed_addr #1 {
entry:
  br label %loop

loop:                                             ; preds = %loop, %entry
  %index = phi i64 [ 0, %entry ], [ %next_iter, %loop ]
  %subtemp = add nuw nsw i64 %index, -100
  %var_ptr = getelementptr double, ptr %state_vars, i64 %subtemp
  %var = load double, ptr %var_ptr, align 8
  %addtemp = add nuw nsw i64 %index, 2
  %par_ptr = getelementptr double, ptr %parameters, i64 %addtemp
  %par = load double, ptr %par_ptr, align 8
  %fmultemp = fmul fast double %par, %var
  %var_ptr1 = getelementptr double, ptr %state_vars, i64 %index
  store double %fmultemp, ptr %var_ptr1, align 8
  %next_iter = add nuw nsw i64 %index, 1
  %loopcond.not = icmp eq i64 %next_iter, 100
  br i1 %loopcond.not, label %afterloop, label %loop

afterloop:                                        ; preds = %loop
  ret void
}

attributes #1 = { argmemonly nofree norecurse nosync nounwind }
Here is how I set up the optimization passes:
llvm::LoopAnalysisManager lam;
llvm::FunctionAnalysisManager fam;
llvm::CGSCCAnalysisManager cgam;
llvm::ModuleAnalysisManager mam;
llvm::PassBuilder pb;
pb.registerModuleAnalyses(mam);
pb.registerCGSCCAnalyses(cgam);
pb.registerFunctionAnalyses(fam);
pb.registerLoopAnalyses(lam);
pb.crossRegisterProxies(lam, fam, cgam, mam);
llvm::ModulePassManager mpm = pb.buildPerModuleDefaultPipeline(llvm::OptimizationLevel::O2);
mpm.addPass(llvm::createModuleToFunctionPassAdaptor(llvm::LoopVectorizePass()));
mpm.run(*data->module, mam);
What else do I need to do in order to get it to vectorize the loop?
For reference, a C++ function equivalent to what I am trying to represent in the IR would look something like this:
void
testfun(double *state_var, double *par) {
    for(int i = 0; i < 100; ++i) {
        state_var[i] = state_var[i-100]*par[i+2];
    }
}
This loop is vectorized by clang if I compile it with -O2.
(Note that this is only included as a comparison; I am not generating my original IR from C++.)
I have also tried adding the noalias attribute to the function's pointer arguments in the IR, which is what clang does if you use the __restrict keyword in C++, but this does not solve my problem either.
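To make the __restrict comparison concrete, here is a rough sketch of the C++ variant I mean. The names testfun_restrict and run_testfun_restrict are made up for this illustration and are not part of my actual code. The driver is only there so the function can be called safely: since the loop reads state_var[i-100], the pointer passed in has to point 100 elements into a larger buffer.

```cpp
#include <vector>

// Same loop as testfun, but with __restrict on both pointer arguments.
// __restrict is a compiler extension (accepted by clang, GCC and MSVC).
void
testfun_restrict(double *__restrict state_var, double *__restrict par) {
    for(int i = 0; i < 100; ++i) {
        state_var[i] = state_var[i-100]*par[i+2];
    }
}

// Hypothetical driver, only for illustration: builds buffers large enough
// for the i-100 and i+2 accesses, runs the loop, and returns the state buffer.
std::vector<double>
run_testfun_restrict() {
    std::vector<double> state(200, 0.0); // elements 0..99 are read, 100..199 are written
    std::vector<double> par(102, 2.0);   // the loop reads par[2]..par[101]
    for(int i = 0; i < 100; ++i) {
        state[i] = i + 1;                // input values 1, 2, ..., 100
    }
    // Pass a pointer into the middle so that state_var[i-100] stays in bounds.
    testfun_restrict(state.data() + 100, par.data());
    return state;
}
```

Compiling something like this with clang++ -O2 -Rpass=loop-vectorize should print a remark for each loop that clang vectorizes, which is a convenient way to check the comparison case.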