Converting an i32 pointer to a vector of i32 (C array to LLVM vector)

Hi,

I’m creating a small function in LLVM that takes an i32* as a parameter (the function is called from C code).

However, I know that this pointer actually points to a C array of size 10 (int[10]).

How can I tell LLVM to treat this i32* as a <10 x i32> (and thus get the performance improvements from SIMD, etc.)?

Thanks,
Matthieu

Hi Matthieu,

You shouldn't need to do anything: the vectorizer should spot that for you,
if the machine you're compiling for supports vector instructions. Any vector
operations that you hard-code will only work on the targets covered by the
intrinsics/inline asm you're using, which is not a good idea.

If your code didn't get vectorized, the way the pointer is iterated over may
not be clear enough for the vectorizer to spot. You may need to make it
clearer, and how to do that depends on the code in question. If you could
share the code (or a similar example) with the list, people could help you
spot the pattern and make it vectorize.

cheers,
--renato
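
As a rough illustration of the kind of loop shape Renato is describing, here is a minimal C sketch. It is not taken from Matthieu's code: the function name `scale_in_place` and its parameters are hypothetical. It only shows a counted loop with a local integer index over a plain `int *`. Whether a given compiler actually vectorizes it depends on the version, the target, and the optimization flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: multiply the first n elements of a by factor.
 * The index i lives in a local variable, the trip count n is read once,
 * and each iteration touches only a[i]. This is the kind of loop a
 * loop vectorizer is designed to recognize. */
void scale_in_place(int *a, size_t n, int factor)
{
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        a[i] *= factor;
    }
}
```

Clang releases newer than the 3.3 discussed in this thread accept `-Rpass=loop-vectorize` alongside `-O2` or `-O3`, which reports the loops that were vectorized.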

Hi,

Thank you for the information,

So I’m now keeping the array as a pointer (i32*), but the vectorizer doesn’t vectorize it.

I’ve pasted the function code before and after optimization (and the list of optimizations that I have enabled) in this Gist: https://gist.github.com/maattd/7008683

Some “weird” facts about my LLVM code:

  • all variables (even the one used for the loop condition) are pointers to memory allocated from the C world and passed to the LLVM functions as arguments
  • even with “opt->add(new llvm::DataLayout(*ee->getDataLayout()));” in the code, module->dump() outputs neither the data layout nor the target triple

Could either of these points confuse the vectorizer?
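
To make the first point concrete, here is a hedged C sketch of the two loop shapes. It is an illustration, not the code from the Gist: the function names are hypothetical, and the second version assumes that the step is 1.0 and that none of the pointers alias. In the first version the loop counter, step, and limit are re-read from memory on every iteration, and a store to `b` could in principle modify any of them, so the compiler has to be conservative. In the second version the loop state is copied into locals and the induction variable is an integer.

```c
#include <assert.h>

/* Loop state kept in memory: counter, step and limit are reloaded on
 * every iteration, and the counter is a double that is converted to an
 * array index each time. b may alias any of the other pointers. */
void fill_state_in_memory(double *b, double *counter,
                          const double *step, const double *limit)
{
    assert(*step > 0.0);          /* otherwise the loop would not terminate */
    *counter = 1.0;
    while (*counter <= *limit) {
        b[(int)*counter - 1] = 5.0;
        *counter += *step;
    }
}

/* Loop state kept in locals (assumes step == 1.0 and no aliasing):
 * the bound is read once and the index is a plain integer. */
void fill_state_in_locals(double *b, const double *limit)
{
    int n = (int)*limit;          /* read the bound once, before the loop */
    for (int i = 0; i < n; ++i) {
        b[i] = 5.0;
    }
}
```

Under those assumptions both functions write the same elements of `b`; only the second has the simple counted form that vectorizers look for.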

>
>> How can I tell LLVM to consider this i32* as an <10 x i32> (and thus get
>> the performance improvements thanks to SIMD ..etc..) ?
>>
>
> Hi Matthieu,
>
> You shouldn't need to do anything, the vectorizer should spot that for
> you, if the machine you're compiling to has support for vector
> instructions. Any kind of vector operations that you may want to hard-code
> will make it not work on anything other than the intrinsics/inline asm
> you're using, which is not a good idea.
>

Which part of the vectorizer is responsible for doing pointer->vector transformations?

-Tom

Both the SLP vectorizer and the Loop vectorizer support vectorizing pointers. The attached code looks like a candidate for the SLP vectorizer. Can you run the SLP vectorizer with the flag -mllvm -debug-only=SLP and attach the log? I think that we are missing the pattern for the roots of the tree.

Thanks,
Nadav

Hi,

So I’ve tried the Loop vectorizer and the SLP vectorizer (LLVM 3.3) on this code, which assigns 5 to each element of the array “%b”:

; ModuleID = 'res.ll'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: nounwind
define void @loop_ptr_19121568([10 x i32*]* nocapture %params_vec) #0 {
entry:
%0 = bitcast [10 x i32*]* %params_vec to double**
%temp_1 = load double** %0, align 8
%1 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 1
%2 = load i32** %1, align 8
%temp_2 = bitcast i32* %2 to double*
%3 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 3
%4 = load i32** %3, align 8
%b = bitcast i32* %4 to double*
%5 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 4
%6 = load i32** %5, align 8
%d = bitcast i32* %6 to double*
%7 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 6
%8 = load i32** %7, align 8
%temp_0 = bitcast i32* %8 to double*
%9 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 7
%10 = load i32** %9, align 8
%temp_4 = bitcast i32* %10 to i1*
%11 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 8
%12 = load i32** %11, align 8
%temp_3 = bitcast i32* %12 to double*
%13 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 9
%14 = load i32** %13, align 8
%i = bitcast i32* %14 to double*
store double 1.000000e+00, double* %temp_0, align 8
%15 = load double* %temp_1, align 8
store double %15, double* %temp_2, align 8
%16 = load double* %d, align 8
%17 = fmul double %16, %16
store double %17, double* %temp_3, align 8
%.pre = load double* %temp_0, align 8
%cmp_le1 = fcmp ole double %.pre, %17
store i1 %cmp_le1, i1* %temp_4, align 1
br i1 %cmp_le1, label %"i = temp_0", label %end_fun

"i = temp_0": ; preds = %entry, %"i = temp_0"
%18 = load double* %temp_0, align 8
store double %18, double* %i, align 8
%19 = fptoui double %18 to i32
%20 = add i32 %19, -1
%21 = sext i32 %20 to i64
%22 = getelementptr double* %b, i64 %21
store double 5.000000e+00, double* %22, align 8
%23 = load double* %temp_0, align 8
%24 = load double* %temp_2, align 8
%25 = fadd double %23, %24
store double %25, double* %temp_0, align 8
%.pre1 = load double* %temp_3, align 8
%cmp_le = fcmp ole double %25, %.pre1
store i1 %cmp_le, i1* %temp_4, align 1
br i1 %cmp_le, label %"i = temp_0", label %end_fun

end_fun: ; preds = %"i = temp_0", %entry
ret void
}

attributes #0 = { nounwind }

If you know that the i32 array will always have 10 entries, make the loop in the function use a constant limit of 10.

I.e., make the loop limit a constant rather than a variable.

Even though I know the size of the array, I don’t always iterate through all of it, so the loop count has to be a variable. The vectorizer works fine with a non-constant loop limit when compiling C code with Clang, for example, so I should be able to do the same for this code (hopefully :) ).

Matthieu
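
For readers wondering why a runtime loop count is not an obstacle in itself: for a non-constant trip count, a loop vectorizer typically emits a vector body that handles several elements per step, followed by a scalar loop for the leftovers. The C sketch below writes that structure out by hand as an illustration only. The name `fill_prefix`, the `WIDTH` of 4, and the bound check are assumptions made for the example, not something taken from the thread's code.

```c
#include <assert.h>

enum { ARRAY_LEN = 10, WIDTH = 4 };  /* WIDTH: hypothetical vector width */

/* Store value into the first count elements of a 10-element array.
 * The first loop stands in for the vector body (WIDTH elements per
 * step); the second loop is the scalar remainder that handles the
 * last count % WIDTH elements. */
void fill_prefix(int *a, int count, int value)
{
    assert(count >= 0 && count <= ARRAY_LEN);
    int i = 0;
    for (; i + WIDTH <= count; i += WIDTH) {
        a[i + 0] = value;
        a[i + 1] = value;
        a[i + 2] = value;
        a[i + 3] = value;
    }
    for (; i < count; ++i) {
        a[i] = value;
    }
}
```

In practice the plain `for (i = 0; i < count; ++i)` form is the one to write; the split above is what the compiler is expected to derive on its own when the loop is recognizable.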