Vector code

Hi all,

I’m trying to use LLVM to generate SIMD code at runtime (in particular Intel SSE), but I’m having trouble understanding how to create even the simplest function: adding two vectors of four single-precision floating-point elements. I can get it to add the elements one at a time, but not with a single vector instruction.

All help much appreciated!

Nicolas Capens

I'd suggest writing code in C and seeing what llvm-gcc does with it. You can also look at (for example) llvm/test/CodeGen/X86/*.ll for many examples.

-Chris

Hi Chris,

Thanks for the advice, but I'm actually not trying to compile code from
text. For now I'm just trying to construct the function directly. Think of
it as the vector equivalent of the HowToUseJIT.cpp example.

Cheers,

-Nicolas

What is your target set to? If LLVM thinks it's targeting a processor
that doesn't have SIMD instructions, it'll split vectors into
scalars like this.

Dan

Nicolas,

Thanks for the advice, but I'm actually not trying to compile code from
text. For now I'm just trying to construct the function directly. Think of
it as the vector equivalent of the HowToUseJIT.cpp example.

llvm2cpp is your friend then. It's now a separate 'target' in llc. It
will generate C++ code that constructs the provided IR.

Thanks for the advice, but I'm actually not trying to compile code from
text. For now I'm just trying to construct the function directly. Think of
it as the vector equivalent of the HowToUseJIT.cpp example.

There is a one-to-one mapping between text and IR. Once you understand what to generate, it is much easier to generate it. Otherwise, if you have a specific question, we can help answer that.

-Chris

Hi Chris,

I don't know how to properly create vectors and add them. I can create
arrays, take individual elements and add them, but BinaryOperator::createAdd
doesn't work on vectors for me. The documentation is very extensive for
scalar types and there are plenty of examples, but I haven't found a
straightforward way to translate scalar code to vector code yet. Please bear
with me, I've only just started exploring LLVM's capabilities and I'm still
searching through the documentation for more details about vector types.

Thanks,

-Nicolas

Hi Dan,

My CPU supports up to SSSE3, and I assume LLVM uses that as a target by
default? I don't think that's the problem really, I'm just struggling to
find the right functions/classes to create and manipulate vectors...

Thank you,

-Nicolas

Hi Anton,

I assume that's the same as the online demo's "Show LLVM C++ API code"
option (http://llvm.org/demo/)? I've tried that with a structure containing
four floating-point components but it also appears to add them individually
using extract/insert. Maybe I have to try an array of floats...

Thanks,

Anton

I assume that's the same as the online demo's "Show LLVM C++ API code"
option (http://llvm.org/demo/)? I've tried that with a structure containing
four floating-point components but it also appears to add them individually
using extract/insert. Maybe I have to try an array of floats...

You need to use gcc's vector extensions.

Ciao,

Duncan.

From the gcc docs:

5.43 Using vector instructions through built-in functions
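
For readers without the gcc manual at hand, here is a minimal sketch of the generic vector style that this section of the gcc docs describes, applied to the four-float add discussed in this thread. The names `v4sf` and `add4` are illustrative only and do not come from the thread; the sketch assumes a GCC- or Clang-compatible compiler, and whether the addition becomes a single SSE instruction depends on the target and optimization settings.

```c
#include <string.h>

/* Four packed floats. vector_size is given in bytes: 4 * sizeof(float) = 16. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: adds four floats using one vector-typed '+'.
   memcpy is used for the loads and the store so that the float arrays
   do not need to be 16-byte aligned. */
void add4(float z[4], const float x[4], const float y[4])
{
    v4sf vx, vy, vz;
    memcpy(&vx, x, sizeof vx);   /* load x[0..3] into a vector */
    memcpy(&vy, y, sizeof vy);   /* load y[0..3] into a vector */
    vz = vx + vy;                /* element-wise add on the whole vector */
    memcpy(z, &vz, sizeof vz);   /* store the result to z[0..3] */
}
```

Calling `add4(z, x, y)` with `x = {1, 2, 3, 4}` and `y = {10, 20, 30, 40}` leaves `{11, 22, 33, 44}` in `z`. Compiling a function like this to LLVM IR and then inspecting the output is one way to see which vector type and instructions a front end emits for it.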

Hi Nicolas (at least, I suspect your signing of your mail with "Anton" was not
intentional :-p),

I assume that's the same as the online demo's "Show LLVM C++ API code"
option (http://llvm.org/demo/)? I've tried that with a structure containing
four floating-point components but it also appears to add them individually
using extract/insert. Maybe I have to try an array of floats...

Did you turn off the link-time optimization flag (or something like that)? If
not, the compiler will optimize things like small structs away (though a
struct of more than 3 elements should not be scalarized directly AFAIK...).

Gr.

Matthijs

Hi Matthijs,

Yes, I've turned off the link-time optimizations (otherwise it just
propagates my constant vectors and immediately prints the result). :-)

Here's essentially what I try to generate:

void add(float z[4], float x[4], float y[4])
{
   z[0] = x[0] + y[0];
   z[1] = x[1] + y[1];
   z[2] = x[2] + y[2];
   z[3] = x[3] + y[3];
}

And here's part of the output from the online demo:

LoadInst* float_tmp2 = new LoadInst(ptr_x, "tmp2", false, label_entry);
LoadInst* float_tmp5 = new LoadInst(ptr_y, "tmp5", false, label_entry);
BinaryOperator* float_tmp6 = BinaryOperator::create(Instruction::Add,
    float_tmp2, float_tmp5, "tmp6", label_entry);
StoreInst* void_20 = new StoreInst(float_tmp6, ptr_z, false, label_entry);
GetElementPtrInst* ptr_tmp10 = new GetElementPtrInst(ptr_x, const_int32_13,
    "tmp10", label_entry);
LoadInst* float_tmp11 = new LoadInst(ptr_tmp10, "tmp11", false,
    label_entry);
GetElementPtrInst* ptr_tmp13 = new GetElementPtrInst(ptr_y, const_int32_13,
    "tmp13", label_entry);
LoadInst* float_tmp14 = new LoadInst(ptr_tmp13, "tmp14", false,
    label_entry);
BinaryOperator* float_tmp15 = BinaryOperator::create(Instruction::Add,
    float_tmp11, float_tmp14, "tmp15", label_entry);
...

So it just processes one element at a time instead of with one (SIMD)
operation.

Thank you,

-Nicolas (not Anton) :-P

LLVM does not automatically vectorize your scalar code (at least for now). You have to write gcc generic vector code or use vector builtins.

Evan

Nicolas Capens wrote:

Here's essentially what I try to generate:

void add(float z[4], float x[4], float y[4])
{
   z[0] = x[0] + y[0];
   z[1] = x[1] + y[1];
   z[2] = x[2] + y[2];
   z[3] = x[3] + y[3];
}

This is the vectorized llvm-assembly equivalent:

Hi Evan,

Please note that I'm not trying to compile from C code; I'm trying to generate
functions at run time directly. I want to keep it target-independent too, so
I can't use intrinsics either.

Cheers,

-Nicolas

Hi Frits,

Thanks for the suggestions! I was first able to successfully compile it to
bitcode (.bc format). llc doesn't support "-march=cpp", but then I ran
llvm2cpp which does give me the C++ code to directly create the intermediate
representation. Now I can study that to see what I was doing wrong
earlier...

Thanks again!

-Nicolas

Ah, but you can use any intrinsic that is target independent.... The gcc vector stuff is meant to work on all targets as I recall.

Yes, it does.

FWIW, LLVM IR supports a broad superset of them and has first class support for permutation, insertion, extraction, etc.

-Chris
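
To make the insertion, extraction and permutation operations mentioned above a little more concrete, here is a small sketch using the same gcc generic vector style. The helper names `lane`, `with_lane` and `reverse` are made up for illustration. It assumes a compiler that accepts subscripting on vector types (recent GCC and Clang do); a front end will typically express such code with LLVM's element extraction, insertion and shuffle instructions, but the exact IR depends on the compiler and optimization level.

```c
/* Four packed floats, as in the gcc generic vector extension. */
typedef float v4sf __attribute__((vector_size(16)));

/* Extraction: read a single lane of the vector. */
float lane(v4sf v, int i)
{
    return v[i];
}

/* Insertion: return a copy of v with lane i replaced by value. */
v4sf with_lane(v4sf v, int i, float value)
{
    v[i] = value;
    return v;
}

/* A permutation written lane by lane: reverse the element order. */
v4sf reverse(v4sf v)
{
    v4sf r = { v[3], v[2], v[1], v[0] };
    return r;
}
```

Everything here is target-independent source code; whether it ends up as SSE shuffles or as scalar moves is decided by the code generator for the selected target.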