Enabling Vector-select

Hello everyone,

I wanted to let everybody know that I am going to enable support for vector-select by default later today.

Details:

Currently, the LLVM code generator only supports 'select' [1] instructions with a scalar boolean (i1) condition. Vectorizing compilers, such as the Intel OpenCL Vectorizer and the GCC vectorizer, often use vector-select instructions to implement masks. This change makes code generation for these patterns possible.
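For readers less familiar with vector-select, the following Python sketch models its semantics with plain lists. A lane-wise compare produces a mask (roughly what an `icmp sgt` on <4 x i32> yields as a <4 x i1>), and the select picks each lane from one of two inputs. The function names are hypothetical and only illustrate the idea; they are not LLVM APIs.

```python
def vector_select(mask, if_true, if_false):
    """Lane-wise select: lane i of the result is if_true[i] when mask[i] is
    true, and if_false[i] otherwise (a model of a <N x i1> condition)."""
    if not (len(mask) == len(if_true) == len(if_false)):
        raise ValueError("all operands must have the same number of lanes")
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def vector_greater_than(a, b):
    """Lane-wise comparison producing one boolean per lane (the mask)."""
    return [x > y for x, y in zip(a, b)]


# Scalar loop body being vectorized:  r[i] = x[i] > 0 ? x[i] : 0
x = [3, -1, 7, -5]
zeros = [0] * len(x)
mask = vector_greater_than(x, zeros)    # [True, False, True, False]
result = vector_select(mask, x, zeros)  # [3, 0, 7, 0]
print(result)
```

Without vector-select, each lane would need its own branch or scalar select; with it, the whole loop body becomes one compare plus one select.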

In order to enable vector-select, we needed to make some changes to the LLVM type legalizer.
The '-promote-elements' flag changes the way illegal vectors are legalized. Currently, the default legalization algorithm widens the number of elements in a vector, so the vector v4i8 would be converted to v16i8. With the '-promote-elements' flag, the legalizer first tries to widen each element instead, so v4i8 would be converted to v4i32. Overall this is a good idea, because the instruction set is usually more complete for the 'common' element type. This change is required in order to legalize mask types such as '<4 x i1>' into the types used by the SSE and NEON instruction sets.
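The two strategies can be contrasted with a small Python model. It assumes a hypothetical target whose only legal integer vector types are the 128-bit v16i8, v8i16, v4i32 and v2i64; the helper names are invented for this sketch, and the real type legalizer handles many more cases.

```python
# Hypothetical 128-bit target: legal integer vector types, written as
# (number_of_elements, element_bits). Assumed for illustration only.
LEGAL_VECTOR_TYPES = {(16, 8), (8, 16), (4, 32), (2, 64)}
MAX_VECTOR_BITS = 128
INTEGER_WIDTHS = (8, 16, 32, 64)


def type_name(vt):
    n, bits = vt
    return f"v{n}i{bits}"


def widen_element_count(n, bits):
    """Default strategy: keep the element type, double the element count."""
    while n * bits <= MAX_VECTOR_BITS:
        if (n, bits) in LEGAL_VECTOR_TYPES:
            return (n, bits)
        n *= 2
    return None  # no legal vector type with this element type


def promote_elements(n, bits):
    """'-promote-elements' strategy: keep the element count and widen each
    element first; fall back to widening the element count if that fails."""
    for wider in INTEGER_WIDTHS:
        if wider > bits and (n, wider) in LEGAL_VECTOR_TYPES:
            return (n, wider)
    return widen_element_count(n, bits)


print(type_name(widen_element_count(4, 8)))  # v16i8
print(type_name(promote_elements(4, 8)))     # v4i32
print(type_name(promote_elements(4, 1)))     # v4i32 (a <4 x i1> mask)
```

The last line shows why promotion matters for masks: an i1 element type never appears in the legal set, so only element promotion turns <4 x i1> into a type the SSE-style instructions can operate on.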

The X86 backend already has excellent codegen support: it lowers vector-select instructions to SSE4 and AVX blends [2]. Other targets emulate blends using a sequence of ANDs and XORs.
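The AND/XOR emulation works because, once a mask such as <4 x i1> has been promoted to full-width lanes, every lane is either all ones or all zeros. The Python sketch below models 32-bit lanes as plain integers; the identities used are standard bitwise-select tricks shown for illustration, not necessarily the exact sequence any particular backend emits.

```python
LANE_BITS = 32
ALL_ONES = (1 << LANE_BITS) - 1  # a "true" lane once the mask is promoted to i32


def flags_to_mask(flags):
    """Vector compares yield all-ones for true lanes and zero for false lanes."""
    return [ALL_ONES if f else 0 for f in flags]


def blend_and_xor(mask, a, b):
    """Bitwise select using only AND and XOR: b ^ ((a ^ b) & m).
    All-ones lanes yield a, zero lanes yield b."""
    return [y ^ ((x ^ y) & m) for m, x, y in zip(mask, a, b)]


def blend_and_or(mask, a, b):
    """Equivalent form (a & m) | (b & ~m), with NOT written as XOR with all-ones."""
    return [(x & m) | (y & (m ^ ALL_ONES)) for m, x, y in zip(mask, a, b)]


mask = flags_to_mask([True, False, False, True])
a = [10, 20, 30, 40]
b = [1, 2, 3, 4]
print(blend_and_xor(mask, a, b))  # [10, 2, 3, 40]
```

Both forms rely on the mask lanes being exactly all-ones or all-zeros; a mask lane with only its low bit set would mix bits from both inputs.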

Later today I will update a few tests (which expect slightly different output) and enable the '-promote-elements' flag by default.

[1] http://llvm.org/docs/LangRef.html#i_select
[2] https://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-blend.ll?revision=139992

Thanks,
Nadav

Hi Nadav,

Great work, thanks a lot!
I have not had the time to migrate our OpenCL driver to the latest trunk yet, but I followed your commits and tried out some small tests, which worked as expected :).

The last thing missing for us now is AVX support in the JIT, but that is a different issue.

However, there is one thing I do not fully understand: what if somebody actually wants a vector of 4 boolean values (i1) that should not be legalized to v4i32? For example, a code generator for LRBni would want to use the architecture's predicate registers for masks, in which case <16 x i1> should probably not be legalized to <16 x i32>, right?
However, I reckon that native support for architectures with predicated execution is probably a bigger problem anyway.

Best,
Ralf

Hi Ralf,

Thanks for trying the patches! Regarding LRB, it has a special <16 x i1> mask register, so <16 x i1> types would map naturally. As a general rule, when vectorizing, the vectorization factor should match the width of the machine. But to answer your question: the type legalizer would try to promote the elements of <4 x i1>, giving <4 x i128> (to fill the 512-bit register), but since i128 is not a legal scalar type, it would fail to do so and would instead widen the vector using the regular vector-widening code.
BTW, floating-point vector types are widened (by adding elements), just like before, and are not element-promoted.
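The arithmetic in this answer can be checked with a short Python model of a 512-bit, LRB-like target. The register width, the legal mask type and the set of legal scalar widths are assumptions drawn from the discussion above rather than from a real LLVM backend, and the decision logic is a simplification of what the legalizer does.

```python
REGISTER_BITS = 512                  # assumed LRB-style vector register width
LEGAL_SCALAR_BITS = (8, 16, 32, 64)  # assumed: i128 is not a legal scalar type
LEGAL_MASK_LANES = (16,)             # assumed: <16 x i1> maps to a mask register


def element_bits_to_fill(num_lanes):
    """Element width needed for num_lanes lanes to fill one register."""
    return REGISTER_BITS // num_lanes


def legalize_mask(num_lanes):
    """Decide what happens to a <num_lanes x i1> mask in this toy model."""
    if num_lanes in LEGAL_MASK_LANES:
        return "legal"
    bits = element_bits_to_fill(num_lanes)
    if bits in LEGAL_SCALAR_BITS:
        return f"promote to <{num_lanes} x i{bits}>"
    return "widen"  # fall back to the ordinary vector-widening code


print(element_bits_to_fill(4))  # 128
print(legalize_mask(4))         # widen
print(legalize_mask(16))        # legal
```

With 4 lanes, filling 512 bits would need i128 elements, which this model treats as illegal, so the <4 x i1> mask falls through to widening, matching the behaviour described above.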

Nadav

LLVM now supports vector-select.

Cheers,
Nadav