Vector troubles

Hola LLVMers,

I’m working on engaging SSE via the LLVM vector ops on x86. I had some questions a while back that you all helped out with, but I’m seeing similar issues and was hoping you’d have some ideas. Below is the dump of the LLVM IR of a program that takes a vector stored in a float*, builds an LLVM vector from it, copies it to another vector, and then takes it apart and stores it back out through another float*. This will live on the boundary of our system: a function designed to promote a raw, potentially unaligned value into a vector that the rest of the LLVM-generated code can work with.

It is dying trying to store our working vector into one of the LLVM vectors created on the stack. Despite the align-16 directive on the alloca instruction, the stack slot is not always aligned to a 16-byte boundary.

I did a sync and build this morning, so my LLVM is quite fresh.

Thank you for any help!

Chuck.

My program:

target datalayout = "E-p:32:32:32-i1:8:8:8-i8:8:8:8-i32:32:32:32-f32:32:32:32"

define void @promoteCopyAndReturn(float* %promoteReturn, float* %toPromote) {
Entry:
  %Promoted_promoteReturn_Ptr = alloca <4 x float>, align 16 ; <<4 x float>*> [#uses=2]
  %Promoted_toPromote_Ptr = alloca <4 x float>, align 16 ; <<4 x float>*> [#uses=2]
  %elemPtr = getelementptr float* %toPromote, i32 0 ; <float*> [#uses=1]
  %elemLoaded = load float* %elemPtr ; <float> [#uses=1]
  %vectorPromotion = insertelement <4 x float> undef, float %elemLoaded, i32 0 ; <<4 x float>> [#uses=1]
  %elemPtr1 = getelementptr float* %toPromote, i32 1 ; <float*> [#uses=1]
  %elemLoaded2 = load float* %elemPtr1 ; <float> [#uses=1]
  %vectorPromotion3 = insertelement <4 x float> %vectorPromotion, float %elemLoaded2, i32 1 ; <<4 x float>> [#uses=1]
  %elemPtr4 = getelementptr float* %toPromote, i32 2 ; <float*> [#uses=1]
  %elemLoaded5 = load float* %elemPtr4 ; <float> [#uses=1]
  %vectorPromotion6 = insertelement <4 x float> %vectorPromotion3, float %elemLoaded5, i32 2 ; <<4 x float>> [#uses=1]
  %elemPtr7 = getelementptr float* %toPromote, i32 3 ; <float*> [#uses=1]
  %elemLoaded8 = load float* %elemPtr7 ; <float> [#uses=1]
  %vectorPromotion9 = insertelement <4 x float> %vectorPromotion6, float %elemLoaded8, i32 3 ; <<4 x float>> [#uses=1]
  store <4 x float> %vectorPromotion9, <4 x float>* %Promoted_toPromote_Ptr ; <-- dying when it executes this line (assembly below)
  %toPromote10 = load <4 x float>* %Promoted_toPromote_Ptr ; <<4 x float>> [#uses=1]
  br label %Body

Body: ; preds = %Entry
  store <4 x float> %toPromote10, <4 x float>* %Promoted_promoteReturn_Ptr
  br label %Exit

Exit: ; preds = %Body
  %vectorToDemote = load <4 x float>* %Promoted_promoteReturn_Ptr ; <<4 x float>> [#uses=4]
  %elemToDemote = extractelement <4 x float> %vectorToDemote, i32 0 ; <float> [#uses=1]
  %elemPtr11 = getelementptr float* %promoteReturn, i32 0 ; <float*> [#uses=1]
  store float %elemToDemote, float* %elemPtr11
  %elemToDemote12 = extractelement <4 x float> %vectorToDemote, i32 1 ; <float> [#uses=1]
  %elemPtr13 = getelementptr float* %promoteReturn, i32 1 ; <float*> [#uses=1]
  store float %elemToDemote12, float* %elemPtr13
  %elemToDemote14 = extractelement <4 x float> %vectorToDemote, i32 2 ; <float> [#uses=1]
  %elemPtr15 = getelementptr float* %promoteReturn, i32 2 ; <float*> [#uses=1]
  store float %elemToDemote14, float* %elemPtr15
  %elemToDemote16 = extractelement <4 x float> %vectorToDemote, i32 3 ; <float> [#uses=1]
  %elemPtr17 = getelementptr float* %promoteReturn, i32 3 ; <float*> [#uses=1]
  store float %elemToDemote16, float* %elemPtr17
  ret void
}

Assembler (Intel format):

15c00010 83ec2c       sub esp,2Ch
15c00013 8b442434     mov eax,dword ptr [esp+34h]
15c00017 f30f10400c   movss xmm0,dword ptr [eax+0Ch]
15c0001c f30f104804   movss xmm1,dword ptr [eax+4]
15c00021 0f14c8       unpcklps xmm1,xmm0
15c00024 f30f104008   movss xmm0,dword ptr [eax+8]
15c00029 f30f1010     movss xmm2,dword ptr [eax]
15c0002d 0f14d0       unpcklps xmm2,xmm0
15c00030 0f14d1       unpcklps xmm2,xmm1
15c00033 0f291424     movaps xmmword ptr [esp],xmm2 ss:0023:0012f238=0012f2580122ef730000000100000000

The relevant registers:

Xmm2 8.000000e+000: 4.000000e+000: 2.000000e+000: 1.000000e+000 // the vector got nicely constructed

Esp 12f238 // but it has no place to go: 0x12f238 is only 8-byte aligned, so the movaps throws a general-protection exception.
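To make the failure concrete: movaps requires its memory operand to be 16-byte aligned, and the Esp value in the register dump is not. Below is a minimal C++ sketch of that check. The helper names isAligned and misalignment are made up for illustration and are not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when addr is a multiple of alignment.
// alignment must be a nonzero power of two.
inline bool isAligned(std::uintptr_t addr, std::uintptr_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

// Hypothetical helper: number of bytes addr lies past the previous
// alignment boundary (0 when addr is already aligned).
inline std::uintptr_t misalignment(std::uintptr_t addr, std::uintptr_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & (alignment - 1);
}
```

For the Esp value in the dump, isAligned(0x12f238, 16) is false and misalignment(0x12f238, 16) is 8, so the slot at [esp] sits 8 bytes past a 16-byte boundary.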

Hey Chuck,

I’m not certain (Evan and Anton should chime in :), but here is some info:

Two issues with alignment come to mind. First, LLVM apparently still has some issues on systems that don’t have a 16-byte aligned stack: http://llvm.org/bugs/show_bug.cgi?id=1649

The other issue can be that you’re emitting an LLVM load from a pointer that is not on the stack and doesn’t have the right alignment. In that case, a movaps will be generated and you’ll get a fault. You can instead mark the load as having an alignment of one byte, and the codegen will produce movups. This is generally more efficient than doing 4 scalar loads and insertelements.
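As a rough illustration of the aligned versus unaligned distinction, here is a C++ sketch using the SSE intrinsics from xmmintrin.h rather than LLVM IR. It is only an analogue of marking the vector load with a one-byte alignment: _mm_loadu_ps and _mm_storeu_ps are the unaligned forms (they map to movups), whereas _mm_load_ps and _mm_store_ps would require 16-byte aligned pointers. The function name is borrowed from the IR above purely for illustration, and the sketch assumes an x86 target with SSE enabled.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics: _mm_loadu_ps, _mm_storeu_ps

// Copies four floats through an SSE register using unaligned accesses,
// so neither pointer has to sit on a 16-byte boundary.
inline void promoteCopyAndReturn(float* promoteReturn, const float* toPromote) {
    assert(promoteReturn != nullptr && toPromote != nullptr);
    __m128 v = _mm_loadu_ps(toPromote);   // one unaligned vector load
    _mm_storeu_ps(promoteReturn, v);      // one unaligned vector store
}
```

A single unaligned vector load replaces the four scalar loads and insertelements at the boundary. This does not address the stack-slot problem from the bug report, which concerns the alignment of the alloca itself.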

It is dying trying to store our working vector into one of the LLVM vectors created on the stack. Despite the align-16 directive on the alloca instruction, it is not always aligning to a 16-byte boundary.

This sounds like the bugzilla entry.

-Chris

Chuck Rose III wrote:

Hola LLVMers,

Hi Chuck,

It is dying trying to store our working vector into one of the LLVM vectors created on the stack. Despite the align-16 directive on the alloca instruction, it is not always aligning to a 16-byte boundary.

I also encountered this problem, and temporarily worked around it by using the __fastcall calling convention and aligning the stack pointer to a 16-byte boundary just before the function call, i.e. something like:

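// The JIT-compiled function must use the x86 fastcall convention for the cast below to be valid.
ASSERT(fcnMain->getCallingConv() == llvm::CallingConv::X86_FastCall);
float (__fastcall *fcnMainPtr)(void*) = (float (__fastcall *)(void*))ctx->executionEngine().getPointerToFunction(fcnMain);

void* params = inputParams.get();
u32 oldStackPtr(0);
_asm
{
    mov oldStackPtr, esp    // save the current stack pointer
    and esp, 0xfffffff0     // round esp down to a 16-byte boundary
}
m_data[i] = fcnMainPtr(params);
_asm
{
    mov esp, oldStackPtr    // restore the original stack pointer
}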

This is clearly not platform independent (and also rather hacky), so a proper fix would be really nice indeed. I currently develop on Windows, using MSVC 8.
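For readers less familiar with the masking trick, the and with 0xfffffff0 clears the low four bits of the stack pointer, which rounds it down to the nearest multiple of 16. A small C++ sketch of the same arithmetic on a plain integer follows. The helper name alignDown is hypothetical, and the sketch only models the address computation; it does not touch the real stack pointer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rounds a 32-bit address down to the nearest multiple
// of alignment, which must be a nonzero power of two.
// For alignment == 16 this is exactly addr & 0xfffffff0.
inline std::uint32_t alignDown(std::uint32_t addr, std::uint32_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}
```

Applied to the faulting value from the first message, alignDown(0x12f238, 16) gives 0x12f230. Because the x86 stack grows toward lower addresses, rounding down only reserves a few extra bytes and does not overwrite anything already on the stack.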

Cheers,

-- Daniel Johansson