Hello LLVMers,
High order bit:
The presence of a call to another function causes a store to an unrelated vector to be emitted as an aligned store rather than an unaligned one, even though the associated StoreInst specifies an unaligned store.
Details:
I pulled down the latest source, so this is something I’m finding with the current LLVM. I’m hoping you’ll have an idea what’s going on or at least know if it’s a new issue I should log. It’s related to the stack alignment issue that I know is being worked on, but seems sufficiently different to ask about it here. I checked the bug database for “align” and “movaps” and didn’t see this issue raised.
OK, the first bit of code here seems to generate correct assembly for me. Basically, it copies the float4 stored at globalV into the address pointed to by dependentV. Along the way, it creates a <4 x float> and copies globalV into a temporary. I’m working on bridging the gap between the outside of our system and the LLVM-generated code, so there is a little extra copying from and to parameters at the boundaries of this function. Since this is just a repro example, there is very little besides the boundaries here. :) I fully admit the constructions below may not be optimal.
; ModuleID = 'hydra'
target datalayout = "E-p:32:32:32-i1:8:8:8-i8:8:8:8-i32:32:32:32-f32:32:32:32"
define void @evaluateDependents(float* %dependentV, float* %globalV) {
Entry_evaluateDependents:
%Promoted_dependentV_Ptr = alloca <4 x float>, align 16 ; <<4 x float>*> [#uses=2]
%Promoted_globalV_Ptr = alloca <4 x float>, align 16 ; <<4 x float>*> [#uses=2]
%externalVectorPtrCast = bitcast float* %globalV to <4 x float>* ; <<4 x float>*> [#uses=1]
%externalVectorLoaded = load <4 x float>* %externalVectorPtrCast, align 1 ; <<4 x float>> [#uses=1]
store <4 x float> %externalVectorLoaded, <4 x float>* %Promoted_globalV_Ptr, align 1
%globalV1 = load <4 x float>* %Promoted_globalV_Ptr, align 1 ; <<4 x float>> [#uses=1]
br label %Body_evaluateDependents
Body_evaluateDependents: ; preds = %Entry_evaluateDependents
store <4 x float> %globalV1, <4 x float>* %Promoted_dependentV_Ptr, align 1
br label %Exit_evaluateDependents
Exit_evaluateDependents: ; preds = %Body_evaluateDependents
%vectorToDemote = load <4 x float>* %Promoted_dependentV_Ptr, align 1 ; <<4 x float>> [#uses=1]
%externalVectorPtrCast2 = bitcast float* %dependentV to <4 x float>* ; <<4 x float>*> [#uses=1]
store <4 x float> %vectorToDemote, <4 x float>* %externalVectorPtrCast2, align 1
ret void
}
This produces the following instructions, which obey all the align 1 directives on the LoadInsts and StoreInsts:
…
15D10010 sub esp,2Ch
15D10013 mov eax,dword ptr [esp+34h]
15D10017 movups xmm0,xmmword ptr [eax]
15D1001A movups xmmword ptr [esp],xmm0
15D1001E mov eax,dword ptr [esp+30h]
15D10022 movups xmmword ptr [esp+10h],xmm0
15D10027 movups xmm0,xmmword ptr [esp+10h]
15D1002C movups xmmword ptr [eax],xmm0
15D1002F add esp,2Ch
15D10032 ret
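For anyone who would rather read C++ than IR, the function above is roughly equivalent to the sketch below. This is only an illustration of the intent, not code from our system: Float4 and evaluateDependentsSketch are made-up names, and memcpy stands in for the align 1 loads and stores at the function boundary.

```cpp
#include <cstring>

// Hypothetical stand-in for <4 x float>; alignas(16) mirrors the
// "align 16" on the two allocas in the IR.
struct alignas(16) Float4 {
    float lanes[4];
};

// Sketch of the first evaluateDependents. The incoming pointers may be
// unaligned, so the boundary copies use memcpy, which makes no alignment
// assumption (analogous to the align 1 loads and stores).
void evaluateDependentsSketch(float* dependentV, const float* globalV) {
    Float4 promotedGlobalV;     // plays the role of %Promoted_globalV_Ptr
    Float4 promotedDependentV;  // plays the role of %Promoted_dependentV_Ptr

    std::memcpy(promotedGlobalV.lanes, globalV, sizeof(Float4));
    promotedDependentV = promotedGlobalV;
    std::memcpy(dependentV, promotedDependentV.lanes, sizeof(Float4));
}
```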
Here’s where it gets weird and confusing to me. Let’s make our evaluateDependents function do something else. In addition to copying globalV into dependentV, it will also set a single float pointed to by dependentF, calling a function foo to get the value. (I tried setting dependentF directly, and that did NOT cause the problem with the generated code.) Here’s the LLVM code:
; ModuleID = 'hydra'
target datalayout = "E-p:32:32:32-i1:8:8:8-i8:8:8:8-i32:32:32:32-f32:32:32:32"
define float @foo(float %Y) {
Entry_foo:
%_ReturnValuePtr = alloca float ; <float*> [#uses=2]
br label %Body_foo
Body_foo: ; preds = %Entry_foo
store float %Y, float* %_ReturnValuePtr, align 1
br label %Exit_foo
Exit_foo: ; preds = %Body_foo
%finalValue = load float* %_ReturnValuePtr, align 1 ; <float> [#uses=1]
ret float %finalValue
}
define void @evaluateDependents(float* %dependentF, float* %dependentV, float* %globalV) {
Entry_evaluateDependents:
%Promoted_dependentV_Ptr = alloca <4 x float>, align 16 ; <<4 x float>*> [#uses=2]
%Promoted_globalV_Ptr = alloca <4 x float>, align 16 ; <<4 x float>*> [#uses=2]
%externalVectorPtrCast = bitcast float* %globalV to <4 x float>* ; <<4 x float>*> [#uses=1]
%externalVectorLoaded = load <4 x float>* %externalVectorPtrCast, align 1 ; <<4 x float>> [#uses=1]
store <4 x float> %externalVectorLoaded, <4 x float>* %Promoted_globalV_Ptr, align 1
%globalV1 = load <4 x float>* %Promoted_globalV_Ptr, align 1 ; <<4 x float>> [#uses=1]
br label %Body_evaluateDependents
Body_evaluateDependents: ; preds = %Entry_evaluateDependents
%fooResult = call float @foo( float 2.000000e+000 ) ; <float> [#uses=1]
store float %fooResult, float* %dependentF, align 1
store <4 x float> %globalV1, <4 x float>* %Promoted_dependentV_Ptr, align 1
br label %Exit_evaluateDependents
Exit_evaluateDependents: ; preds = %Body_evaluateDependents
%vectorToDemote = load <4 x float>* %Promoted_dependentV_Ptr, align 1 ; <<4 x float>> [#uses=1]
%externalVectorPtrCast2 = bitcast float* %dependentV to <4 x float>* ; <<4 x float>*> [#uses=1]
store <4 x float> %vectorToDemote, <4 x float>* %externalVectorPtrCast2, align 1
ret void
}
Here are the instructions for evaluateDependents. The JITter hasn’t compiled foo yet. What confuses me is why one of my movups instructions suddenly became a movaps. All the stores and loads have align 1 on them.
…
15D10012 sub esp,4Ch
15D10015 mov eax,dword ptr [esp+60h]
15D10019 movups xmm0,xmmword ptr [eax]
15D1001C movaps xmmword ptr [esp+8],xmm0 <-- why did this become a movaps?
15D10021 movups xmmword ptr [esp+28h],xmm0
15D10026 mov esi,dword ptr [esp+58h]
15D1002A mov edi,dword ptr [esp+5Ch]
15D1002E mov dword ptr [esp],40000000h
15D10035 call X86CompilationCallback (1335030h)
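To make the concern concrete, here is a small C++ sketch of the condition that movaps imposes on its memory operand: the address must be a multiple of 16. This is not LLVM code. The helper names and the sample stack addresses are made up for illustration, and it assumes (my assumption) that the stack pointer is only guaranteed to be 4-byte aligned at this point.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when `address` is a multiple of `alignment`.
// `alignment` must be a nonzero power of two.
inline bool isAligned(std::uint32_t address, std::uint32_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (address & (alignment - 1)) == 0;
}

// Hypothetical helper: would a 16-byte aligned store (movaps) to
// [esp + offset] be safe for the given value of esp?
inline bool alignedVectorStoreIsSafe(std::uint32_t esp, std::uint32_t offset) {
    return isAligned(esp + offset, 16);
}
```

With a made-up esp of 0x0012FF08, alignedVectorStoreIsSafe(esp, 8) is true because [esp+8] is 0x0012FF10. With esp at 0x0012FF04, which is still 4-byte aligned, [esp+8] is 0x0012FF0C and the same check is false, so the movaps would fault where a movups would not.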
Thanks for the help!
Chuck.