"equivalent" .ll files diverge after optimizations are applied

Hi,

I've attached two .ll files that are supposed to be equivalent, but 'unopt-fail.ll' causes a crash in WebKit's test suite while 'unopt-pass.ll' does not. I can't give more details about the crash: when I run the crashing test in isolation it passes, but when I run the full suite it crashes; it boggles the mind.

Below I provide the optimized asm that is produced from each file. Could you give me a hint about what the problem is?
I also attached 't.cpp' which approximates the source that the .ll files came from.

-Argiris

unopt-fail.ll (15 KB)

unopt-pass.ll (17.4 KB)

t.cpp (447 Bytes)

Using MM registers is wrong unless the user has specifically asked for it, which doesn't seem to be the case here.
In the awesome MMX architecture, touching an MM register makes subsequent x87 operations fail unless an EMMS instruction is issued first; none of the compilers here are smart enough to insert EMMS instructions in the right places, so the only safe thing is not to use these registers. There is no x87 instruction shown here, but you've probably got one in the full test suite and not in the test by itself, which fits your data.
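To make the x87/MMX interaction described above concrete, here is a toy C++ model of the x87 tag word. It is only a sketch: it does not touch real hardware, the `X87TagModel` type and its member names are invented for illustration, and it ignores everything except the "register in use" tags. It assumes the documented behaviour that any MMX instruction other than EMMS marks all eight x87 registers as in use, and that EMMS marks them all empty again.

```cpp
#include <array>

// Toy model of the x87 tag word. Hypothetical type for illustration only;
// it does not execute any MMX or x87 instructions.
struct X87TagModel {
    std::array<bool, 8> in_use{};  // true = tag says the register holds a value
    int top = 0;                   // index of the current top of stack

    // Any MMX instruction other than EMMS marks all eight registers as
    // in use and resets the top-of-stack index.
    void mmx_instruction() {
        in_use.fill(true);
        top = 0;
    }

    // EMMS marks all eight registers as empty again.
    void emms() { in_use.fill(false); }

    // An x87 load pushes onto the register stack. If the slot it pushes
    // into is already marked in use, the push overflows and the value is
    // not delivered (real hardware raises an invalid-operation exception
    // or, when that is masked, produces a NaN). Returns true on success.
    bool x87_load() {
        top = (top + 7) % 8;  // decrement modulo 8
        if (in_use[top]) return false;
        in_use[top] = true;
        return true;
    }
};
```

In this model an `x87_load()` succeeds on a fresh state, fails after `mmx_instruction()`, and succeeds again once `emms()` has been called, which matches the "passes in isolation, fails in the full suite" pattern if some other test performs x87 arithmetic after this code has run.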

Why this is happening is not immediately clear. It looks like the successful code is doing an aggregate copy field-by-field while the failing code has lowered this to a memcpy. I would certainly expect the memcpy expansion to be smart enough to avoid using MM registers, though; that's a serious bug if it isn't.

  movd %xmm0, %rax
  movd %rax, %mm0
  movq2dq %mm0, %xmm1
  movq2dq %mm0, %xmm2
  punpcklqdq %xmm2, %xmm1 ## xmm1 = xmm1[0],xmm2[0]
  movq 16(%rsp), %rax
  movd %rax, %mm0
  movq2dq %mm0, %xmm0
  punpcklqdq %xmm2, %xmm0 ## xmm0 = xmm0[0],xmm2[0]

Here are the optimized versions:

$ opt -std-compile-opts unopt-pass.ll -o - | llvm-dis -o -

[...]
define %3 @_ZN7WebCore15GraphicsContext19roundToDevicePixelsERKNS_9FloatRectE(%"class.WebCore::GraphicsContext"* %this, %"struct.WebCore::FloatRect"* %rect) nounwind ssp align 2 {
  %roundedOrigin = alloca %"class.WebCore::FloatSize", align 4 ; <%"class.WebCore::FloatSize"*> [#uses=3]
  %roundedLowerRight = alloca %"class.WebCore::FloatSize", align 4 ; <%"class.WebCore::FloatSize"*> [#uses=3]
  %1 = getelementptr inbounds %"class.WebCore::FloatSize"* %roundedOrigin, i64 0, i32 0 ; <float*> [#uses=2]
  store float 0.000000e+00, float* %1, align 4
  %2 = getelementptr inbounds %"class.WebCore::FloatSize"* %roundedOrigin, i64 0, i32 1 ; <float*> [#uses=2]
  store float 0.000000e+00, float* %2, align 4
  %3 = getelementptr inbounds %"class.WebCore::FloatSize"* %roundedLowerRight, i64 0, i32 0 ; <float*> [#uses=2]
  store float 0.000000e+00, float* %3, align 4
  %4 = getelementptr inbounds %"class.WebCore::FloatSize"* %roundedLowerRight, i64 0, i32 1 ; <float*> [#uses=2]
  store float 0.000000e+00, float* %4, align 4
  %5 = getelementptr inbounds %"class.WebCore::GraphicsContext"* %this, i64 0, i32 1 ; <%"class.WebCore::GraphicsContextPlatformPrivate"**> [#uses=1]
  %6 = load %"class.WebCore::GraphicsContextPlatformPrivate"** %5, align 8 ; <%"class.WebCore::GraphicsContextPlatformPrivate"*> [#uses=1]
  call void @_ZN7WebCore5mouniEPNS_15GraphicsContextEPNS_30GraphicsContextPlatformPrivateERKNS_9FloatRectERNS_10FloatPointES8_(%"class.WebCore::GraphicsContext"* %this, %"class.WebCore::GraphicsContextPlatformPrivate"* %6, %"struct.WebCore::FloatRect"* %rect, %"class.WebCore::FloatSize"* %roundedOrigin, %"class.WebCore::FloatSize"* %roundedLowerRight) nounwind
  %7 = load float* %3, align 4 ; <float> [#uses=1]
  %8 = load float* %1, align 4 ; <float> [#uses=2]
  %9 = fsub float %7, %8 ; <float> [#uses=1]
  %10 = load float* %4, align 4 ; <float> [#uses=1]
  %11 = load float* %2, align 4 ; <float> [#uses=2]
  %12 = fsub float %10, %11 ; <float> [#uses=1]
  %13 = insertelement <2 x float> undef, float %8, i32 0 ; <<2 x float>> [#uses=1]
  %14 = insertelement <2 x float> %13, float %11, i32 1 ; <<2 x float>> [#uses=1]
  %tmp8 = insertvalue %3 undef, <2 x float> %14, 0 ; <%3> [#uses=1]
  %15 = insertelement <2 x float> undef, float %9, i32 0 ; <<2 x float>> [#uses=1]
  %16 = insertelement <2 x float> %15, float %12, i32 1 ; <<2 x float>> [#uses=1]
  %tmp12 = insertvalue %3 %tmp8, <2 x float> %16, 1 ; <%3> [#uses=1]
  ret %3 %tmp12
}

$ opt -std-compile-opts unopt-fail.ll -o - | llvm-dis -o -

[...]
define %3 @_ZN7WebCore15GraphicsContext19roundToDevicePixelsERKNS_9FloatRectE(%"class.WebCore::GraphicsContext"* %this, %"struct.WebCore::FloatRect"* %rect) nounwind ssp align 2 {
  %roundedOrigin = alloca i64, align 8 ; <i64*> [#uses=3]
  %tmpcast = bitcast i64* %roundedOrigin to %"class.WebCore::FloatSize"* ; <%"class.WebCore::FloatSize"*> [#uses=2]
  %roundedLowerRight = alloca %"class.WebCore::FloatSize", align 4 ; <%"class.WebCore::FloatSize"*> [#uses=3]
  %1 = bitcast i64* %roundedOrigin to float* ; <float*> [#uses=2]
  store float 0.000000e+00, float* %1, align 8
  %2 = getelementptr inbounds %"class.WebCore::FloatSize"* %tmpcast, i64 0, i32 1 ; <float*> [#uses=2]
  store float 0.000000e+00, float* %2, align 4
  %3 = getelementptr inbounds %"class.WebCore::FloatSize"* %roundedLowerRight, i64 0, i32 0 ; <float*> [#uses=2]
  store float 0.000000e+00, float* %3, align 4
  %4 = getelementptr inbounds %"class.WebCore::FloatSize"* %roundedLowerRight, i64 0, i32 1 ; <float*> [#uses=2]
  store float 0.000000e+00, float* %4, align 4
  %5 = getelementptr inbounds %"class.WebCore::GraphicsContext"* %this, i64 0, i32 1 ; <%"class.WebCore::GraphicsContextPlatformPrivate"**> [#uses=1]
  %6 = load %"class.WebCore::GraphicsContextPlatformPrivate"** %5, align 8 ; <%"class.WebCore::GraphicsContextPlatformPrivate"*> [#uses=1]
  call void @_ZN7WebCore5mouniEPNS_15GraphicsContextEPNS_30GraphicsContextPlatformPrivateERKNS_9FloatRectERNS_10FloatPointES8_(%"class.WebCore::GraphicsContext"* %this, %"class.WebCore::GraphicsContextPlatformPrivate"* %6, %"struct.WebCore::FloatRect"* %rect, %"class.WebCore::FloatSize"* %tmpcast, %"class.WebCore::FloatSize"* %roundedLowerRight) nounwind
  %7 = load float* %3, align 4 ; <float> [#uses=1]
  %8 = load float* %1, align 8 ; <float> [#uses=1]
  %9 = fsub float %7, %8 ; <float> [#uses=1]
  %10 = load float* %4, align 4 ; <float> [#uses=1]
  %11 = load float* %2, align 4 ; <float> [#uses=1]
  %12 = fsub float %10, %11 ; <float> [#uses=1]
  %tmp3.i = insertelement <2 x float> undef, float %9, i32 0 ; <<2 x float>> [#uses=1]
  %tmp1.i = insertelement <2 x float> %tmp3.i, float %12, i32 1 ; <<2 x float>> [#uses=1]
  %tmp = bitcast <2 x float> %tmp1.i to i64 ; <i64> [#uses=1]
  %tmp.i.i = load i64* %roundedOrigin, align 8 ; <i64> [#uses=1]
  %tmp9 = insertelement <1 x i64> undef, i64 %tmp.i.i, i32 0 ; <<1 x i64>> [#uses=1]
  %tmp6 = insertelement <1 x i64> undef, i64 %tmp, i32 0 ; <<1 x i64>> [#uses=1]
  %tmp11 = bitcast <1 x i64> %tmp9 to <2 x float> ; <<2 x float>> [#uses=1]
  %insert = insertvalue %3 undef, <2 x float> %tmp11, 0 ; <%3> [#uses=1]
  %tmp8 = bitcast <1 x i64> %tmp6 to <2 x float> ; <<2 x float>> [#uses=1]
  %insert4 = insertvalue %3 %insert, <2 x float> %tmp8, 1 ; <%3> [#uses=1]
  ret %3 %insert4
}

Just to be clear, are you saying that it indicates a bug when the asm that llc produces from the second IR uses MM registers?

-Argiris

Yes. It's not immediately obvious whether it's in opt or llc, though.
Chris was doing work involving <2 x float> and may know about this.

I did. <2 x float> doesn't use MMX, but <2 x int> probably does. It is possible that the optimizer is turning <2 x float> operations into <2 x int> ones or something...

-Chris
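The failing IR above routes a <2 x float> through an i64 and a <1 x i64> before casting it back. As a rough illustration of why that is still "equivalent" at the value level, the following C++ sketch packs two floats into a 64-bit integer and unpacks them again. The helper names `pack_floats` and `unpack_floats` are made up for this example, and it assumes a 32-bit `float`; it says nothing about which registers a backend would pick.

```cpp
#include <cstdint>
#include <cstring>

// Copy the bytes of two floats into one 64-bit integer, loosely mirroring
// "bitcast <2 x float> ... to i64" in the IR. Hypothetical helper.
inline std::uint64_t pack_floats(float lo, float hi) {
    float pair[2] = {lo, hi};
    std::uint64_t bits;
    static_assert(sizeof(pair) == sizeof(bits), "this sketch expects 32-bit float");
    std::memcpy(&bits, pair, sizeof(bits));
    return bits;
}

// Copy the bytes back out, loosely mirroring the bitcast from the
// 64-bit integer type back to <2 x float>. Hypothetical helper.
inline void unpack_floats(std::uint64_t bits, float &lo, float &hi) {
    float pair[2];
    std::memcpy(pair, &bits, sizeof(bits));
    lo = pair[0];
    hi = pair[1];
}
```

The round trip preserves every bit, so the two IR files compute the same values. The difference only shows up later, when the backend decides how to lower the intermediate 64-bit integer vector.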

Hi Argiris,

The real problem here is that the X86 backend is turning datatypes like <1 x i64> into MMX operations, but doesn't do so in a safe way (it's not inserting the requisite EMMS instructions). After discussing this with Dale and Bill, the right fix is to stop mapping generic vectors onto MMX operations. This will define away the existing -disable-mmx flag and make stuff like this impossible.

However, this isn't going to happen in the next couple of days, certainly not in time for the 2.8 release branch on Friday. As such, I checked in a horrible hack in r112696 that prevents SRoA from introducing MMX-specific vector types. I'm not aware of a target where those datatypes are actually useful, so this shouldn't be bad. On your testcase, no MMX operations are produced.

Please let me know if you see any other MMX stuff being generated.

-Chris
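For readers wondering what "MMX-specific vector types" might mean in practice, here is a small C++ sketch of such a check. This is not the code from r112696: `VecDesc` and `looks_like_mmx_vector` are hypothetical names, and the rule itself (a vector of integers whose total width is 64 bits, e.g. <8 x i8>, <4 x i16>, <2 x i32>, <1 x i64>) is an assumption based on the types mentioned in this thread.

```cpp
// Hypothetical description of a vector type; this is not an LLVM class.
struct VecDesc {
    unsigned num_elements;   // e.g. 1 for <1 x i64>
    unsigned element_bits;   // e.g. 64 for <1 x i64>
    bool integer_elements;   // false for float vectors such as <2 x float>
};

// Assumption: a type counts as "MMX-specific" when it is a vector of
// integers that is exactly 64 bits wide in total.
inline bool looks_like_mmx_vector(const VecDesc &v) {
    return v.integer_elements && v.num_elements * v.element_bits == 64;
}
```

Under this assumed rule <1 x i64> and <2 x i32> are flagged, while <2 x float> and 128-bit vectors such as <2 x i64> are not. A transformation could consult a predicate like this before deciding to rewrite an alloca into a vector type.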

Thanks for the quick response!

Those types are all used with NEON. I'll try to do some experiments to measure the impact of this....

If it matters, I can conditionalize the hack on the target triple being i386/x86-64. Just let me know.

-Chris

Some simple test programs look fine.

It also turns out that my nightly tester picked up this change last night. I didn't see any obvious problems there except for a 13.5% regression on one internal test (<rdar://problem/8383253>) but that regression still reproduces with this change reverted. I sure don't want that hack to stick around for long, but as long as it gets removed soon, I guess we're fine for now.

Just to be safe, I made this only kick in on x86: r112763

-Chris
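Restricting a workaround to x86 targets can be pictured as a check on the architecture part of the target triple. The sketch below is only an illustration of that idea, not the code in r112763: `is_x86_triple` is a hypothetical helper that does plain string matching on triples such as "i386-apple-darwin9" or "x86_64-apple-darwin10".

```cpp
#include <string>

// Hypothetical helper: returns true when the architecture component of a
// target triple names 32-bit or 64-bit x86.
inline bool is_x86_triple(const std::string &triple) {
    // The architecture is everything before the first '-'
    // (or the whole string if there is no '-').
    const std::string arch = triple.substr(0, triple.find('-'));
    if (arch == "x86_64") return true;
    // Accept i386, i486, i586 and i686.
    return arch.size() == 4 && arch[0] == 'i' &&
           arch[1] >= '3' && arch[1] <= '6' &&
           arch.compare(2, 2, "86") == 0;
}
```

With a check like this, an ARM triple such as "armv7-apple-darwin10" is left alone, so NEON code that relies on 64-bit vector types would not be affected by the workaround.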