equivalent IR, different asm

The attached .ll files seem equivalent, but the resulting asm from 'opt-fail.ll' causes a crash to webkit.
I suspect the usage of registers is wrong, can someone take a look ?

$ llc opt-pass.ll -o -

  .section __TEXT,__text,regular,pure_instructions
  .globl __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE
  .align 4, 0x90
__ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE: ## @_ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE
## BB#0:
  pushq %r14
  pushq %rbx
  subq $8, %rsp
  movq %rsi, %rbx
  movq %rdi, %r14
  movq %rdx, %rdi
  movq %rcx, %rsi
  callq __ZN7WebCore4viziEPKNS_20RenderBoxModelObjectEPNS_10StyleImageE
  movq %rax, %rcx
  shrq $32, %rcx
  testl %ecx, %ecx
  je LBB0_2
## BB#1:
  imull (%rbx), %eax
  cltd
  idivl %ecx
  movl %eax, (%r14)
LBB0_2:
  addq $8, %rsp
  popq %rbx
  popq %r14
  ret

$ llc opt-fail.ll -o -

  .section __TEXT,__text,regular,pure_instructions
  .globl __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE
  .align 4, 0x90
__ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE: ## @_ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE
## BB#0:
  pushq %r14
  pushq %rbx
  subq $8, %rsp
  movq %rsi, %rbx
  movq %rdi, %r14
  movq %rdx, %rdi
  movq %rcx, %rsi
  callq __ZN7WebCore4viziEPKNS_20RenderBoxModelObjectEPNS_10StyleImageE
  movq %rax, %rcx
  shrq $32, %rcx
  testl %ecx, %ecx
  je LBB0_2
## BB#1:
  movl (%rbx), %ecx
  imull %ecx, %eax
  shrq $32, %rax
  movl %eax, %ecx
  cltd
  idivl %ecx
  movl %eax, (%r14)
LBB0_2:
  addq $8, %rsp
  popq %rbx
  popq %r14
  ret

-Argiris

opt-fail.ll (1.77 KB)

opt-pass.ll (1.79 KB)

The attached .ll files seem equivalent, but the resulting asm from 'opt-fail.ll' causes a crash to webkit.
I suspect the usage of registers is wrong, can someone take a look ?

The difference is that there is a shift right after the multiply, before the divide. In IR, the difference is:

  %5 = mul nsw i32 %4, %tmp1 ; <i32> [#uses=1]

  %btmp3 = lshr i64 %1, 32 ; <i64> [#uses=1]
  %btmp4 = trunc i64 %btmp3 to i32 ; <i32> [#uses=1]

  %6 = sdiv i32 %5, %btmp4 ; <i32> [#uses=1]

vs:

  %5 = mul nsw i32 %4, %tmp1 ; <i32> [#uses=1]

  ; removed: %btmp3 = lshr i64 %1, 32 ; <i64> [#uses=1]
  ; removed: %btmp4 = trunc i64 %btmp3 to i32 ; <i32> [#uses=1]

  %6 = sdiv i32 %5, %atmp4 ; <i32> [#uses=1]

It looks like you got these by manually editing the file. Do you think that the shift wasn't supposed to be there in the first place? Can you send me a .ii file that produces the __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE function with steps to get the bad IR?

-Chris

I attached preprocessed files.

$ llvm-g++ gcc-RenderBoxModelObject.ii -fno-exceptions -arch x86_64 -O2 -c -o part.o

vs

$ clang++ clang-RenderBoxModelObject.ii -fno-exceptions -arch x86_64 -O2 -c -o part.o

If I compile with clang, it causes a crash to webkit.

-Argiris

prepro.zip (447 KB)

The attached .ll files seem equivalent, but the resulting asm from 'opt-fail.ll' causes a crash to webkit.
I suspect the usage of registers is wrong, can someone take a look ?

The difference is that there is a shift right after the multiply, before the divide. In IR, the difference is:

But the multiply doesn't feed into the shift. The hand-edited version reuses an earlier computation of
(int32)(%1>>32). Argyrios is right that these should be equivalent; neither of them is bad.

The attached .ll files seem equivalent, but the resulting asm from 'opt-fail.ll' causes a crash to webkit.
I suspect the usage of registers is wrong, can someone take a look ?

Yes, the code here is wrong:

  movl (%rbx), %ecx
  imull %ecx, %eax

This computes h*((int32)%1) in %eax.

  shrq $32, %rax
  movl %eax, %ecx

This is trying to compute (int32)(%1>>32) into %ecx, but is using the wrong input value since %rax has been clobbered by the above code, and further is clobbering the value in %eax computed above, which is implicit input to the divide. This is some kind of back end error, probably register allocator.

Jakob, can you take a look when you get a chance?

-Chris

The instructions look like this before register coalescing:

156L %reg1034<def> = COPY %reg1026:sub_32bit; GR32:%reg1034 GR64:%reg1026
164L %reg1039<def> = MOV32rm %reg1025<kill>, 1, %reg0, 0, %reg0; mem:LD4[%h] GR32:%reg1039 GR64:%reg1025
172L %reg1035<def> = COPY %reg1039<kill>; GR32:%reg1035,1039
180L %reg1035<def> = IMUL32rr %reg1035, %reg1034<kill>, %EFLAGS<imp-def,dead>; GR32:%reg1035,1034
188L %reg1036<def> = COPY %reg1026<kill>; GR64:%reg1036,1026
196L %reg1036<def> = SHR64ri %reg1036, 32, %EFLAGS<imp-def,dead>; GR64:%reg1036
204L %reg1037<def> = COPY %reg1036:sub_32bit<kill>; GR32:%reg1037 GR64:%reg1036
212L %EAX<def> = COPY %reg1035<kill>; GR32:%reg1035

And after:

BB#1: # derived from
164L %reg1035<def> = MOV32rm %reg1025<kill>, 1, %reg0, 0, %reg0; mem:LD4[%h] GR32:%reg1035 GR64:%reg1025
180L %EAX<def> = IMUL32rr %EAX, %reg1035<kill>, %EFLAGS<imp-def,dead>; GR32:%reg1035
196L %RAX<def> = SHR64ri %RAX, 32, %EFLAGS<imp-def,dead>
204L %reg1037<def> = COPY %EAX; GR32:%reg1037

It looks like something got messed up when IMUL32rr was commuted. Possibly because both RAX and EAX are in play.

I'll take a look.

/jakob

This should be fixed by r112751, please verify.

/jakob

Yes, that did it, thanks very much!

-Argiris