Missed optimization opportunity in 3-way integer comparison case

While looking at what llvm writes for this testcase, I noticed that there is one redundant operation in resulting assembly. The second 'cmp' operation there is essentially identical to the first one, with reversed order of arguments. Therefore, it is not needed.

This testcase is a simple integer comparison routine, similar to what qsort would take to sort an integer array.

I think llvm should be taking advantage of the preceding instruction and not placing the similar instruction again.

rev.208525, optimization level 3.

Yuri

--- C-style original code ---
int mycmp (int i1, int i2) {
     if (i1<i2) {
       return -1;
     } else if (i1>i2) {
       return 1;
     }
     return 0;
}

--- llvm code ---
define i32 @mycmp(i32, i32) #0 {
lbl0:
   %icmp.ULT = icmp ult i32 %0, %1
   br i1 %icmp.ULT, label %lbl1, label %lbl2

lbl1:
   %merge = phi i32 [ -1, %lbl0 ], [ %., %lbl2 ]
   ret i32 %merge

lbl2:
   %icmp.UGT = icmp ugt i32 %0, %1
   %. = zext i1 %icmp.UGT to i32
   br label %lbl1
}

--- intel assembly ---
0000000000000010 <mycmp>:
   10: 55 push %rbp
   11: 48 89 e5 mov %rsp,%rbp
   14: b8 ff ff ff ff mov $0xffffffff,%eax
   19: 39 f7 cmp %esi,%edi
   1b: 72 07 jb 24 <mycmp+0x14>
   1d: 39 fe cmp %edi,%esi <==== !!! REDUNDANT COMPARISON !!!
   1f: 19 c0 sbb %eax,%eax
   21: 83 e0 01 and $0x1,%eax
   24: 5d pop %rbp
   25: c3 retq

My copy only uses one cmp if you use int as in your C sample code. It produces like your llvm and x86 code if you use unsigned. I don't think it has any choice.

Bruce,

Yes, my original example had an unsigned comparison, and clang produces to 'cmp' instructions only in unsigned case. However, in both signed and unsigned cases it is possible to compare with only one 'cmp' instruction.

Here is the procedure that does such comparison with one 'cmp' in the case of unsigned arguments:
         .text
         .file "cmp.c"
         .globl mycmp
         .align 16, 0x90
         .type mycmp,@function
mycmp:
         pushq %rbp
         movq %rsp, %rbp
         movl $-1, %eax
         cmpl %edi, %esi
         ja .LBB0_2
         sbbl %eax, %eax
         andl $1, %eax
.LBB0_2:
         popq %rbp
         retq
.Ltmp3:
         .size mycmp, .Ltmp3-mycmp

So clang does miss this optimization opportunity, but only in unsigned case.

Yuri

I created this bug report so that this issue won't get lost: http://llvm.org/bugs/show_bug.cgi?id=19758

Yuri