Help me improve two-address code

I have my new port limping enough to compile a very basic function:

int
foo (int a, int b, int c, int d)
{
  return a + b - c + d;
}

clang-cc -O2 yields:

define i32 @foo(i32 %a, i32 %b, i32 %c, i32 %d) nounwind readnone {
entry:
    %add = add i32 %b, %a ; <i32> [#uses=1]
    %sub = sub i32 %add, %c ; <i32> [#uses=1]
    %add4 = add i32 %sub, %d ; <i32> [#uses=1]
    ret i32 %add4
}

which lowers to this assembler code (note: args arrive in r1..r12, and results are returned in r1..r3.):

foo:
    add r2,r1 ### add r1,r2 is better
    sub r2,r3
    mov r1,r2 ### unnecessary!!
    add r1,r4
    jmp [r30]
    .end foo

The mov insn would be unnecessary if the operand order for the first add were reversed. For this function, GCC does the right thing.

Is there some optimizer knob I'm not turning properly? In more complex cases, GCC does poorly with two-address operand choices and so bloats the code with unnecessary register moves. I have high hopes LLVM can do better, so this result for a simple case is bothersome.

G

I have my new port limping enough to compile a very basic function:

int
foo (int a, int b, int c, int d)
{
return a + b - c + d;
}

clang-cc -O2 yields:

define i32 @foo(i32 %a, i32 %b, i32 %c, i32 %d) nounwind readnone {
entry:
   %add = add i32 %b, %a ; <i32> [#uses=1]
   %sub = sub i32 %add, %c ; <i32> [#uses=1]
   %add4 = add i32 %sub, %d ; <i32> [#uses=1]
   ret i32 %add4
}

which lowers to this assembler code (note: args arrive in r1..r12, and
results are returned in r1..r3.):

foo:
   add r2,r1 ### add r1,r2 is better
   sub r2,r3
   mov r1,r2 ### unnecessary!!
   add r1,r4
   jmp [r30]
   .end foo

The mov insn would be unnecessary if the operand order for the first add
were reversed. For this function, GCC does the right thing.

Is there some optimizer knob I'm not turning properly? In more complex
cases, GCC does poorly with two-address operand choices and so bloats
the code with unnecessary register moves. I have high hopes LLVM can do
better, so this result for a simple case is bothersome.

Are you marking add as commutable? Are you making mov as a copy instruction?

Evan

Evan Cheng wrote:

  

Is there some optimizer knob I'm not turning properly? In more complex
cases, GCC does poorly with two-address operand choices and so bloats
the code with unnecessary register moves. I have high hopes LLVM can do better, so this result for a simple case is bothersome.
    
Are you marking add as commutable? Are you making mov as a copy instruction?
  
How do I mark them? For the commutative property, I observed this definition:

def add : SDNode<"ISD::ADD" , SDTIntBinOp ,
                        [SDNPCommutative, SDNPAssociative]>;

... and assumed it was sufficient, since I saw no other targets making special arrangements.

I see no obvious (to me, anyway 8^) "copy instruction" property. The insn in question is generated by copyRegToReg(), and satisfies the isMoveInstr() predicate.

G

Hello, Greg

... and assumed it was sufficient, since I saw no other targets making
special arrangements.

You need to mark you instruction as commutative. Almost all targets do
this. Something like this:

let isCommutable = 1 in { // X = ADD Y, Z == X = ADD Z, Y
def ADD16rr : Pseudo<(outs GR16:$dst), (ins GR16:$src1, GR16:$src2),
                     "add.w\t{$src2, $dst|$dst, $src2}",
                     [(set GR16:$dst, (add GR16:$src1, GR16:$src2)),
                      (implicit SR)]>;
}

Evan Cheng wrote:

Is there some optimizer knob I'm not turning properly? In more complex
cases, GCC does poorly with two-address operand choices and so bloats
the code with unnecessary register moves. I have high hopes LLVM
can do better, so this result for a simple case is bothersome.

Are you marking add as commutable? Are you making mov as a copy
instruction?

How do I mark them? For the commutative property, I observed this
definition:

def add : SDNode<"ISD::ADD" , SDTIntBinOp ,
                       [SDNPCommutative, SDNPAssociative]>;

Yes, the "target-independent" ISD::ADD node is commutative. But it doesn't mean the target specific instruction it is selected to has to be commutative.

... and assumed it was sufficient, since I saw no other targets making
special arrangements.

In X86InstrInfo.td, ADD32rr (and lots others) are marked isCommutable.

Evan

Evan Cheng wrote:

In X86InstrInfo.td, ADD32rr (and lots others) are marked isCommutable.

Don't know how I missed that... OK. It's working now. Thanks!