removing unnecessary moves with 2-address machine

I am new to LLVM and am trying to write a backend for a simple 2-address machine.
For a very simple program:

define i16 @add2aa(i16 %a, i16 %b) nounwind {
entry:
        %tmp3 = add i16 %b, %a ; <i16> [#uses=1]
        ret i16 %tmp3
}

The arguments a and b are passed in r15 and r14. The result is returned in r15.
The generated code is:

add2aa:
        add.w r15,r14
        mov.w r14,r15
        ret

If the operands of the add.w were swapped (add.w r14,r15), the result would land directly in r15 and the mov.w would not be necessary. I have declared the add instruction to be "commutable". What else do I have to do to get this optimization?

thanks,
bagel

I'm using the 2.2 release. I guess I need to check out the svn trunk to get this enhancement.

bagel

Marking it commutable and describing the move as a copy instruction should be enough. Are you using llvm mainline or 2.2? Evan implemented this in mainline after 2.2 was released.

-Chris
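
For readers following the thread, here is a toy Python model of the effect being discussed. It is only a sketch and not LLVM's actual two-address pass: the function name lower_add and its parameters are made up for illustration, and it assumes the "op src,dst" operand order seen in the listing above, where dst is both an input and the output of add.w.

```python
def lower_add(lhs, rhs, result_reg, commutable=True):
    """Emit 'op src,dst' text for result_reg = lhs + rhs on a toy
    two-address machine (hypothetical helper, not an LLVM API)."""
    # Default choice: tie the destination to the first IR operand.
    src, dst = rhs, lhs
    # If the add is commutable and the other operand already lives in
    # the register that must hold the result, swap the operands so the
    # add writes the result in place.
    if commutable and dst != result_reg and src == result_reg:
        src, dst = dst, src
    code = [f"add.w {src},{dst}"]
    # A copy is needed only if the result ended up in the wrong register.
    if dst != result_reg:
        code.append(f"mov.w {dst},{result_reg}")
    return code


# %tmp3 = add i16 %b, %a with %b in r14, %a in r15, result in r15.
print(lower_add("r14", "r15", "r15", commutable=False))
print(lower_add("r14", "r15", "r15", commutable=True))
```

Without commuting, the model reproduces the add.w/mov.w pair from the question; with commuting, the add targets r15 directly and the copy disappears.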