Adding "S" suffixed ARM/Thumb2 instructions

Hello everyone,

I’ve added the “S”-suffixed versions of ARM and Thumb2 instructions to tablegen, for example “movs” and “muls”.
Of course, some instructions already have their twins, such as add/adds, and I left those untouched.
In addition, I propose a codegen optimization based on them, which removes the redundant comparison in patterns like

orr r1, r2, r3 ----> orrs r1, r2, r3
cmp r1, 0
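
To make the pattern concrete, here is a minimal sketch of the fold on a toy instruction list. It is not the actual implementation: ToyInst, canSetFlags and foldCompareWithZero are hypothetical names, the list of foldable opcodes is an assumption, and the sketch only handles a compare that immediately follows the defining instruction. It also ignores which condition code the following branch reads.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction; this is not an LLVM data structure.
struct ToyInst {
    std::string opcode;  // e.g. "orr", "cmp", "bne"
    std::string dst;     // destination register ("" for cmp and branches)
    std::string lhs;     // first source operand
    std::string rhs;     // second source operand or immediate
    bool setsFlags;      // true models the "s" suffix
};

// Assumption: the opcodes this toy model allows to take an "s" form.
static bool canSetFlags(const std::string &op) {
    return op == "orr" || op == "and" || op == "eor" || op == "mul";
}

// Folds "op rd, ...; cmp rd, #0" into "ops rd, ..." when the two
// instructions are adjacent. Returns the number of compares removed.
std::size_t foldCompareWithZero(std::vector<ToyInst> &code) {
    std::size_t removed = 0;
    for (std::size_t i = 0; i + 1 < code.size(); ++i) {
        ToyInst &def = code[i];
        const ToyInst &next = code[i + 1];
        const bool match = canSetFlags(def.opcode) && !def.setsFlags &&
                           next.opcode == "cmp" && next.lhs == def.dst &&
                           next.rhs == "#0";
        if (match) {
            def.setsFlags = true;  // "orr" becomes "orrs"
            code.erase(code.begin() + static_cast<std::ptrdiff_t>(i + 1));
            ++removed;
        }
    }
    return removed;
}
```

Applied to the three-instruction sequence "orr r1, r2, r3; cmp r1, #0; bne ...", the sketch marks the orr as flag-setting and drops the cmp, leaving two instructions.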

This optimization gives a noticeable speedup, e.g. 3.3% on SQLite on a Cortex-A8, and works fine.
I have some questions though.

1) “neverHasSideEffects” in tablegen means that CPSR is not implicitly defined, doesn’t it?
2) What else can be done using that super “S” power?
3) My current implementation works like a peephole pass (the existing, rather limited peephole cmp optimization was disabled) and runs right before ifcvt. Should I move it somewhere earlier? What do you think is the right place for such a thing?
4) Consider the following C code:

int a, b, c;

a = b * c;
if (a > 0) { … }

One gets the corresponding ARM assembly:

mul r(a), r(b), r(c)
cmp r(a), 1
blt LABEL

// r(x) is the register where x is

The other cases (“if (a == 0)”, “if (a < 0)”) produce the expected

cmp r(a), 0

So what is the idea behind this comparison with 1?
Where should I look for the code behind that logic?
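
One way to read that sequence, offered as an observation about the arithmetic rather than as an answer about where LLVM does this: the branch skips the body when !(a > 0), i.e. when a <= 0, and for integers a <= 0 is the same as a < 1. A “less than or equal to constant” test can therefore be rewritten as a strict “less than constant + 1” test. The function names below are hypothetical and only illustrate the equivalence.

```cpp
#include <cassert>
#include <climits>

// Skip condition as written in the source: the body is skipped when !(a > 0).
bool skipBodySource(int a) { return !(a > 0); }

// Skip condition as it appears in the assembly: cmp r(a), 1 followed by blt.
bool skipBodyAsm(int a) { return a < 1; }

// Checks that both forms agree on a few representative values.
bool formsAgree() {
    const int samples[] = {INT_MIN, -2, -1, 0, 1, 2, INT_MAX};
    for (int a : samples) {
        if (skipBodySource(a) != skipBodyAsm(a)) return false;
    }
    return true;
}
```

Both forms give the same result for every int, so the comparison with 1 does not change the behaviour of the program.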

Thanks,
Vadim Markovtsev,
ISP RAS.

Adding separate "s" instructions is not the right thing to do. We've been trying hard to avoid adding those "twins". The instructions that can optionally set the condition codes have an "optional def" operand. For example, look at the "cc_out" operand in the "sI" class defined in ARMInstrFormats.td. If that operand is set to the CPSR register, then the instruction becomes the "s" variant.
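
As a rough illustration of the optional-def idea, here is a toy model, not LLVM code. ToyMachineInstr, ToyReg, setFlagsDef and printedMnemonic are hypothetical names; the only point is that a single instruction definition covers both forms, and the “s” shows up when the optional flags operand is set to CPSR.

```cpp
#include <cassert>
#include <string>

// Hypothetical register ids for this sketch; not LLVM's enum values.
enum ToyReg { NoReg = 0, CPSR = 1 };

// Toy stand-in for an instruction that carries an optional flags def.
struct ToyMachineInstr {
    std::string mnemonic;  // base mnemonic, e.g. "add"
    ToyReg ccOut;          // NoReg: flags untouched; CPSR: flags are defined
};

// Turns the instruction into its flag-setting form by filling the optional def.
void setFlagsDef(ToyMachineInstr &mi) { mi.ccOut = CPSR; }

// Prints the mnemonic, appending "s" only when the optional def is CPSR.
std::string printedMnemonic(const ToyMachineInstr &mi) {
    return mi.ccOut == CPSR ? mi.mnemonic + "s" : mi.mnemonic;
}
```

In this model an “add” with ccOut left as NoReg prints as “add”, and the same object prints as “adds” once setFlagsDef has been called on it.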

There are some existing peephole optimizations to make use of this, but there are some unresolved issues as well. Do you have some example testcases that show where we're missing opportunities?

I’ve just reviewed the current LLVM trunk.

>> Adding separate “s” instructions is not the right thing to do. We’ve been trying hard to avoid adding those “twins”. The instructions that can optionally set the condition codes have an “optional def” operand. For example, look at the “cc_out” operand in the “sI” class defined in ARMInstrFormats.td. If that operand is set to the CPSR register, then the instruction becomes the “s” variant.

All right, but not everything is as rosy as one might expect. For example, when I set the “mov” instruction to define CPSR, the generated assembly is still “mov”, not “movs”, although “movs” is a perfectly valid instruction that sets CPSR. The same operation on “add” has the desired effect. So, if one should go the way you propose instead of adding separate instructions to tablegen, what has to be modified in the LLVM code to resolve such issues? There are lots of similar instructions unsupported by LLVM that certainly do have a suffixed twin.

>> There are some existing peephole optimizations to make use of this, but there are some unresolved issues as well. Do you have some example testcases that show where we’re missing opportunities?

Oh yeah. Consider the following existing peephole optimization:
PeepholeOptimizer.cpp->PeepholeOptimizer::OptimizeCmpInstr->ARMBaseInstrInfo::OptimizeCompareInstr.

case ARM::ADDri:
case ARM::ANDri:
case ARM::t2ANDri:
case ARM::SUBri:
case ARM::t2ADDri:
case ARM::t2SUBri:

// Toggle the optional operand to CPSR.
MI->getOperand(5).setReg(ARM::CPSR);
MI->getOperand(5).setIsDef(true);
CmpInstr->eraseFromParent();
return true;

…and that’s all. However, this switch could be far larger: by my count 88 instructions could be supported instead of the current 6. Another question that is unclear to me is what the origin of the comment above

// Set the “zero” bit in CPSR.

is. Why not also “negative”?
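
One consideration that may bear on this, stated as a sketch of the architectural flag rules and not as the intent behind that comment: “cmp rX, #0” always clears V and sets C, while a flag-setting logical instruction such as “orrs” updates N and Z but leaves V as it was. Conditions that read only Z (eq/ne) or only N (mi/pl) therefore see the same flags either way, whereas signed conditions such as “le” also read V. The types and function names below are hypothetical, and the logical case assumes an unshifted register operand so that C is also left unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Toy copy of the four condition flags.
struct Flags { bool n, z, c, v; };

// Flags after "cmp rX, #0": N and Z from the value, C = 1 (no borrow), V = 0.
Flags afterCmpZero(std::uint32_t x) {
    return Flags{(x >> 31) != 0, x == 0, true, false};
}

// Flags after a flag-setting logical op (e.g. orrs with an unshifted register
// operand): N and Z from the result, C and V keep their previous values.
Flags afterLogicalS(std::uint32_t result, Flags before) {
    return Flags{(result >> 31) != 0, result == 0, before.c, before.v};
}

bool condEQ(Flags f) { return f.z; }                   // reads Z only
bool condMI(Flags f) { return f.n; }                   // reads N only
bool condLE(Flags f) { return f.z || (f.n != f.v); }   // reads Z, N and V
```

With a stale V left set by some earlier instruction, the eq and mi conditions still agree between the two sequences, but le can differ, so whether the compare can be dropped may depend on which condition the following branch tests.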
Moreover, that particular peephole can be dramatically improved with some more advanced analysis.
For example, consider the following program:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main()
{
    srand(time(NULL));
    int x, y;
    x = rand();
    y = rand();
    int z = x * y;
    if (z == 0)
    {
        printf("Zero");
    }
    z = x | y;
    if (z > 0)
    {
        printf("Greater");
    }
    else
    {
        printf("Smaller");
    }
    return 0;
}

It compiles to

.syntax unified
.cpu cortex-a8
.eabi_attribute 6, 10
.eabi_attribute 7, 65
.eabi_attribute 8, 1
.eabi_attribute 9, 2
.eabi_attribute 10, 2
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.file "test.bc"
.text
.globl main
.align 2
.type main,%function
main: @ @main
@ BB#0: @ %entry
push {r4, r5, r11, lr}
mov r0, #0
bl time
bl srand
bl rand
mov r4, r0
bl rand
mov r5, r0
mul r0, r5, r4
cmp r0, #0
bne .LBB0_2
@ BB#1: @ %bb
movw r0, :lower16:.L.str
movt r0, :upper16:.L.str
bl printf
.LBB0_2: @ %bb1
orr r0, r5, r4
cmp r0, #1
blt .LBB0_5
@ BB#3: @ %bb2
movw r0, :lower16:.L.str1
movt r0, :upper16:.L.str1
.LBB0_4: @ %bb2
bl printf
mov r0, #0
ldmia sp!, {r4, r5, r11, pc}
.LBB0_5: @ %bb3
movw r0, :lower16:.L.str2
movt r0, :upper16:.L.str2
b .LBB0_4
.Ltmp0:
.size main, .Ltmp0-main

.type .L.str,%object @ @.str
.section .rodata,"a",%progbits
.align 2
.L.str:
.asciz "Zero"
.size .L.str, 5

.type .L.str1,%object @ @.str1
.align 2
.L.str1:
.asciz "Greater"
.size .L.str1, 8

.type .L.str2,%object @ @.str2
.align 2
.L.str2:
.asciz "Smaller"
.size .L.str2, 8

At the same time, my optimization produces

.syntax unified
.cpu cortex-a8
.eabi_attribute 6, 10
.eabi_attribute 7, 65
.eabi_attribute 8, 1
.eabi_attribute 9, 2
.eabi_attribute 10, 2
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.file "test.bc"
.text
.globl main
.align 2
.type main,%function
main: @ @main
@ BB#0: @ %entry
push {r4, r5, r11, lr}
mov r0, #0
bl time
bl srand
bl rand
mov r4, r0
bl rand
mov r5, r0
muls r0, r5, r4
bne .LBB0_2
@ BB#1: @ %bb
movw r0, :lower16:.L.str
movt r0, :upper16:.L.str
bl printf
.LBB0_2: @ %bb1
orrs r0, r5, r4
ble .LBB0_5
@ BB#3: @ %bb2
movw r0, :lower16:.L.str1
movt r0, :upper16:.L.str1
.LBB0_4: @ %bb2
bl printf
mov r0, #0
ldmia sp!, {r4, r5, r11, pc}
.LBB0_5: @ %bb3
movw r0, :lower16:.L.str2
movt r0, :upper16:.L.str2
b .LBB0_4
.Ltmp0:
.size main, .Ltmp0-main

.type .L.str,%object @ @.str
.section .rodata,"a",%progbits
.align 2
.L.str:
.asciz "Zero"
.size .L.str, 5

.type .L.str1,%object @ @.str1
.align 2
.L.str1:
.asciz "Greater"
.size .L.str1, 8

.type .L.str2,%object @ @.str2
.align 2
.L.str2:
.asciz "Smaller"
.size .L.str2, 8

Note “muls” in place of “mul” followed by “cmp” (a case the existing peephole does not support) and “orrs” in place of “orr” followed by “cmp” (a case that needs the more advanced analysis).