I’ve just looked through the current LLVM trunk.
>> Adding separate “s” instructions is not the right thing to do. We’ve been trying hard to avoid adding those “twins”. The instructions that can optionally set the condition codes have an “optional def” operand. For example, look at the “cc_out” operand in the “sI” class defined in ARMInstrFormats.td. If that operand is set to the CPSR register, then the instruction becomes the “s” variant.
Alright, but not everything is as rosy as one might expect. For example, when I set the “mov” instruction to define CPSR, the generated assembly is still “mov”, not “movs”, even though “movs” is a perfectly valid instruction that sets CPSR. The same change applied to “add” has the desired effect. So, if one should go the way you propose instead of adding separate instructions to TableGen, what has to be modified in the LLVM code to resolve such issues? There are lots of similar instructions that LLVM does not handle this way but that certainly have an “s”-suffixed twin.
>> There are some existing peephole optimizations to make use of this, but there are some unresolved issues as well. Do you have some example testcases that show where we’re missing opportunities?
Oh yeah. Consider the following existing peephole optimization:
PeepholeOptimizer.cpp->PeepholeOptimizer::OptimizeCmpInstr->ARMBaseInstrInfo::OptimizeCompareInstr.
case ARM::ADDri:
case ARM::ANDri:
case ARM::t2ANDri:
case ARM::SUBri:
case ARM::t2ADDri:
case ARM::t2SUBri:
// Toggle the optional operand to CPSR.
MI->getOperand(5).setReg(ARM::CPSR);
MI->getOperand(5).setIsDef(true);
CmpInstr->eraseFromParent();
return true;
…and that’s all. However, this switch should be much larger: so far I count 88 instructions that could be supported instead of 6. Another thing unclear to me is the origin of the comment above it:
// Set the "zero" bit in CPSR.
Why not also “negative”?
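To make the point about the size of the switch concrete, here is a minimal, self-contained sketch of the same folding idea driven by a table rather than a hand-written case list. It is a toy model, not LLVM code: `ToyInstr`, `opcodesWithSForm`, `foldCompareWithZero` and `render` are hypothetical names invented for illustration, and the opcode table is only a sample. It also does not check which condition flags later instructions read, which a real pass would have to do.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction. This is NOT LLVM's MachineInstr.
struct ToyInstr {
    std::string opcode;  // e.g. "mul", "orr", "cmp"
    std::string dst;     // destination register, empty for "cmp"
    std::string lhs;     // first source operand
    std::string rhs;     // second source operand, e.g. "r4" or "#0"
    bool setsFlags;      // models the optional cc_out operand being CPSR

    ToyInstr(const std::string& op, const std::string& d,
             const std::string& l, const std::string& r)
        : opcode(op), dst(d), lhs(l), rhs(r), setsFlags(false) {}
};

// Hypothetical sample table of opcodes that have a flag-setting "s" twin.
// A real implementation would derive this from the target description.
const std::set<std::string>& opcodesWithSForm() {
    static const std::set<std::string> table =
        {"add", "sub", "and", "orr", "eor", "mul"};
    return table;
}

// Folds "op rD, ...; cmp rD, #0" into "ops rD, ..." when the compare
// immediately follows the defining instruction and the opcode is in the
// table. Returns the number of compares removed.
std::size_t foldCompareWithZero(std::vector<ToyInstr>& code) {
    std::size_t removed = 0;
    for (std::size_t i = 0; i + 1 < code.size(); ++i) {
        const bool nextIsCmpZero = code[i + 1].opcode == "cmp" &&
                                   code[i + 1].rhs == "#0" &&
                                   code[i + 1].lhs == code[i].dst;
        if (!nextIsCmpZero || code[i].dst.empty()) continue;
        if (opcodesWithSForm().count(code[i].opcode) == 0) continue;
        code[i].setsFlags = true;          // toggle the "optional def"
        code.erase(code.begin() + i + 1);  // drop the now-redundant cmp
        ++removed;
    }
    return removed;
}

// Prints an instruction the way the listings in this thread show it.
std::string render(const ToyInstr& in) {
    assert(!in.opcode.empty());
    if (in.dst.empty()) return in.opcode + " " + in.lhs + ", " + in.rhs;
    return in.opcode + (in.setsFlags ? "s" : "") + " " + in.dst + ", " +
           in.lhs + ", " + in.rhs;
}
```

In this model, supporting another instruction means adding one entry to the table instead of another `case` label.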
Moreover, this peephole in particular could be improved dramatically with some more advanced analysis.
For example, consider the following program:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main()
{
    srand(time(NULL));
    int x, y;
    x = rand();
    y = rand();
    int z = x * y;
    if (z == 0)
    {
        printf("Zero");
    }
    z = x|y;
    if (z > 0)
    {
        printf("Greater");
    }
    else
    {
        printf("Smaller");
    }
    return 0;
}
It compiles to
.syntax unified
.cpu cortex-a8
.eabi_attribute 6, 10
.eabi_attribute 7, 65
.eabi_attribute 8, 1
.eabi_attribute 9, 2
.eabi_attribute 10, 2
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.file "test.bc"
.text
.globl main
.align 2
.type main,%function
main: @ @main
@ BB#0: @ %entry
push {r4, r5, r11, lr}
mov r0, #0
bl time
bl srand
bl rand
mov r4, r0
bl rand
mov r5, r0
mul r0, r5, r4
cmp r0, #0
bne .LBB0_2
@ BB#1: @ %bb
movw r0, :lower16:.L.str
movt r0, :upper16:.L.str
bl printf
.LBB0_2: @ %bb1
orr r0, r5, r4
cmp r0, #1
blt .LBB0_5
@ BB#3: @ %bb2
movw r0, :lower16:.L.str1
movt r0, :upper16:.L.str1
.LBB0_4: @ %bb2
bl printf
mov r0, #0
ldmia sp!, {r4, r5, r11, pc}
.LBB0_5: @ %bb3
movw r0, :lower16:.L.str2
movt r0, :upper16:.L.str2
b .LBB0_4
.Ltmp0:
.size main, .Ltmp0-main
.type .L.str,%object @ @.str
.section .rodata,"a",%progbits
.align 2
.L.str:
.asciz "Zero"
.size .L.str, 5
.type .L.str1,%object @ @.str1
.align 2
.L.str1:
.asciz "Greater"
.size .L.str1, 8
.type .L.str2,%object @ @.str2
.align 2
.L.str2:
.asciz "Smaller"
.size .L.str2, 8
With my optimization, the same program compiles to
.syntax unified
.cpu cortex-a8
.eabi_attribute 6, 10
.eabi_attribute 7, 65
.eabi_attribute 8, 1
.eabi_attribute 9, 2
.eabi_attribute 10, 2
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.file "test.bc"
.text
.globl main
.align 2
.type main,%function
main: @ @main
@ BB#0: @ %entry
push {r4, r5, r11, lr}
mov r0, #0
bl time
bl srand
bl rand
mov r4, r0
bl rand
mov r5, r0
muls r0, r5, r4
bne .LBB0_2
@ BB#1: @ %bb
movw r0, :lower16:.L.str
movt r0, :upper16:.L.str
bl printf
.LBB0_2: @ %bb1
orrs r0, r5, r4
ble .LBB0_5
@ BB#3: @ %bb2
movw r0, :lower16:.L.str1
movt r0, :upper16:.L.str1
.LBB0_4: @ %bb2
bl printf
mov r0, #0
ldmia sp!, {r4, r5, r11, pc}
.LBB0_5: @ %bb3
movw r0, :lower16:.L.str2
movt r0, :upper16:.L.str2
b .LBB0_4
.Ltmp0:
.size main, .Ltmp0-main
.type .L.str,%object @ @.str
.section .rodata,"a",%progbits
.align 2
.L.str:
.asciz "Zero"
.size .L.str, 5
.type .L.str1,%object @ @.str1
.align 2
.L.str1:
.asciz "Greater"
.size .L.str1, 8
.type .L.str2,%object @ @.str2
.align 2
.L.str2:
.asciz "Smaller"
.size .L.str2, 8
Note “muls” in place of “mul” followed by “cmp” (an opcode the existing peephole does not support) and “orrs” in place of “orr” followed by “cmp r0, #1” (which needs the more advanced analysis).
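The “orrs” case relies on rewriting a comparison against 1 into a comparison against 0 (“blt” after “cmp r0, #1” becomes “ble”). A minimal sketch of that arithmetic step follows, assuming signed comparisons only. `holds` and `rewriteAgainstZero` are hypothetical helpers written for this mail, not LLVM functions, and the sketch covers only the integer equivalence; it says nothing about which CPSR flags the flag-setting instruction actually updates.

```cpp
#include <cassert>
#include <string>

// Evaluates a signed condition mnemonic for "x compared with imm".
bool holds(const std::string& cond, int x, int imm) {
    if (cond == "lt") return x < imm;
    if (cond == "le") return x <= imm;
    if (cond == "gt") return x > imm;
    if (cond == "ge") return x >= imm;
    if (cond == "eq") return x == imm;
    if (cond == "ne") return x != imm;
    assert(false && "unknown condition in this toy model");
    return false;
}

// Given a condition used after "cmp rX, #imm", returns the condition that
// gives the same answer after a compare with zero, or an empty string if
// this toy model knows no such rewrite.
std::string rewriteAgainstZero(const std::string& cond, int imm) {
    if (imm == 0) return cond;                    // already against zero
    if (imm == 1 && cond == "lt") return "le";    // x < 1   <=>  x <= 0
    if (imm == 1 && cond == "ge") return "gt";    // x >= 1  <=>  x > 0
    if (imm == -1 && cond == "gt") return "ge";   // x > -1  <=>  x >= 0
    if (imm == -1 && cond == "le") return "lt";   // x <= -1 <=>  x < 0
    return "";                                    // no rewrite known
}
```

For the listing above, `rewriteAgainstZero("lt", 1)` yields `"le"`, matching the change from “blt” to “ble”.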