IR optimization pass ideas for backend porting before ISel

Hi LLVMers,

I’m writing a LLVM backend for CCore, an ISA derived from Motorola MCore. I was wondering if someone wrote some IR level optimization passes for backend porting before ISel, such as an IR transformation from GEP to integer conversion/calculating instructions, and PHI combination. Here’s the bubble sorting example. The IR codes below are changed by hand and I try to write passes to do such modifications automatically.

C codes (bubbleSort.c):

void bubbleSort(int *arr, int n) {

int i, j;

int sz = n - 1;

for (i = 0; i < sz; i++)

for (j = 0; j < sz - i; j++)

if(*(arr+j) > *(arr+j+1)) {

int t = *(arr+j);

*(arr+j) = *(arr+j+1);

*(arr+j+1) = t;

}

}

Part of assemble codes (bubbleSort-gcc-O2.s) from GCC M*Core backend with -O2:

.L3:

cmplti r3,1

jbt .L6

mov r7,r2

movi r6,0

.L5:

ldw r5,(r7)

ldw r4,(r7,4)

.L4:

addi r6,1

addi r7,4

cmpne r3,r6

jbt .L5

The address calculating formula for such code is
N(0) = arr;

N(n) = N(n-1) + ElementSize, n>= 1

which r2 stands for arr, r7 stands for the next address, and ElementSize of int type is 4.

However, LLVM GEP adopts a rule as N(n) = arr + n * ElementSize, and may produce several instructions to compute the address.

Here’s IR codes (bubbleSort-O3.ll) generated by Clang -O3:

for.body4:

%j.018 = phi i32 [ 0, %for.body4.lr.ph ], [ %add.ptr.sum, %for.inc ]

%add.ptr.sum = add i32 %j.018, 1

%add.ptr6 = getelementptr inbounds i32* %arr, i32 %add.ptr.sum

%1 = load i32* %add.ptr6, align 4, !tbaa !0

and thus generated assembly codes (bubbleSort-mcore-O3.s) by llc -march=mcore as following:

mov r13,r6

lsli r13, 2

mov r14,r2

addu r14,r13

ldw r13, (r14, 4)

in which r6 stands for %j.018, r2 stands for arr, and lsli N means left logic shift N bits.

(It may be meaningless, but I don’t know a better way to get good codes as GCC)
If I modified IR(bubbleSort-cast-1.ll) by transforming GEP to integer instructions and calculating the address as below, and I could get codes(bubbleSort-mcore-cast-1.s) as same as GCC shown above.

for.body4.lr.ph:

%base = ptrtoint i32* %arr to i32

for.body4:

%cast = phi i32 [ %base, %for.body4.lr.ph ], [ %cast.next, %for.inc ]

%ptr = inttoptr i32 %cast to i32*

%add.ptr6 = getelementptr inbounds i32* %ptr, i32 1

%1 = load i32* %add.ptr6, align 4, !tbaa !0

for.inc:

%cast.next = add i32 %cast, 4

The idea for such GEP cast pass is

  1. check the fourth operand of GEP () whether a binary instruction such as add or sub (go 2)), or not (return).
  2. create the pointer-to-int instructions in the “preheader” block(%for.body4.lr.ph) and a step-by-step instruction in the next block (%for.inc).
  3. create a PHI instruction and a int-to-pointer instruction in the current block.
  4. replace the old GEP with the new GEP.

Next, I may reduce IR variables like %i.020, %sub2, %inc15 in bubbleSort-cast-2.ll and obtain better codes shown in bubbleSort-mcore-cast-2.s.

The yellow codes are the original IR, and the red are modified ones:

for.cond1.preheader: ; preds = %entry, %for.inc14

%indvars.iv = phi i32 [ %indvars.iv.next, %for.inc14 ], [ %sub, %entry ]

;%i.020 = phi i32 [ %inc15, %for.inc14 ], [ 0, %entry ]

;%sub2 = sub nsw i32 %sub, %i.020

;%cmp317 = icmp sgt i32 %sub2, 0

%cmp317 = icmp sgt i32 %indvars.iv, 0

br i1 %cmp317, label %for.body4.lr.ph, label %for.inc14

for.inc14: ; preds = %for.inc, %for.cond1.preheader

;%inc15 = add nsw i32 %i.020, 1

%indvars.iv.next = add i32 %indvars.iv, -1

;%exitcond21 = icmp eq i32 %inc15, %sub

%exitcond21 = icmp eq i32 %indvars.iv.next, 0

br i1 %exitcond21, label %for.end16, label %for.cond1.preheader

I think, the condition for applying such PHI optimization are

  1. the same entry points for several PHI instructions in one basic block, such as %for.inc14 and %entry.
  2. the same step but may in two different directions, such as %indvars.iv and %i.020 are both changed by 1, although the former increase and the latter decrease.

Any suggestions are welcome!

Regards,
Zuyu Zhang

bubbleSort.c (251 Bytes)

bubbleSort-cast-1.ll (2.4 KB)

bubbleSort-cast-2.ll (2.49 KB)

bubbleSort-gcc-O2.s (425 Bytes)

bubbleSort-mcore-cast-1.s (1.61 KB)

bubbleSort-mcore-cast-2.s (1.53 KB)

bubbleSort-mcore-O3.s (1.63 KB)

bubbleSort-O3.ll (2.3 KB)