Hi LLVMers,
I’m writing a LLVM backend for CCore, an ISA derived from Motorola MCore. I was wondering if someone wrote some IR level optimization passes for backend porting before ISel, such as an IR transformation from GEP to integer conversion/calculating instructions, and PHI combination. Here’s the bubble sorting example. The IR codes below are changed by hand and I try to write passes to do such modifications automatically.
C codes (bubbleSort.c):
void bubbleSort(int *arr, int n) {
int i, j;
int sz = n - 1;
for (i = 0; i < sz; i++)
for (j = 0; j < sz - i; j++)
if(*(arr+j) > *(arr+j+1)) {
int t = *(arr+j);
*(arr+j) = *(arr+j+1);
*(arr+j+1) = t;
}
}
Part of assemble codes (bubbleSort-gcc-O2.s) from GCC M*Core backend with -O2:
.L3:
cmplti r3,1
jbt .L6
mov r7,r2
movi r6,0
.L5:
ldw r5,(r7)
ldw r4,(r7,4)
…
.L4:
addi r6,1
addi r7,4
cmpne r3,r6
jbt .L5
The address calculating formula for such code is
N(0) = arr;
N(n) = N(n-1) + ElementSize, n>= 1
which r2 stands for arr, r7 stands for the next address, and ElementSize of int type is 4.
However, LLVM GEP adopts a rule as N(n) = arr + n * ElementSize, and may produce several instructions to compute the address.
Here’s IR codes (bubbleSort-O3.ll) generated by Clang -O3:
for.body4:
%j.018 = phi i32 [ 0, %for.body4.lr.ph ], [ %add.ptr.sum, %for.inc ]
%add.ptr.sum = add i32 %j.018, 1
%add.ptr6 = getelementptr inbounds i32* %arr, i32 %add.ptr.sum
%1 = load i32* %add.ptr6, align 4, !tbaa !0
and thus generated assembly codes (bubbleSort-mcore-O3.s) by llc -march=mcore as following:
mov r13,r6
lsli r13, 2
mov r14,r2
addu r14,r13
ldw r13, (r14, 4)
in which r6 stands for %j.018, r2 stands for arr, and lsli N means left logic shift N bits.
(It may be meaningless, but I don’t know a better way to get good codes as GCC)
If I modified IR(bubbleSort-cast-1.ll) by transforming GEP to integer instructions and calculating the address as below, and I could get codes(bubbleSort-mcore-cast-1.s) as same as GCC shown above.
%base = ptrtoint i32* %arr to i32
for.body4:
%cast = phi i32 [ %base, %for.body4.lr.ph ], [ %cast.next, %for.inc ]
%ptr = inttoptr i32 %cast to i32*
%add.ptr6 = getelementptr inbounds i32* %ptr, i32 1
%1 = load i32* %add.ptr6, align 4, !tbaa !0
for.inc:
%cast.next = add i32 %cast, 4
The idea for such GEP cast pass is
- check the fourth operand of GEP () whether a binary instruction such as add or sub (go 2)), or not (return).
- create the pointer-to-int instructions in the “preheader” block(%for.body4.lr.ph) and a step-by-step instruction in the next block (%for.inc).
- create a PHI instruction and a int-to-pointer instruction in the current block.
- replace the old GEP with the new GEP.
Next, I may reduce IR variables like %i.020, %sub2, %inc15 in bubbleSort-cast-2.ll and obtain better codes shown in bubbleSort-mcore-cast-2.s.
The yellow codes are the original IR, and the red are modified ones:
for.cond1.preheader: ; preds = %entry, %for.inc14
%indvars.iv = phi i32 [ %indvars.iv.next, %for.inc14 ], [ %sub, %entry ]
;%i.020 = phi i32 [ %inc15, %for.inc14 ], [ 0, %entry ]
;%sub2 = sub nsw i32 %sub, %i.020
;%cmp317 = icmp sgt i32 %sub2, 0
%cmp317 = icmp sgt i32 %indvars.iv, 0
br i1 %cmp317, label %for.body4.lr.ph, label %for.inc14
for.inc14: ; preds = %for.inc, %for.cond1.preheader
;%inc15 = add nsw i32 %i.020, 1
%indvars.iv.next = add i32 %indvars.iv, -1
;%exitcond21 = icmp eq i32 %inc15, %sub
%exitcond21 = icmp eq i32 %indvars.iv.next, 0
br i1 %exitcond21, label %for.end16, label %for.cond1.preheader
I think, the condition for applying such PHI optimization are
- the same entry points for several PHI instructions in one basic block, such as %for.inc14 and %entry.
- the same step but may in two different directions, such as %indvars.iv and %i.020 are both changed by 1, although the former increase and the latter decrease.
Any suggestions are welcome!
Regards,
Zuyu Zhang
bubbleSort.c (251 Bytes)
bubbleSort-cast-1.ll (2.4 KB)
bubbleSort-cast-2.ll (2.49 KB)
bubbleSort-gcc-O2.s (425 Bytes)
bubbleSort-mcore-cast-1.s (1.61 KB)
bubbleSort-mcore-cast-2.s (1.53 KB)
bubbleSort-mcore-O3.s (1.63 KB)
bubbleSort-O3.ll (2.3 KB)