Hi,
I’ve looked through both the AMDGPU and Sparc backends, and it seems they do not do what I am trying to do either. The only backend that does is AArch64, but it doesn’t have the register constraints.
Here is an example. I have the following C code:
void test(int *z)
{
int a = 1; int b = 2; int c = 3; int d = 4;
a++; b++; c++; d++;
}
Without any frontend optimization, it compiles to the following IR:
define void @test(i32* %z) #0 {
%1 = alloca i32*, align 4
%a = alloca i32, align 4
%b = alloca i32, align 4
%c = alloca i32, align 4
%d = alloca i32, align 4
store i32* %z, i32** %1, align 4
store i32 1, i32* %a, align 4
store i32 2, i32* %b, align 4
store i32 3, i32* %c, align 4
store i32 4, i32* %d, align 4
%2 = load i32, i32* %a, align 4
%3 = add nsw i32 %2, 1
store i32 %3, i32* %a, align 4
%4 = load i32, i32* %b, align 4
%5 = add nsw i32 %4, 1
store i32 %5, i32* %b, align 4
…
}
This produces the following asm code:
mov r2, #1
str r2, [fp, #-2]
mov r3, #2
mov r2, #3
str r3, [fp, #-3]
str r2, [fp, #-4]
mov r3, #4
ldr r2, [fp, #-2]
str r3, [fp, #-5]
…
What I want to do is merge neighboring stores and loads. For example,
mov r3, #2
mov r2, #3
str r3, [fp, #-5]
str r2, [fp, #-4]
can be converted to
mov r3, #2
mov r2, #3
strd r2, [fp, #-4]
But the main problem is that in the code actually generated above, r3 was stored at offset -3, not -5, so the two frame slots are not in the order the merged store needs.
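To make the two conditions concrete, here is a minimal sketch in C of a pairing check. It is not LLVM code: the struct and function names are made up for illustration, and the strd semantics are assumptions read off the example above (strd rN writes rN to the given offset and rN+1 to the slot one below it, and rN must be even-numbered).

```c
#include <stdbool.h>

/* One store of a register to a frame slot: str r<reg>, [fp, #<offset>]. */
struct store {
    int reg;     /* register number, e.g. 2 for r2 */
    int offset;  /* frame offset in slots, e.g. -4 */
};

/*
 * ASSUMPTION (inferred from the strd example, not a documented rule):
 * "strd rN, [fp, #off]" writes rN to off and rN+1 to off - 1, and rN
 * must be even-numbered.
 *
 * Returns true if the two stores can be merged; *first then holds the
 * register and offset to use in the strd.
 */
bool can_pair(struct store a, struct store b, struct store *first)
{
    /* Order the two stores so that lo holds the lower-numbered register. */
    struct store lo = a.reg < b.reg ? a : b;
    struct store hi = a.reg < b.reg ? b : a;

    if (lo.reg % 2 != 0 || hi.reg != lo.reg + 1)
        return false;   /* register constraint not met */
    if (hi.offset != lo.offset - 1)
        return false;   /* frame slots not laid out as strd needs */

    *first = lo;
    return true;
}
```

With r3 at -5 and r2 at -4 the check passes and yields strd r2, [fp, #-4]. With r3 at -3, as in the generated code, the register condition holds but the offset condition fails, which is the frame layout problem.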
Currently, I’m doing the following. In a pre-RA pass, I create a REG_SEQUENCE with the target register class, assign the vregs in question as its subregs, and create a single load/store instruction for the sequence with the memory references merged.
This solves the register constraint problem, but the frame allocation problem remains. I’ll probably need to use fixed stack objects and pre-allocate the frame manually, which I really don’t want to do, as it could break other passes.
Petr