LLVM IR after mem2reg optimisation

Hi,

I would like to know how to obtain LLVM IR in which the temporary that a variable's value is loaded into is reused at every later use of that variable, instead of the value being loaded into a new register for each use.

Consider the following C program:

#include <stdio.h>
int main()
{
    int a,b, c, d;
    scanf("%d", &a);
    scanf("%d", &b);
    c = (a + b); 
    d = (a + b);
    return 0;
}

I tried the mem2reg pass for this. The LLVM IR after running mem2reg is given below:

bb:
  %i1 = alloca i32, align 4
  %i2 = alloca i32, align 4
  %i5 = bitcast i32* %i1 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %i5) #3
  %i6 = bitcast i32* %i2 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %i6) #3
  %i9 = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32* %i1)
  %i10 = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32* %i2)
  %i11 = load i32, i32* %i1, align 4, !tbaa !3
  %i12 = load i32, i32* %i2, align 4, !tbaa !3
  %i13 = add nsw i32 %i11, %i12
  %i14 = load i32, i32* %i1, align 4, !tbaa !3
  %i15 = load i32, i32* %i2, align 4, !tbaa !3
  %i16 = add nsw i32 %i14, %i15
  %i19 = bitcast i32* %i2 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %i19) #3
  %i20 = bitcast i32* %i1 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %i20) #3
  ret i32 0

In the IR, the variables “a” and “b” are each used in two places, but every use gets its own load into a new temporary. How can I generate IR in which %i11 and %i12 are also used in the second “add” instruction?

First of all, you should use sroa instead of mem2reg, as it supersedes mem2reg.

Then, getting rid of the second pair of loads in this case is simple (maybe even instcombine can get rid of those), but in general you’d be looking at GVN, for example.
You can also run opt -O2/-O3 with -print-after-all to print the IR after every optimization in the pipeline so you can learn what each optimization does.
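
As a rough source-level picture of what eliminating the second pair of loads means, here is a sketch in C rather than IR. The function names are made up for illustration, and this is only an analogy, not the output of any pass:

```c
/* Reads *pa and *pb again for the second sum, like the %i14/%i15 loads. */
int sum_reloaded(const int *pa, const int *pb)
{
    int c = *pa + *pb;
    int d = *pa + *pb;   /* second pair of loads */
    return c + d;
}

/* Loads each value once and reuses the temporaries for both sums. */
int sum_reused(const int *pa, const int *pb)
{
    int ta = *pa;        /* plays the role of %i11 */
    int tb = *pb;        /* plays the role of %i12 */
    int c = ta + tb;
    int d = ta + tb;     /* no new loads */
    return c + d;
}
```

The rewrite is only valid because nothing can write to *pa or *pb between the two reads; that is the property a pass has to prove before it removes a load.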

InstCombine can indeed eliminate those duplicate loads (though if you don’t use c and d themselves it’ll eliminate everything other than the scanf’s).

Thanks @nlopes @jrtc27 for your reply. I tried sroa first and then instcombine.
The LLVM IR after sroa optimisation is given below:

bb:
  %i1 = alloca i32, align 4
  %i2 = alloca i32, align 4
  %i5 = bitcast i32* %i1 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %i5) #3
  %i6 = bitcast i32* %i2 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %i6) #3
  %i9 = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32* %i1)
  %i10 = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32* %i2)
  %i11 = load i32, i32* %i1, align 4, !tbaa !3
  %i12 = load i32, i32* %i2, align 4, !tbaa !3
  %i13 = add nsw i32 %i11, %i12
  %i14 = load i32, i32* %i1, align 4, !tbaa !3
  %i15 = load i32, i32* %i2, align 4, !tbaa !3
  %i16 = add nsw i32 %i14, %i15
  %i19 = bitcast i32* %i2 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %i19) #3
  %i20 = bitcast i32* %i1 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %i20) #3
  ret i32 0
}

Then I ran instcombine on the above IR; the output is given below:

bb:
  %i1 = alloca i32, align 4
  %i2 = alloca i32, align 4
  %i5 = bitcast i32* %i1 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %i5) #3
  %i6 = bitcast i32* %i2 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %i6) #3
  %i9 = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32* nonnull %i1) #3
  %i10 = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32* nonnull %i2) #3
  %i19 = bitcast i32* %i2 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %i19) #3
  %i20 = bitcast i32* %i1 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %i20) #3
  ret i32 0
}

From the LLVM IR I can see that some optimisation was done, but the “add” instructions are no longer present in the result. Why did that happen? Am I doing anything wrong in the sequence of commands? Please see the commands below.

PS:
The sequence of commands I ran is given below:

clang -S -emit-llvm -O -Xclang -disable-llvm-passes LinaerProgram3.c -o LinaerProgram3.ll
opt -instnamer LinaerProgram3.ll > LProgram3Namer.bc
llvm-dis LProgram3Namer.bc -o LProgram3Namer.ll
opt -sroa LProgram3Namer.ll > LProgram3Namersroa.bc
llvm-dis LProgram3Namersroa.bc -o LProgram3Namersroa.ll
opt -instcombine LProgram3Namersroa.ll > LProgram3NamerInstComb.bc
llvm-dis LProgram3NamerInstComb.bc -o LProgram3NamerInstComb.ll

The results of the add instructions are not used, so the adds can be deleted.
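
For instance, if the two sums feed into something observable, the adds are no longer dead. A minimal sketch, assuming a hypothetical helper add_twice that parses a string with sscanf instead of reading stdin with scanf (so it is easy to call with fixed input). I have not run it through opt, so treat the expected effect on the IR as an assumption:

```c
#include <stdio.h>

/* Hypothetical variant of the original main(): a and b still have their
   address taken (by sscanf), but c and d now feed the return value. */
int add_twice(const char *input)
{
    int a = 0, b = 0, c, d;
    if (sscanf(input, "%d %d", &a, &b) != 2)
        return -1;       /* malformed input */
    c = (a + b);
    d = (a + b);
    return c * d;        /* both sums are used here */
}
```

Calling add_twice("2 3") returns 25. Since the result depends on both sums, a pass may merge the two identical adds into one, but it cannot drop the computation altogether.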

Thank you @nlopes