LLVM IR after mem2reg optimisation

royreshma · July 8, 2022, 8:37am

Hi,

I would like to know how to obtain LLVM IR in which the temporary holding a loaded variable is reused at every later use of that variable, instead of the variable being loaded into a new register for each use.

Consider the following C program:

#include <stdio.h>
int main()
{
    int a,b, c, d;
    scanf("%d", &a);
    scanf("%d", &b);
    c = (a + b); 
    d = (a + b);
    return 0;
}

I tried the mem2reg pass on it. The LLVM IR after mem2reg is given below:

bb:
  %i1 = alloca i32, align 4
  %i2 = alloca i32, align 4
  %i5 = bitcast i32* %i1 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %i5) #3
  %i6 = bitcast i32* %i2 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %i6) #3
  %i9 = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32* %i1)
  %i10 = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32* %i2)
  %i11 = load i32, i32* %i1, align 4, !tbaa !3
  %i12 = load i32, i32* %i2, align 4, !tbaa !3
  %i13 = add nsw i32 %i11, %i12
  %i14 = load i32, i32* %i1, align 4, !tbaa !3
  %i15 = load i32, i32* %i2, align 4, !tbaa !3
  %i16 = add nsw i32 %i14, %i15
  %i19 = bitcast i32* %i2 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %i19) #3
  %i20 = bitcast i32* %i1 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %i20) #3
  ret i32 0

In the IR, the variables “a” and “b” are each used in two places, but every use gets its own load into a new temporary. How can I generate IR in which %i11 and %i12 are also used by the second “add” instruction?

nlopes · July 8, 2022, 8:48am

First of all, you should use sroa instead of mem2reg, as it supersedes mem2reg.

Then, getting rid of the second pair of loads in this case is simple (maybe even instcombine can get rid of those), but in general you’d be looking at GVN, for example.
You can also run opt -O2/-O3 with -print-after-all to print the IR after every optimization in the pipeline so you can learn what each optimization does.
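
A note on why the loads survive mem2reg and sroa in the first place: as far as I understand, both passes only promote a stack slot to a register when its address is not handed to other code. In the program above, &a and &b are passed to scanf, so the two allocas stay in memory and every read of a or b is a load; removing the duplicate loads is then left to a later pass such as instcombine or GVN. A minimal sketch in C of the two situations follows. The function names sum_promotable and sum_escaping, and the use of sscanf on a string instead of scanf on stdin, are assumptions made for illustration and are not part of the original program.

```c
#include <stdio.h>

/* The locals here are only read and written directly, so their stack
   slots are candidates for promotion to SSA registers. */
int sum_promotable(int x, int y)
{
    int c = x + y;
    int d = x + y;
    return c + d;
}

/* The addresses of a and b are passed to sscanf, so their stack slots
   have to stay in memory and each later read of a or b is a load. */
int sum_escaping(const char *input)
{
    int a = 0, b = 0;
    if (sscanf(input, "%d %d", &a, &b) != 2)
        return 0; /* input did not contain two integers */
    int c = a + b;
    int d = a + b;
    return c + d;
}
```

Comparing the IR of the two functions after sroa should make the difference visible: the first is expected to contain no allocas, while the second keeps the allocas for a and b.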

jrtc27 · July 8, 2022, 6:37pm

InstCombine can indeed eliminate those duplicate loads (though if you don’t use c and d themselves, it’ll eliminate everything other than the scanf calls).

royreshma · July 9, 2022, 8:03am

Thanks @nlopes @jrtc27 for your reply. I tried sroa first and then instcombine.
The LLVM IR after sroa optimisation is given below:

bb:
  %i1 = alloca i32, align 4
  %i2 = alloca i32, align 4
  %i5 = bitcast i32* %i1 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %i5) #3
  %i6 = bitcast i32* %i2 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %i6) #3
  %i9 = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32* %i1)
  %i10 = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32* %i2)
  %i11 = load i32, i32* %i1, align 4, !tbaa !3
  %i12 = load i32, i32* %i2, align 4, !tbaa !3
  %i13 = add nsw i32 %i11, %i12
  %i14 = load i32, i32* %i1, align 4, !tbaa !3
  %i15 = load i32, i32* %i2, align 4, !tbaa !3
  %i16 = add nsw i32 %i14, %i15
  %i19 = bitcast i32* %i2 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %i19) #3
  %i20 = bitcast i32* %i1 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %i20) #3
  ret i32 0
}

I then ran instcombine on the above IR; the output is given below:

bb:
  %i1 = alloca i32, align 4
  %i2 = alloca i32, align 4
  %i5 = bitcast i32* %i1 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %i5) #3
  %i6 = bitcast i32* %i2 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %i6) #3
  %i9 = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32* nonnull %i1) #3
  %i10 = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32* nonnull %i2) #3
  %i19 = bitcast i32* %i2 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %i19) #3
  %i20 = bitcast i32* %i1 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %i20) #3
  ret i32 0
}

From the LLVM IR I can see that some optimisation has been done, but the “add” instructions are no longer present in the resulting IR. Why did that happen? Am I doing anything wrong in the sequence of commands? Please see the commands given below.

PS:
The sequence of commands I tried is given below:

clang -S -emit-llvm -O -Xclang -disable-llvm-passes LinaerProgram3.c -o LinaerProgram3.ll
opt -instnamer LinaerProgram3.ll > LProgram3Namer.bc
llvm-dis LProgram3Namer.bc -o LProgram3Namer.ll
opt -sroa LProgram3Namer.ll > LProgram3Namersroa.bc
llvm-dis LProgram3Namersroa.bc -o LProgram3Namersroa.ll
opt -instcombine LProgram3Namersroa.ll > LProgram3NamerInstComb.bc
llvm-dis LProgram3NamerInstComb.bc -o LProgram3NamerInstComb.ll

nlopes · July 9, 2022, 4:02pm

The result of the add instructions is not used, thus they can be deleted.
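
To keep the additions alive, their results need to be observable. Here is a minimal sketch in C, assuming a variant of the original program in which both sums are stored through caller-supplied pointers. The function name add_twice, the out-parameters c_out and d_out, and the use of sscanf in place of scanf are illustrative assumptions, not part of the original program.

```c
#include <stdio.h>

/* Hypothetical variant of the original main(): the two sums are written
   through out-parameters, so they have a use and are not dead code. */
int add_twice(const char *input, int *c_out, int *d_out)
{
    int a = 0, b = 0;
    if (sscanf(input, "%d %d", &a, &b) != 2)
        return -1; /* input did not contain two integers */
    *c_out = a + b; /* first use of a and b */
    *d_out = a + b; /* second use of a and b */
    return 0;
}
```

Running the same clang and opt commands on a file like this and inspecting the IR after instcombine should show whether the second pair of loads has been merged; the optimiser may also go further and compute a + b only once.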

royreshma · July 10, 2022, 9:14am

Thank you @nlopes
