Question regarding LICM

Hello,

I am working on a C++ expression-template-based DSL where we are using
LLVM for the code generation. I needed some help in understanding the
behaviour of the LICM pass. In the following example code the "A" class
is a custom container that defines various arithmetic operators using
expression templates. We are defining three arrays of the "A" container
and aggregating the result of the multiplication into "lat".

I was attempting to get the expressions "a[i]" and "b[j]" to be hoisted
above the "j-loop" and the "k-loop" respectively.

//=== C++ code snippet ===//

A<int> a[4] = {A<int>(&ctx),A<int>(&ctx),A<int>(&ctx),A<int>(&ctx)};
A<int> b[4] = {A<int>(&ctx),A<int>(&ctx),A<int>(&ctx),A<int>(&ctx)};
A<int> c[4] = {A<int>(&ctx),A<int>(&ctx),A<int>(&ctx),A<int>(&ctx)};
A<int> lat(&ctx);

for (std::size_t i = 0; i < 4; ++i)
  for (std::size_t j = 0; j < 4; ++j)
    for (std::size_t k = 0; k < 4; ++k) {
      lat = a[i] * b[j] * c[k];
    }
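
For reference, the shape I am hoping the optimizer will reach can be
written by hand. This is only a sketch under assumptions: it uses plain
int arrays instead of the A<int> expression-template objects, the name
hoisted_sum is made up for illustration, and it accumulates with "+="
(the snippet above uses plain assignment) purely so that the result is
easy to check.

```cpp
#include <cstddef>

// Hand-hoisted version of the triple loop on plain ints (an assumption;
// the real operands are A<int> expression-template objects).
int hoisted_sum(const int (&a)[4], const int (&b)[4], const int (&c)[4]) {
    int lat = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        const int ai = a[i];          // invariant in j and k: read once per i
        for (std::size_t j = 0; j < 4; ++j) {
            const int bj = b[j];      // invariant in k: read once per j
            for (std::size_t k = 0; k < 4; ++k) {
                lat += ai * bj * c[k];  // only c[k] varies in the innermost loop
            }
        }
    }
    return lat;
}
```

Here a[i] is read in what would be the preheader of the j-loop and b[j]
in the preheader of the k-loop, which is where I would like the
corresponding access_fn calls and loads to end up.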

The IR generated for the body of the innermost loop, after inlining
most of the expression template calls and running loop simplification,
is shown below.

If I run LICM on this IR, the GEPs on lines 1 and 2 are hoisted into
the preheaders of the "j-loop" and the "k-loop" respectively. I believe
this is because the operands to the GEPs are loop invariant and
*isSafeToExecuteUnconditionally* trivially returns true for a GEP.

However, the CallInsts on lines 4 and 6 remain inside the innermost
loop because *hasLoopInvariantOperands* returns false for them: their
GEP operands are themselves not loop invariant.

This is the behaviour I am not sure about, and I would greatly
appreciate some help in understanding it. Also, how should the code be
structured for LICM to hoist the CallInsts out?

//=== Generated IR for innermost loop body ===//

1: %22 = getelementptr inbounds [4 x %"struct.mdarray_terminal"], [4 x %"struct.mdarray_terminal"]* %a, i64 0, i64 %i.0
2: %23 = getelementptr inbounds [4 x %"struct.mdarray_terminal"], [4 x %"struct.mdarray_terminal"]* %b, i64 0, i64 %j.0
3: %24 = getelementptr inbounds [4 x %"struct.mdarray_terminal"], [4 x %"struct.mdarray_terminal"]* %c, i64 0, i64 %k.0
4: %25 = call i32* @access_fn(%"struct.mdarray_terminal"* %22, i64 0, i64 0)
5: %26 = load i32, i32* %25, !alias.scope !1, !noalias !3
6: %27 = call i32* @access_fn(%"struct.mdarray_terminal"* %23, i64 0, i64 0)
7: %28 = load i32, i32* %27, !alias.scope !5, !noalias !7
8: %mkernel = call i32 @mult_op(i32 %26, i32 %28)
9: %29 = call i32* @access_fn(%"struct.mdarray_terminal"* %24, i64 0, i64 0)
10: %30 = load i32, i32* %29, !alias.scope !6, !noalias !8
11: %mkernel2 = call i32 @mult_op(i32 %mkernel, i32 %30)
12: %31 = call i32* @access_fn(%"struct.mdarray_terminal"* %lat, i64 0, i64 0)
13: store i32 %mkernel2, i32* %31, !alias.scope !4, !noalias !9
14: %32 = add i64 %k.0, 1
15: br label %19

Best,
Dipto

Hi,

How is access_fn declared in the IR?
Some attributes are needed for LICM to be able to hoist calls (see llvm/test/Transforms/LICM/argmemonly-call.ll).
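
As a source-level illustration of what such attributes express, here is
a minimal sketch. Everything in it is an assumption rather than your
actual code: Terminal and access_sketch are hypothetical stand-ins for
struct.mdarray_terminal and access_fn, and the loop accumulates with
"+=" only so the result can be checked. With GCC or Clang,
__attribute__((const)) promises that a function neither reads nor
writes memory (Clang emits this as readnone, spelled memory(none) in
recent LLVM), and __attribute__((pure)) promises that it only reads
memory (readonly). Only a function that really keeps the promise may
carry the attribute.

```cpp
#include <cstddef>

// Hypothetical stand-in for %"struct.mdarray_terminal"; the real layout
// is not shown in this thread.
struct Terminal {
    int value;
};

// Hypothetical stand-in for access_fn. It only computes an address from
// its arguments and never dereferences the pointer, so the `const`
// promise is truthful here. `noinline` keeps the call visible to the
// optimizer instead of letting the body be inlined away.
__attribute__((const, noinline))
int* access_sketch(Terminal* t, std::size_t, std::size_t) {
    return &t->value;
}

// Same loop nest as in the question, written against the stand-ins.
int sum_products(Terminal (&a)[4], Terminal (&b)[4], Terminal (&c)[4]) {
    int lat = 0;
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            for (std::size_t k = 0; k < 4; ++k)
                lat += *access_sketch(&a[i], 0, 0) *
                       *access_sketch(&b[j], 0, 0) *
                       *access_sketch(&c[k], 0, 0);
    return lat;
}
```

If access_fn is emitted directly by your code generator rather than
compiled from C++, the equivalent attributes would have to be set on its
IR declaration instead. Whether LICM then hoists the calls still depends
on its other legality checks, so treat this as a starting point to
experiment with rather than a guaranteed fix.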