[AA] Question on the full restrict patch

Hi Jeroen,

In working on a scheme to represent Fortran alias information precisely,
we encounter a situation in which two alias sets partially overlap. We would
like to seek any idea of representing such alias sets, particular, in the scheme of
the full restrict alias approach in D68484.

Consider the following Fortran code,

module m
implicit none
!set A: variables with the TARGET attribute
integer, allocatable, target :: tgtA01, tgtA02, tgtA03
integer, pointer :: p_1_SetA, p_2_SetA, p_3_SetA

!set B: variables without the TARGET attribute
integer :: tgtB01, tgtB02, tgtB03

contains
subroutine init_ptr()
p_1_SetA = 0 ; p_2_SetA = 0 ; p_3_SetA = 0
end subroutine init_ptr

subroutine compute(n)
integer :: idx, n

call init_var(tgtB01)
call init_var(tgtB02)
call init_var(tgtB03)

associate(assoc_tgtA01 => tgtA01, assoc_tgtA02 => tgtA02, assoc_tgtB02 => tgtB02)
do idx = 1, n
assoc_tgtA01 = assoc_tgtA01 * tgtB03 + assoc_tgtA02 * tgtA03 + tgtB01 * assoc_tgtB02
end do
end associate
end subroutine compute
end module

The alias analysis in frontend can generate the following alias sets for
“tgtA01” and “tgtA02”.

alias-sets-from-fe.txt (5.68 KB)

after-opt-without-alias-from-fe.ll (5.6 KB)

before-opt-without-alias-from-fe.ll (13.8 KB)

after-opt-with-alias-from-fe.ll (11.6 KB)

before-opt-with-alias-from-fe.ll (25.4 KB)

Hi Kelvin, Tarique,

I finally found some time to look into your example in more depth.

As Johannes mentioned in one of the LLVM Alias Analysis Technical calls, mapping a ‘generic table’ that

tracks and provides alias information for any two pointers would not be an efficient way of keeping track of the aliasing

in llvm-ir.

Reconstructing the (minimal) necessary scopes and intrinsics for a such a ‘generic table’ is also not an easy problem

(and not fast to solve as well).

I have been wondering on how to tackle the example that you provided. Reconstructing the optimal or a sensible ‘noalias’ set

by hand did not seem easy. So I went for a different approach: I created a c++ program that should represent as close as possible

what you are describing in fortran (and the resulting llvm-ir code). There is space for improvement here, Fortran seems to be able

to provide extra alias information. But it should be a good start to show how a (hierarchical based) mapping can be done.

------- test03.cpp

// COMPILE: …/bin/clang++ -c test03.cpp -S -O3 -ffull-restrict -emit-llvm -mllvm -unroll-max-count=0 -mllvm --interleave-loops=false

// Roughly equivalent c++ structure:

#define restrict __restrict

namespace m {

struct N {

int* restrict tgtA01;

int* restrict tgtA02;

int* restrict tgtA03;

int* p_1_SetA;

int* p_2_SetA;

int* p_3_SetA;

// invisible to others ! But not easy to represent in c++, so there is room for improvement here

int tgtB01;

int tgtB02;

int tgtB03;

} N;

extern void init_var(int&);

void init_ptr() {

N.p_1_SetA = nullptr;

N.p_2_SetA = nullptr;

N.p_3_SetA = nullptr;

}

void compute(int & __restrict n) { // the ‘n’ parameters seems to be passed as a ‘i32* noalias’

int idx;

init_var(N.tgtB01);

init_var(N.tgtB02);

init_var(N.tgtB03);

{

auto &assoc_tgtA01 = *N.tgtA01;

auto &assoc_tgtA02 = *N.tgtA02;

auto &assoc_tgtB02 = N.tgtB02;

for (idx = 1; idx <= n; ++idx) {

assoc_tgtA01 = assoc_tgtA01 * N.tgtB03 + assoc_tgtA02 * (*N.tgtA03) + N.tgtB01 * assoc_tgtB02;

}

}

}

}

Hi Jeroen,

Thanks for the analysis.

We think your C++ code can represent the Fortran code for the most part. One crucial difference is that there should not be a restrict qualifier for tgtA0[123]. In Fortran, a variable with the target attribute can be pointed to by any pointers with the same type. Hence, if we comment out the restrict qualifier for tgtA0[123], the load/store instructions stay in the loop body.

We also slightly modify the original code as well as the corresponding C++ code to further illustrate the issue. We add two statements, one before the loop and one after. The statements are marked as “NEW: stmt-[12]” in the attached t1.cpp

We observe that, with the restrict qualifier on tgtA0[123], there is no reload of N.p_1_setA after the loop (stmt-2). (In Fortran, it is possible that p_1_setA is modified in the loop.)

//-----------------------------------------
%2 = load i32*, i32** getelementptr inbounds (%“struct.m::N”, %“struct.m::N”* @_ZN1m1NE, i64 0, i32 3), align 8, !tbaa !2, !noalias !14
%3 = load i32, i32* %2, align 4, !tbaa !8, !noalias !14
%add = add nsw i32 %3, 20

for.end: ; preds = %for.cond.for.end_crit_edge, %entry
%add5 = add nsw i32 %3, 50
store i32 %add5, i32* %2, align 4, !tbaa !8, !noalias !14
ret void
//-----------------------------------------

If the example is compiled with -DNO_RESTRICT_QUALIFIER, we see that N.p_1_SetA is reloaded after the loop.

//-----------------------------------------
%2 = load i32*, i32** getelementptr inbounds (%“struct.m::N”, %“struct.m::N”* @_ZN1m1NE, i64 0, i32 3), align 8, !tbaa !2, !noalias !11
%3 = load i32, i32* %2, align 4, !tbaa !8, !noalias !11
%add = add nsw i32 %3, 20
store i32 %add, i32* %2, align 4, !tbaa !8, !noalias !11

for.end.loopexit: ; preds = %for.body
%.pre14 = load i32, i32* %2, align 4, !tbaa !8, !noalias !11
br label %for.end

for.end: ; preds = %for.end.loopexit, %entry
%12 = phi i32 [ %.pre14, %for.end.loopexit ], [ %add, %entry ]
%add5 = add nsw i32 %12, 30
store i32 %add5, i32* %2, align 4, !tbaa !8, !noalias !11
//-----------------------------------------

We also attach the Fortran code (t1.f and t1-f.ll) for your reference.

Thanks,
Kelvin

t1.cpp (1.17 KB)

t1.f (829 Bytes)

t1-f.ll (21.3 KB)