Imprecise information by SCEV after loop unrolling and Vectorization.

Hi,

I have a simple test case,

// 1.c

#include <stdio.h>

int b[1000];

main() {

int a[1000];

int sum = 0;

for (int i = 0; i < 1000; i++) {

a[i] = i + 10;

}

for (int i = 0; i < 1000; i += 1) {

sum = sum + a[i];

}

printf(“sum = %ld\n”, sum);

}

Compilation: clang -m64 -fuse-ld=lld -flto -S -emit-llvm -o 1.ll 1.c

SCEV DUMP: opt -S -loop-accesses -mem2reg -analyze -indvars -loop-simplify -scalar-evolution 1.ll

Due to unrolling and Vectorization, Scev determines that some of the access are outside the range [ 0 – 999 ] --------------- [ 0 – 1020 ] as highlighted below in red, not sure if this issue is a known issue in Scev ?

Printing analysis ‘Loop Access Analysis’ for function ‘main’:

vector.body33:

Memory dependences are safe

Dependences:

Run-time memory checks:

Grouped accesses:

Store to invariant address was not found in loop.

SCEV assumptions:

Expressions re-written:

vector.body:

Report: loop control flow is not understood by analyzer

Dependences:

Run-time memory checks:

Grouped accesses:

Store to invariant address was not found in loop.

SCEV assumptions:

Expressions re-written:

Printing analysis ‘Promote Memory to Register’ for function ‘main’:

Pass::print not implemented for pass: ‘Promote Memory to Register’!

Printing analysis ‘Induction Variable Simplification’:

Pass::print not implemented for pass: ‘Induction Variable Simplification’!

Printing analysis ‘Induction Variable Simplification’:

Pass::print not implemented for pass: ‘Induction Variable Simplification’!

Printing analysis ‘Canonicalize natural loops’ for function ‘main’:

Pass::print not implemented for pass: ‘Canonicalize natural loops’!

Printing analysis ‘Scalar Evolution Analysis’ for function ‘main’:

Classifying expressions for: @main

%a = alloca [1000 x i32], align 16

→ %a U: [0,-15) S: [-9223372036854775808,9223372036854775793)

%0 = bitcast [1000 x i32]* %a to i8*

→ %a U: [0,-15) S: [-9223372036854775808,9223372036854775793)

%index = phi i64 [ 0, %entry ], [ %index.next.3, %vector.body.1 ]

→ {0,+,32}<%vector.body> U: [0,993) S: [0,993) Exits: 992 LoopDispositions: { %vector.body: Computable }

%1 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index

→ {%a,+,128}<%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3968 + %a) LoopDispositions: { %vector.body: Computable }

%4 = bitcast i32* %1 to <4 x i32>*

→ {%a,+,128}<%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3968 + %a) LoopDispositions: { %vector.body: Computable }

%5 = getelementptr inbounds i32, i32* %1, i64 4

→ {(16 + %a),+,128}<%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3984 + %a) LoopDispositions: { %vector.body: Computable }

%6 = bitcast i32* %5 to <4 x i32>*

→ {(16 + %a),+,128}<%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3984 + %a) LoopDispositions: { %vector.body: Computable }

%index.next = or i64 %index, 8

→ {8,+,32}<%vector.body> U: [8,1001) S: [8,1001) Exits: 1000 LoopDispositions: { %vector.body: Computable }

%index37 = phi i64 [ %index.next38.4, %vector.body33 ], [ 0, %vector.body33.preheader ]

→ {0,+,40}<%vector.body33> U: [0,961) S: [0,961) Exits: 960 LoopDispositions: { %vector.body33: Computable }

%8 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index37

→ {%a,+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3840 + %a) LoopDispositions: { %vector.body33: Computable }

%9 = bitcast i32* %8 to <4 x i32>*

→ {%a,+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3840 + %a) LoopDispositions: { %vector.body33: Computable }

%10 = getelementptr inbounds i32, i32* %8, i64 4

→ {(16 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3856 + %a) LoopDispositions: { %vector.body33: Computable }

%11 = bitcast i32* %10 to <4 x i32>*

→ {(16 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3856 + %a) LoopDispositions: { %vector.body33: Computable }

%index.next38 = add nuw nsw i64 %index37, 8

→ {8,+,40}<%vector.body33> U: [8,969) S: [8,969) Exits: 968 LoopDispositions: { %vector.body33: Computable }

%14 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next38

→ {(32 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3872 + %a) LoopDispositions: { %vector.body33: Computable }

%15 = bitcast i32* %14 to <4 x i32>*

→ {(32 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3872 + %a) LoopDispositions: { %vector.body33: Computable }

%16 = getelementptr inbounds i32, i32* %14, i64 4

→ {(48 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3888 + %a) LoopDispositions: { %vector.body33: Computable }

%17 = bitcast i32* %16 to <4 x i32>*

→ {(48 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3888 + %a) LoopDispositions: { %vector.body33: Computable }

%index.next38.1 = add nuw nsw i64 %index37, 16

→ {16,+,40}<%vector.body33> U: [16,977) S: [16,977) Exits: 976 LoopDispositions: { %vector.body33: Computable }

%20 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next38.1

→ {(64 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3904 + %a) LoopDispositions: { %vector.body33: Computable }

%21 = bitcast i32* %20 to <4 x i32>*

→ {(64 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3904 + %a) LoopDispositions: { %vector.body33: Computable }

%22 = getelementptr inbounds i32, i32* %20, i64 4

→ {(80 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3920 + %a) LoopDispositions: { %vector.body33: Computable }

%23 = bitcast i32* %22 to <4 x i32>*

→ {(80 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3920 + %a) LoopDispositions: { %vector.body33: Computable }

%index.next38.2 = add nuw nsw i64 %index37, 24

→ {24,+,40}<%vector.body33> U: [24,985) S: [24,985) Exits: 984 LoopDispositions: { %vector.body33: Computable }

%26 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next38.2

→ {(96 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3936 + %a) LoopDispositions: { %vector.body33: Computable }

%27 = bitcast i32* %26 to <4 x i32>*

→ {(96 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3936 + %a) LoopDispositions: { %vector.body33: Computable }

%28 = getelementptr inbounds i32, i32* %26, i64 4

→ {(112 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3952 + %a) LoopDispositions: { %vector.body33: Computable }

%29 = bitcast i32* %28 to <4 x i32>*

→ {(112 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3952 + %a) LoopDispositions: { %vector.body33: Computable }

%index.next38.3 = add nuw nsw i64 %index37, 32

→ {32,+,40}<%vector.body33> U: [32,993) S: [32,993) Exits: 992 LoopDispositions: { %vector.body33: Computable }

%32 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next38.3

→ {(128 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3968 + %a) LoopDispositions: { %vector.body33: Computable }

%33 = bitcast i32* %32 to <4 x i32>*

→ {(128 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3968 + %a) LoopDispositions: { %vector.body33: Computable }

%34 = getelementptr inbounds i32, i32* %32, i64 4

→ {(144 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3984 + %a) LoopDispositions: { %vector.body33: Computable }

%35 = bitcast i32* %34 to <4 x i32>*

→ {(144 + %a),+,160}<%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3984 + %a) LoopDispositions: { %vector.body33: Computable }

%index.next38.4 = add nuw nsw i64 %index37, 40

→ {40,+,40}<%vector.body33> U: [40,1001) S: [40,1001) Exits: 1000 LoopDispositions: { %vector.body33: Computable }

%39 = extractelement <4 x i32> %bin.rdx46, i32 0

→ %39 U: full-set S: full-set

%call = tail call i32 (i8*, …) @printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i32 %39)

→ %call U: full-set S: full-set

%40 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next

→ {(32 + %a),+,128}<%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4000 + %a) LoopDispositions: { %vector.body: Computable }

%43 = bitcast i32* %40 to <4 x i32>*

→ {(32 + %a),+,128}<%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4000 + %a) LoopDispositions: { %vector.body: Computable }

%44 = getelementptr inbounds i32, i32* %40, i64 4

→ {(48 + %a),+,128}<%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4016 + %a) LoopDispositions: { %vector.body: Computable } ß-----------------------------------------OUTSIDE RANGE 4016/4 = 1004

%45 = bitcast i32* %44 to <4 x i32>*

→ {(48 + %a),+,128}<%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4016 + %a) LoopDispositions: { %vector.body: Computable }

%index.next.1 = or i64 %index, 16

→ {16,+,32}<%vector.body> U: [16,1009) S: [16,1009) Exits: 1008 LoopDispositions: { %vector.body: Computable }

%46 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next.1

→ {(64 + %a),+,128}<%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4032 + %a) LoopDispositions: { %vector.body: Computable } ß-----------------------------------------OUTSIDE RANGE 4032/4 = 1008

%49 = bitcast i32* %46 to <4 x i32>*

→ {(64 + %a),+,128}<%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4032 + %a) LoopDispositions: { %vector.body: Computable }

%50 = getelementptr inbounds i32, i32* %46, i64 4

→ {(80 + %a),+,128}<%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4048 + %a) LoopDispositions: { %vector.body: Computable }

%51 = bitcast i32* %50 to <4 x i32>*

→ {(80 + %a),+,128}<%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4048 + %a) LoopDispositions: { %vector.body: Computable }

%index.next.2 = or i64 %index, 24

→ {24,+,32}<%vector.body> U: [24,1017) S: [24,1017) Exits: 1016 LoopDispositions: { %vector.body: Computable }

%52 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next.2

→ {(96 + %a),+,128}<%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4064 + %a) LoopDispositions: { %vector.body: Computable }

%55 = bitcast i32* %52 to <4 x i32>*

→ {(96 + %a),+,128}<%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4064 + %a) LoopDispositions: { %vector.body: Computable }

%56 = getelementptr inbounds i32, i32* %52, i64 4

→ {(112 + %a),+,128}<%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4080 + %a) LoopDispositions: { %vector.body: Computable }

%57 = bitcast i32* %56 to <4 x i32>*

→ {(112 + %a),+,128}<%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4080 + %a) LoopDispositions: { %vector.body: Computable } ß-----------------------------------------OUTSIDE RANGE 4080/4 = 1020

%index.next.3 = add nuw nsw i64 %index, 32

→ {32,+,32}<%vector.body> U: [32,1025) S: [32,1025) Exits: 1024 LoopDispositions: { %vector.body: Computable }

Determining loop execution counts for: @main

Loop %vector.body33: backedge-taken count is 24

Loop %vector.body33: max backedge-taken count is 24

Loop %vector.body33: Predicated backedge-taken count is 24

Predicates:

Loop %vector.body33: Trip multiple is 25

Loop %vector.body: backedge-taken count is 31

Loop %vector.body: max backedge-taken count is 31

Loop %vector.body: Predicated backedge-taken count is 31

Predicates:

Loop %vector.body: Trip multiple is 32

$

Thanks

M Suresh