[+CC Andy]

Can anyone familiar with ScalarRevolution tell me whether this is an

expected behavior or a bug?

Assuming you're talking about 2*k, this is a bug. ScalarEvolution

should be able to prove that {0,+,4} is <nsw> and <nuw>.

Part of the problem is that in this case ScalarEvolution does not try

to prove that {0,+,4} is <nsw> when the expression is constructed

(since proving that has non-trivial cost) [1]. To get ScalarEvolution

to try to prove that {0,+,4} has no-wrap properties, the client needs

to construct a sign-extend expression of {0,+,4}. SCEV will try to

change a sext of an add-rec to an add-rec of sexts and try to prove

no-wrap in the process [2].

To easily do this from IR, you can just add a dummy sext instruction

(like in the IR fragment below) and run

'opt -analyze -scalar-evolution -scalar-evolution' (just running SCEV

won't dce the unused instruction):

; ModuleID = '<stdin>'

target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

target triple = "x86_64-apple-macosx10.10.0"

@x = common global [1024 x float] zeroinitializer, align 16

@y = common global [1024 x float] zeroinitializer, align 16

; Function Attrs: nounwind ssp uwtable

define void @myloop1() {

bb:

br label %bb2

bb1: ; preds = %bb2

ret void

bb2: ; preds = %bb2, %bb

%k.01 = phi i64 [ 0, %bb ], [ %tmp15, %bb2 ]

%tmp = shl nsw i64 %k.01, 1

%dummy_sext = sext i64 %tmp to i128

%tmp3 = getelementptr inbounds [1024 x float], [1024 x float]* @x,

i64 0, i64 %tmp

%tmp4 = load float, float* %tmp3, align 16, !tbaa !2

%tmp5 = getelementptr inbounds [1024 x float], [1024 x float]* @y,

i64 0, i64 %k.01

%tmp6 = load float, float* %tmp5, align 8, !tbaa !2

%tmp7 = fadd float %tmp4, %tmp6

store float %tmp7, float* %tmp3, align 16, !tbaa !2

%tmp8 = or i64 %k.01, 1

%tmp9 = shl nsw i64 %tmp8, 1

%tmp10 = getelementptr inbounds [1024 x float], [1024 x float]*

@x, i64 0, i64 %tmp9

%tmp11 = load float, float* %tmp10, align 8, !tbaa !2

%tmp12 = getelementptr inbounds [1024 x float], [1024 x float]*

@y, i64 0, i64 %tmp8

%tmp13 = load float, float* %tmp12, align 4, !tbaa !2

%tmp14 = fadd float %tmp11, %tmp13

store float %tmp14, float* %tmp10, align 8, !tbaa !2

%tmp15 = add nsw i64 %k.01, 2

%exitcond.1 = icmp eq i64 %tmp15, 512

br i1 %exitcond.1, label %bb1, label %bb2

}

!0 = !{i32 1, !"PIC Level", i32 2}

!1 = !{!"clang version 3.7.0 (clang-stage2-configure-Rlto_build 239114)"}

!2 = !{!3, !3, i64 0}

!3 = !{!"float", !4, i64 0}

!4 = !{!"omnipotent char", !5, i64 0}

!5 = !{!"Simple C/C++ TBAA"}

However, adding a dummy sext only proves <nuw> for {0,+,4} and not

<nsw>. The problem is that when constructing sext({0,+,4}) SCEV

realizes that since {0,+,4} is always positive, sext({0,+,4}) ==

zext({0,+,4}); and to change a zext of an add-rec to an add-rec of

zexts, SCEV only needs to prove <nuw> and not <nsw>.

I wonder if it makes sense to add a hook to SCEV that gets it to try

as hard as it can to prove <nuw> / <nsw> for specific add recurrences.

[1]: It will try to prove nuw and nsw in cases where it is "easy", but

not in this specific case.

[2]: So a worthwhile project is to have the vectorizer construct sign

extend expressions of add recurrences that it really cares about

proving no-wrap of and then check the flags on the

SCEVAddRecExpr. It may consume too much compile time, so there's

a tricky trade-off here.

-- Sanjoy