Hi,
We found that today's performance regressions of 17.30% and 11.37% in LNT SingleSource/Benchmarks/Shootout/sieve, on LNT-AArch64-A53-O3__clang_DEV__aarch64 and LNT-Thumb2v7-A15-O3__clang_DEV__thumbv7 respectively (http://llvm.org/perf/db_default/v4/nts/daily_report/2017/1/20?filter-machine-regex=aarch64|arm|thumb|green), are caused by the InstCombine change in rL292492:
D28406: [InstCombine] icmp sgt (shl nsw X, C1), C0 --> icmp sgt X, C0 >> C1
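For reference, the fold named in the patch title can be sanity-checked by brute force. The Python sketch below is only an illustration and is not taken from the patch: the helper name fold_holds and the small bit widths are assumptions chosen to keep the search cheap. It treats ">>" as an arithmetic shift and skips inputs where the shl would wrap, since nsw makes those poison. The conditions that InstCombine actually checks may be stricter than what is modelled here.

```python
def fold_holds(bits=8):
    """Check icmp sgt (shl nsw X, C1), C0  ==  icmp sgt X, (C0 >> C1)
    for every signed `bits`-wide X and C0 and every shift amount C1.

    Hypothetical helper for illustration; not part of LLVM.
    """
    lo = -(1 << (bits - 1))       # smallest signed value
    hi = (1 << (bits - 1)) - 1    # largest signed value
    for c1 in range(1, bits - 1):
        for x in range(lo, hi + 1):
            shifted = x << c1     # Python ints do not wrap
            if not lo <= shifted <= hi:
                # The shl would overflow, so nsw makes the result poison;
                # the fold is allowed to do anything for this input.
                continue
            for c0 in range(lo, hi + 1):
                before = shifted > c0
                # Python's >> on negative ints rounds toward -inf,
                # which matches an arithmetic shift right.
                after = x > (c0 >> c1)
                if before != after:
                    return False
    return True


if __name__ == "__main__":
    print(fold_holds(8))
```

The check passing only says the two comparisons agree; it says nothing about whether the folded form is cheaper for later passes, which is the issue reported here.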
With this change, the Loop Vectorizer generates code with more instructions:
==== Loop Vectorizer from rL292492 ====
for.body5: ; preds = %for.inc16.for.body5_crit_edge, %for.cond.preheader
%indvar = phi i64 [ %indvar.next, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]
%1 = phi i8 [ %.pre, %for.inc16.for.body5_crit_edge ], [ 1, %for.cond.preheader ]
%count.122 = phi i32 [ %count.2, %for.inc16.for.body5_crit_edge ], [ 0, %for.cond.preheader ]
%i.119 = phi i64 [ %inc17, %for.inc16.for.body5_crit_edge ], [ 2, %for.cond.preheader ]
%2 = add i64 %indvar, 2
%3 = shl i64 %indvar, 1
%4 = add i64 %3, 4
%5 = add i64 %indvar, 2
%6 = shl i64 %indvar, 1
%7 = add i64 %6, 4
%8 = add i64 %indvar, 2
%9 = mul i64 %indvar, 3
%10 = add i64 %9, 6
%11 = icmp sgt i64 %10, 8193
%smax = select i1 %11, i64 %10, i64 8193
%12 = mul i64 %indvar, -2
%13 = add i64 %12, -5
%14 = add i64 %smax, %13
%15 = add i64 %indvar, 2
%16 = udiv i64 %14, %15
%17 = add i64 %16, 1
%tobool7 = icmp eq i8 %1, 0
br i1 %tobool7, label %for.inc16, label %if.then