Hello all,
I wanted to get some feedback on this patch for ScalarEvolution.
It addresses a performance problem I am seeing for simple benchmark.
Starting with this C code:
01: signed char foo(void)
02: {
03: const int count = 8000;
04: signed char result = 0;
05: int j;
06:
07: for (j = 0; j < count; ++j) {
08: result += (result_t)(3);
09: }
10:
11: return result;
12: }
I end up with this IR feeding into Induction Variable Simplification:
01: define signext i8 @foo() nounwind readnone {
02: entry:
03: br label %for.body
04:
05: for.body: ; preds = %entry, %for.body
06: %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
07: %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ]
08: %conv2 = and i32 %result.03, 255
09: %add = add nsw i32 %conv2, 3
10: %inc = add nsw i32 %j.04, 1
11: %cmp = icmp slt i32 %inc, 8000
12: br i1 %cmp, label %for.body, label %for.end
13:
14: for.end: ; preds = %for.body
15: %conv1 = trunc i32 %add to i8
16: ret i8 %conv1
17: }
Unfortunately, the 'and' on line 8 prevents Scalar Evolution from
being able to create an expression for '%add' that it knows how to
evaluate.
The patch detects this pattern in createNodeForPHI and creates an
equivalent expression that can be evaluated.
Note that SCEV translates the 'and' into
ZeroExtend(Truncate(%result.03, i8), i32)
So in terms of SCEV expressions, we essentially have
%add[n] = Add(ZeroExtend(Truncate(%add[n-1], i8), i32), 3)
(BTW, I'm no scholar here, just a layman, so my terminology is
probably all wrong).
The patch basically moves the ZeroExtend and Truncate outside of the
recurrence, though it must take into account that for iteration n, the
ZeroExtend and Truncate apply to the value at iteration n-1.
For a constant step this is accomplished by subtracting one step from
the recurrence, performing the Truncate and ZeroExtend, and then
adding the step to the result. In other words:
%add[n] = Add(
ZeroExtend(
Truncate(
Minus(AddRec(Start=0,Step=3)[n], 3),
i8),
i32),
3)
It's a little more complicated when the Step is another recurrence,
but essentially the same.
Thoughts?
Matthew C.
0001-Teach-ScalarEvolution-to-handle-IV-add-zext-trunc-IV.patch (10.9 KB)