Look at this silly function:
$ cat small.c
unsigned f(unsigned a, unsigned *p) {
unsigned x = a/4;
p[0] = x;
p[1] = x+x;
return p[1] - 2*p[0];
}
GCC turns this into straightforward code and figures out the 0 return value:
shrl $2, %edi
movl %edi, (%rsi)
addl %edi, %edi
movl %edi, 4(%rsi)
movl $0, %eax
ret
LLVM optimizes the code:
$ clang -O -S -o- small.c -emit-llvm
define i32 @f(i32 %a, i32* nocapture %p) nounwind uwtable ssp {
entry:
%div = lshr i32 %a, 2
store i32 %div, i32* %p, align 4, !tbaa !0
%0 = lshr i32 %a, 1
%add = and i32 %0, 2147483646
%arrayidx1 = getelementptr inbounds i32* %p, i64 1
store i32 %add, i32* %arrayidx1, align 4, !tbaa !0
%1 = lshr i32 %a, 1
%mul = and i32 %1, 2147483646
%sub = sub i32 %add, %mul
ret i32 %sub
}
InstCombine has obfuscated 'x+x' so badly that it can't even recognize it itself. DAGCombine will eventually untangle the mess and figure out that the return value is 0, but that is too late. The loop optimization passes and scalar evolution don't get to see the simple 'x' and '2*x' expressions.
The problem is these transformations in InstCombineShifts.cpp:
// If we have ((X >>? C) << C), turn this into X & (-1 << C).
// (X >>? C1) << C2 --> X << (C2-C1) & (-1 << C2)
// (X >>? C1) << C2 --> X >>? (C1-C2) & (-1 << C2)
The shl instruction is just as much arithmetic as it is a bitwise logical instruction. It is used as the canonical form for x+x, 4*x etc. The transforms above turn an arithmetic expression into a purely logical expression, and it is very hard to recover the original arithmetic expression.
Disabling the transformations produces the straightforward:
define i32 @f(i32 %a, i32* nocapture %p) nounwind uwtable ssp {
entry:
%div = lshr i32 %a, 2
store i32 %div, i32* %p, align 4, !tbaa !0
%add = shl nuw nsw i32 %div, 1
%arrayidx1 = getelementptr inbounds i32* %p, i64 1
store i32 %add, i32* %arrayidx1, align 4, !tbaa !0
ret i32 0
}
The problem is not so much figuring out the 0 return value. The problem is that the 'canonical form' produced by InstCombine is hiding trivial arithmetic expressions like 'x+x' from scalar evolution.
I am not sure how best to fix this. If possible, InstCombine's canonicalization shouldn't hide arithmetic progressions behind bit masks. At least, it seems these transformations should be disabled unless (X >> C).hasOneUse(). They aren't exactly optimizations.
This:
%div = lshr i32 %a, 2
store i32 %div, i32* %p, align 4, !tbaa !0
%add = shl nuw nsw i32 %div, 1
is better than this:
%div = lshr i32 %a, 2
store i32 %div, i32* %p, align 4, !tbaa !0
%0 = lshr i32 %a, 1
%add = and i32 %0, 2147483646
/jakob