Based on an earlier Zulip discussion by @RalfJung, I came up with what seems to be an actual miscompilation:
It should return 3, but at -O3 it returns 0.
Note that this is very sensitive to the order of optimizations; even marking bar and baz as static makes the miscompilation go away on some compiler versions.
We start with foo:
static unsigned foo(int &__restrict a, int *__restrict b, unsigned count) {
    *b = 3;
    unsigned total = 0;
    for (unsigned i = 0; i < count; i++) {
        total += a;
    }
    return total;
}
Because a is dereferenceable (due to being a C++ reference), the loop is optimized into an unconditional load followed by a multiplication:
define noundef i32 @_Z3fooRiPij(i32* noalias nocapture noundef nonnull readonly align 4 dereferenceable(4) %0, i32* noalias nocapture noundef writeonly %1, i32 noundef %2) local_unnamed_addr #0 {
  store i32 3, i32* %1, align 4, !tbaa !10
  %4 = load i32, i32* %0, align 4
  %5 = freeze i32 %4
  %6 = mul i32 %5, %2
  ret i32 %6
}
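As a rough source-level picture of that transformation, the sketch below writes the loop form and the hoisted form side by side. The names foo_loop and foo_hoisted are hypothetical, and __restrict is deliberately left out so that the sketch itself has well-defined behavior; it only illustrates that the hoisted form performs the load even when count is 0, not what the compiler literally emits.

```cpp
#include <cassert>

// Loop form: the load of `a` sits inside the loop body,
// so it never executes when count == 0.
unsigned foo_loop(const int &a, int *b, unsigned count) {
    *b = 3;
    unsigned total = 0;
    for (unsigned i = 0; i < count; i++) {
        total += a;
    }
    return total;
}

// Hoisted form, mirroring the optimized IR: one unconditional load,
// followed by a multiplication (unsigned arithmetic wraps, like the loop).
unsigned foo_hoisted(const int &a, int *b, unsigned count) {
    *b = 3;
    unsigned loaded = static_cast<unsigned>(a);  // runs even when count == 0
    return loaded * count;
}
```

Without noalias in the picture the two forms always agree, which is why the transformation looks harmless in isolation.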
In the test case, foo is called with the two pointers equal, but count equal to 0. Thus there was no noalias-violating load before (because no load in foo is actually executed, and the outer functions don’t have noalias), but there is one now. Under the interpretation that violating noalias produces immediate UB, this optimization is buggy.
Under the interpretation where violating noalias produces poison, though, the optimization is sound. %4 is poison, but the optimizer added a freeze, making %5 an arbitrary defined value, which is then multiplied by 0. So far, so good.
Then foo is inlined into bar:
define noundef i32 @_Z3barPiS_j(i32* noundef %0, i32* noundef %1, i32 noundef %2) local_unnamed_addr #2 {
  call void @llvm.experimental.noalias.scope.decl(metadata !10)
  call void @llvm.experimental.noalias.scope.decl(metadata !13)
  store i32 3, i32* %1, align 4, !tbaa !15, !alias.scope !13, !noalias !10
  %4 = load i32, i32* %0, align 4, !alias.scope !10, !noalias !13
  %5 = freeze i32 %4
  %6 = mul i32 %5, %2
  %7 = load i32, i32* %0, align 4, !tbaa !15
  %8 = add i32 %6, %7
  ret i32 %8
}
And then the load of post from bar is merged with the one from foo, keeping the noalias metadata from the latter:
define noundef i32 @_Z3barPiS_j(i32* nocapture noundef readonly %0, i32* nocapture noundef writeonly %1, i32 noundef %2) local_unnamed_addr #2 {
  call void @llvm.experimental.noalias.scope.decl(metadata !14)
  call void @llvm.experimental.noalias.scope.decl(metadata !17)
  store i32 3, i32* %1, align 4, !tbaa !10, !alias.scope !17, !noalias !14
  %4 = load i32, i32* %0, align 4, !alias.scope !14, !noalias !17
  %5 = freeze i32 %4
  %6 = mul i32 %5, %2
  %7 = add i32 %6, %4
  ret i32 %7
}
Under the interpretation where violating noalias produces poison, this must be the buggy optimization. The poison load %4 is now used directly in the computation of %7, bypassing the freeze (and affecting the result of the computation in a nontrivial way, not that that’s required for UB).
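To make the difference between the two versions of bar concrete under the poison interpretation, here is a toy model in which std::nullopt stands for poison. Everything in it (Value, freeze, mul, add, bar_after_inlining, bar_after_merging, and the pick parameter that stands for whatever value freeze happens to choose) is a hypothetical name for illustration; it is a sketch of the semantics as described above, not of how LLVM implements them.

```cpp
#include <cassert>
#include <optional>

// Toy model: std::nullopt is poison, anything else is a defined i32.
using Value = std::optional<unsigned>;

// freeze: poison becomes an arbitrary but defined value (`pick`);
// a defined value passes through unchanged.
Value freeze(Value v, unsigned pick) {
    if (v) return v;
    return Value(pick);
}

// Arithmetic propagates poison from either operand.
Value mul(Value a, Value b) {
    if (a && b) return Value(*a * *b);
    return std::nullopt;
}

Value add(Value a, Value b) {
    if (a && b) return Value(*a + *b);
    return std::nullopt;
}

// bar right after inlining: the second load (%7) has no noalias
// metadata, so it is modelled as a separate, defined value.
Value bar_after_inlining(Value noalias_load, Value plain_load,
                         unsigned count, unsigned pick) {
    Value frozen = freeze(noalias_load, pick);  // %5
    Value product = mul(frozen, Value(count));  // %6
    return add(product, plain_load);            // %8 = add %6, %7
}

// bar after the loads are merged: %4 feeds the add directly.
Value bar_after_merging(Value noalias_load, unsigned count, unsigned pick) {
    Value frozen = freeze(noalias_load, pick);  // %5
    Value product = mul(frozen, Value(count));  // %6
    return add(product, noalias_load);          // %7 = add %6, %4
}
```

In this model, with a poison noalias load and count equal to 0, bar_after_inlining yields the defined value of the plain load no matter what pick is, while bar_after_merging yields poison.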
The rest is just exploiting the UB. After bar is inlined into baz, the load of pre at the beginning of baz is merged with the one from bar, effectively making post contain the value before the mutation rather than the one after it.
So which interpretation is correct, and how can this be fixed?