Interaction of noalias and dereferenceable

Based on an earlier Zulip discussion by @RalfJung, I came up with what seems to be an actual miscompilation:

It should return 3, but at -O3 it returns 0.

Note that this is very sensitive to the order of optimizations; even marking bar and baz as static makes the miscompilation go away on some compiler versions.

We start with foo:

static unsigned foo(int &__restrict a, int *__restrict b, unsigned count) {
    *b = 3;
    unsigned total = 0;
    for (unsigned i = 0; i < count; i++) {
        total += a;
    }
    return total;
}

Because a is dereferenceable (due to being a C++ reference), the loop is optimized into an unconditional load followed by a multiplication:

define noundef i32 @_Z3fooRiPij(i32* noalias nocapture noundef nonnull readonly align 4 dereferenceable(4) %0, i32* noalias nocapture noundef writeonly %1, i32 noundef %2) local_unnamed_addr #0 {
  store i32 3, i32* %1, align 4, !tbaa !10
  %4 = load i32, i32* %0, align 4
  %5 = freeze i32 %4
  %6 = mul i32 %5, %2
  ret i32 %6
}
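Here is a rough source-level sketch of that transformation. The names `foo_loop` and `foo_hoisted` are hypothetical, and the `__restrict` qualifiers are dropped so the sketch stays plain, portable C++. `foo_hoisted` mirrors the optimized IR: one unconditional load and one multiply. Because unsigned arithmetic wraps modulo 2^32, adding `a` to `total` `count` times gives exactly `a * count`, so the two versions agree whenever `a` and `b` do not alias.

```cpp
// Mirrors foo from the post (restrict qualifiers omitted; callers of this
// sketch are expected to pass non-aliasing pointers).
unsigned foo_loop(int& a, int* b, unsigned count) {
    *b = 3;
    unsigned total = 0;
    for (unsigned i = 0; i < count; i++) {
        total += a;  // a is converted to unsigned; the sum wraps modulo 2^32
    }
    return total;
}

// Hypothetical source-level rendering of the optimized IR: the load of `a`
// is executed unconditionally and the loop collapses into one multiply.
unsigned foo_hoisted(int& a, int* b, unsigned count) {
    *b = 3;
    unsigned loaded = static_cast<unsigned>(a);  // the unconditional load
    return loaded * count;                       // corresponds to mul i32 %5, %2
}
```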

In the test case, foo is called with the two pointers equal, but count equal to 0. Thus there was no noalias-violating load before (because no load in foo is actually executed, and the outer functions don’t have noalias), but there is one now. Under the interpretation that violating noalias produces immediate UB, this optimization is buggy.
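To make the extra load concrete, here is a hypothetical instrumented variant in which every read of `a` goes through a counter. `LoadCounter` is made up for this sketch, and without `__restrict` the aliasing call is well-defined C++. With `count == 0` and both arguments referring to the same `int`, the loop version never reads `a`, while the hoisted version reads it once, right after the store through `b`.

```cpp
// Hypothetical helper: counts how many times the referenced int is read.
struct LoadCounter {
    unsigned loads = 0;
    int read(const int& p) {
        ++loads;
        return p;
    }
};

// Loop version: reads `a` once per iteration, so zero times when count == 0.
unsigned foo_loop(int& a, int* b, unsigned count, LoadCounter& lc) {
    *b = 3;
    unsigned total = 0;
    for (unsigned i = 0; i < count; i++) {
        total += lc.read(a);
    }
    return total;
}

// Hoisted version: always reads `a` exactly once, after the store through
// `b`. When a and b alias, this read observes memory written through b.
unsigned foo_hoisted(int& a, int* b, unsigned count, LoadCounter& lc) {
    *b = 3;
    return static_cast<unsigned>(lc.read(a)) * count;
}
```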

Under the interpretation where violating noalias produces poison, though, the optimization is sound. %4 is poison, but the optimizer added a freeze, making %5 an arbitrary defined value, which is then multiplied by 0. So far, so good.
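A toy model of the poison interpretation, assuming `std::nullopt` stands for poison and that `freeze` picks some arbitrary defined value, passed in explicitly here. This is a simplification for illustration, not how LLVM represents values. `mul` propagates poison, but once the operand is frozen, multiplying by a `count` of 0 yields 0 whatever value `freeze` chose.

```cpp
#include <cstdint>
#include <optional>

// Simplified model of an i32 value: std::nullopt stands for poison.
using Val = std::optional<std::uint32_t>;

// freeze: a defined value passes through unchanged; poison becomes an
// arbitrary defined value, which the caller supplies to model "any choice".
Val freeze(Val v, std::uint32_t arbitrary) {
    return v ? v : Val{arbitrary};
}

// mul: poison if either operand is poison, otherwise wraps modulo 2^32.
Val mul(Val x, Val y) {
    if (!x || !y) return std::nullopt;
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(*x) * *y);
}
```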

Then foo is inlined into bar:

define noundef i32 @_Z3barPiS_j(i32* noundef %0, i32* noundef %1, i32 noundef %2) local_unnamed_addr #2 {
  call void @llvm.experimental.noalias.scope.decl(metadata !10)
  call void @llvm.experimental.noalias.scope.decl(metadata !13)
  store i32 3, i32* %1, align 4, !tbaa !15, !alias.scope !13, !noalias !10
  %4 = load i32, i32* %0, align 4, !alias.scope !10, !noalias !13
  %5 = freeze i32 %4
  %6 = mul i32 %5, %2
  %7 = load i32, i32* %0, align 4, !tbaa !15
  %8 = add i32 %6, %7
  ret i32 %8
}
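A simplified model of how the `!alias.scope`/`!noalias` metadata above is read, assuming a single scope domain and reusing the metadata numbers as scope IDs (`Access` and `assumed_noalias` are hypothetical names). The rule follows the LLVM LangRef: two accesses are assumed not to alias if one access's scope list is a subset of the other's noalias list. The store (scope !13, noalias !10) and foo's load (scope !10, noalias !13) are therefore declared disjoint, even though %0 == %1 in the test case, while bar's own load of post carries no such metadata and gets no such promise.

```cpp
#include <algorithm>
#include <set>

// Hypothetical model of one memory access and its scoped-noalias metadata,
// restricted to a single scope domain.
struct Access {
    std::set<int> alias_scope;  // scopes listed in !alias.scope
    std::set<int> noalias;      // scopes listed in !noalias
};

// True if every element of `a` is also in `b`.
static bool subset(const std::set<int>& a, const std::set<int>& b) {
    return std::includes(b.begin(), b.end(), a.begin(), a.end());
}

// Assumed not to alias if one access's non-empty scope list is contained
// in the other access's noalias list.
bool assumed_noalias(const Access& x, const Access& y) {
    return (!x.alias_scope.empty() && subset(x.alias_scope, y.noalias)) ||
           (!y.alias_scope.empty() && subset(y.alias_scope, x.noalias));
}
```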

And then the load of post from bar is merged with the one from foo, keeping the noalias metadata from the latter:

define noundef i32 @_Z3barPiS_j(i32* nocapture noundef readonly %0, i32* nocapture noundef writeonly %1, i32 noundef %2) local_unnamed_addr #2 {
  call void @llvm.experimental.noalias.scope.decl(metadata !14)
  call void @llvm.experimental.noalias.scope.decl(metadata !17)
  store i32 3, i32* %1, align 4, !tbaa !10, !alias.scope !17, !noalias !14
  %4 = load i32, i32* %0, align 4, !alias.scope !14, !noalias !17
  %5 = freeze i32 %4
  %6 = mul i32 %5, %2
  %7 = add i32 %6, %4
  ret i32 %7
}

Under the interpretation where violating noalias produces poison, this must be the buggy optimization. The poison load %4 is now used directly in the computation of %7, bypassing the freeze (and affecting the result of the computation in a nontrivial way, not that that’s required for UB).
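Extending the same toy model to bar shows why the merge matters (again, `std::nullopt` models poison, and `bar_before`/`bar_after` are hypothetical names). With %0 == %1 and %2 == 0, the IR before the merge computes 0 + 3 = 3 for every choice `freeze` might make, whereas after the merge %7 adds the unfrozen %4 and the result is poison.

```cpp
#include <cstdint>
#include <optional>

using Val = std::optional<std::uint32_t>;  // std::nullopt models poison

Val freeze(Val v, std::uint32_t arbitrary) { return v ? v : Val{arbitrary}; }

Val mul(Val x, Val y) {
    if (!x || !y) return std::nullopt;
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(*x) * *y);
}

Val add(Val x, Val y) {
    if (!x || !y) return std::nullopt;
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(*x) + *y);
}

// bar before the merge, with %0 == %1: the noalias-tagged load %4 is poison,
// but bar's own load %7 sees the 3 stored through %1.
Val bar_before(std::uint32_t count, std::uint32_t arbitrary) {
    Val v4 = std::nullopt;                            // %4
    Val v6 = mul(freeze(v4, arbitrary), Val{count});  // %5, %6
    Val v7 = Val{3u};                                 // %7
    return add(v6, v7);                               // %8
}

// bar after the merge: %4 replaces bar's load and is used without a freeze.
Val bar_after(std::uint32_t count, std::uint32_t arbitrary) {
    Val v4 = std::nullopt;                            // %4
    Val v6 = mul(freeze(v4, arbitrary), Val{count});  // %5, %6
    return add(v6, v4);                               // %7
}
```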

The rest is just exploiting the UB. After bar is inlined into baz, the load of pre at the beginning of baz is merged with the one from bar, effectively making post contain the value before the mutation rather than the one after it.
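A hypothetical illustration of that last step, not the original baz: the names `pre` and `post` only mirror the text. Once the store is assumed not to alias the later load, the optimizer may reuse the earlier load, so `post` reports the value from before the store.

```cpp
#include <cstdint>

// As written: read, store through b (which may equal a), then read again.
std::uint32_t post_as_written(std::uint32_t* a, std::uint32_t* b) {
    std::uint32_t pre = *a;
    (void)pre;
    *b = 3;
    return *a;  // post re-reads memory and observes the store
}

// What the merged code effectively computes: the store is treated as not
// aliasing the load, so the earlier value is reused for post.
std::uint32_t post_after_merge(std::uint32_t* a, std::uint32_t* b) {
    std::uint32_t pre = *a;
    *b = 3;
    return pre;  // post == pre, the value before the mutation
}
```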

So which interpretation is correct, and how can this be fixed?
