Interaction of noalias and dereferenceable

I think noalias and dereferenceable are interacting in some subtle ways, but I don’t know how exactly and the LangRef does not seem to discuss this.

Concretely: Is the following code UB?

(For ease of writing I will write this as C code but pretend I can just declare arguments noalias and/or dereferenceable. But this is discussing LLVM IR semantics and just using C syntax.)

void test(noalias int *ptr1, dereferenceable(4) int *ptr2) {
  *ptr1 = 0;
}

void main() {
  int x;
  test(&x, &x);
}

Clearly, if I remove dereferenceable this code is fine (while the two pointers do alias, they are not accessed in aliasing ways, and that is all that matters).

However, dereferenceable is supposed to allow LLVM to add spurious reads. And if we change test as follows, this clearly becomes UB:

void test(noalias int *ptr1, dereferenceable(4) int *ptr2) {
  *ptr1 = 0;
  int val = *ptr2;
}

So if the statement “dereferenceable allows LLVM to add spurious reads” is correct, then the original program must already have been UB?

One way this could make sense is that the presence of dereferenceable(N) is considered to act like a read of N bytes for the purpose of noalias. Then the original function would already be writing through ptr1 and reading from ptr2, which is UB. But I am not sure if that is truly the intention of dereferenceable?

But if that is not how dereferenceable and noalias interact, then how come dereferenceable-based optimizations like the following do not introduce UB?
Before:

void test(noalias int *ptr1, dereferenceable(4) int *ptr2, int count) {
  *ptr1 = 0;
  for (int i = 0; i < count; ++i) { int val = *ptr2; }
}

After:

void test(noalias int *ptr1, dereferenceable(4) int *ptr2, int count) {
  *ptr1 = 0;
  int val_prefetch = *ptr2; // aliases with the previous line!
  for (int i = 0; i < count; ++i) { int val = val_prefetch; }
}
1 Like

Where does this come from? Lang ref doesn’t say you can add spurious reads, or did I miss it?

The way I read noalias, and what would make sense (to me), is that this doesn’t introduce UB. noalias says it does not alias, or, we can pretend it does not alias. Which in your example is totally fine. Case 1, count<=0, we can move the val_prefetch = *ptr2 before the *ptr = 0 statement w/o any problems. It loads something different but that is irrelevant. Case 2, count > 0, the input program and the output are now both violating the noalias and we can for both pretend there is no aliasing.

That said, if we describe noalias as UB inducing, we probably face the problems you describe. And we might already use it as UB inducing in places, which might be bad.

It is coming from the observation that LLVM, in the presence of dereferenceable, will hoist loads out of loops without proving that the loop executes at least once. That is, to my knowledge, one of the main motivations for this attribute. And that is exactly a case of introducing a spurious read (in case the loop turns out to never execute).

Fair. Alone that is not a problem though, assuming the second part of my answer makes some sense.

Perhaps loads that violate noalias should be treated as returning poison rather than producing immediate UB?

1 Like

Returning poison on noalias violation for loads sounds sufficient, yes. (and UB for stores)
Since in theory the function won’t depend on the loaded value (dynamically), then it can be made poison and speculated all we want.

FWIW, Alive2 doesn’t model noalias ATM.

1 Like

Based on earlier Zulip discussion by @RalfJung, I came up with what seems to be an actual miscompilation:

It should return 3, but at -O3 it returns 0.

Note that this is very sensitive to the order of optimizations; even marking bar and baz as static makes the miscompilation go away on some compiler versions.

We start with foo:

static unsigned foo(int &__restrict a, int *__restrict b, unsigned count) {
    *b = 3;
    unsigned total = 0;
    for (unsigned i = 0; i < count; i++) {
        total += a;
    }
    return total;
}

Because a is dereferenceable (due to being a C++ reference), the loop is optimized into an unconditional load followed by a multiplication:

define noundef i32 @_Z3fooRiPij(i32* noalias nocapture noundef nonnull readonly align 4 dereferenceable(4) %0, i32* noalias nocapture noundef writeonly %1, i32 noundef %2) local_unnamed_add
r #0 {
  store i32 3, i32* %1, align 4, !tbaa !10
  %4 = load i32, i32* %0, align 4
  %5 = freeze i32 %4
  %6 = mul i32 %5, %2
  ret i32 %6
}

In the test case, foo is called with the two pointers equal, but count equal to 0. Thus there was no noalias-violating load before (because no load in foo is actually executed, and the outer functions don’t have noalias), but there is one now. Under the interpretation that violating noalias produces immediate UB, this optimization is buggy.

Under the interpretation where violating noalias produces poison, though, the optimization is sound. %4 is poison, but the optimizer added a freeze, making %5 an arbitrary defined value, which is then multiplied by 0. So far, so good.

Then foo is inlined into bar:

define noundef i32 @_Z3barPiS_j(i32* noundef %0, i32* noundef %1, i32 noundef %2) local_unnamed_addr #2 {
  call void @llvm.experimental.noalias.scope.decl(metadata !10)
  call void @llvm.experimental.noalias.scope.decl(metadata !13)
  store i32 3, i32* %1, align 4, !tbaa !15, !alias.scope !13, !noalias !10
  %4 = load i32, i32* %0, align 4, !alias.scope !10, !noalias !13
  %5 = freeze i32 %4
  %6 = mul i32 %5, %2
  %7 = load i32, i32* %0, align 4, !tbaa !15
  %8 = add i32 %6, %7
  ret i32 %8
}

And then the load of post from bar is merged with the one from foo, keeping the noalias metadata from the former:

define noundef i32 @_Z3barPiS_j(i32* nocapture noundef readonly %0, i32* nocapture noundef writeonly %1, i32 noundef %2) local_unnamed_addr #2 {
  call void @llvm.experimental.noalias.scope.decl(metadata !14)
  call void @llvm.experimental.noalias.scope.decl(metadata !17)
  store i32 3, i32* %1, align 4, !tbaa !10, !alias.scope !17, !noalias !14
  %4 = load i32, i32* %0, align 4, !alias.scope !14, !noalias !17
  %5 = freeze i32 %4
  %6 = mul i32 %5, %2
  %7 = add i32 %6, %4
  ret i32 %7
}

Under the interpretation where violating noalias produces poison, this must be the buggy optimization. The poison load %4 is now used directly in the computation of %7, bypassing the freeze (and affecting the result of the computation in a nontrivial way, not that that’s required for UB).

The rest is just exploiting the UB. After bar is inlined into baz, the load of pre at the beginning of baz is merged with the one from bar, effectively making post contain the value before the mutation rather than the one after it.

So which interpretation is correct, and how can this be fixed?

1 Like

I think this is always wrong, no matter the rest; and potentially the entire problem here.
Effectively, we just pulled a load into the restrict scope expressed via the noalias scope decl intrinsic even though it didn’t start there. Merging the loads requires dropping the alias scope.
@dobbelaj Is this solved by the new restrict patches?

I think it would be a valid optimization under the assumption that noalias violations are immediate UB. There are no stores between the two loads, so there would be no well-defined execution where they produce different results. The difference only arises if one is poison and the other is not.

(Even if there happened to be a data race – if we assume data races produce poison rather than immediate UB – it would necessarily make both loads return poison, not only one or the other.)

When merging load’s or store’s, the resulting !noalias metadata must be the intersection of the different !noalias metadata. For !alias.scope, it should be the union.

Yes, In the version of 2022/6/29 (D69542), the example works correct when full-restrict is enabled (emits ‘3’). It fails when it is disabled (emits ‘0’).

A bug should not be fixed by a huge patch that introduces new features. That’s not a solution.

This stuff needs to go through careful review, otherwise stuff like what’s being reported here happens. Or worse, you get miscompilations in the wild.
The review of noalias stuff has been suboptimal to say the least.

We already have a combineMetadataForCSE() function that would take care of this. However, it is not actually used by passes that do “plain” CSE such as EarlyCSE or GVN. I believe the reason is that historically, violations for metadata have always been considered immediate UB, so just retaining the metadata of the first instruction was always correct.

We should be able to fix this by adding the missing combineMetadataForCSE() calls, together with a LangRef change to specify that noalias violation results in a poison value rather than immediate UB.

The LangRef doesn’t really explicitly say, but I don’t know what else it could be.

That does not work. Consider:

void test(noalias int *ptr1, int *ptr2) {
  int _val = *ptr1;
  if (function()) { *ptr2 = 10; }
}

Assume this is called as test(&x, &x) as before. Now whether _val is poison depends on whether function() will return true in the future.

We can say “violating reads return poison”, sure (in fact we were forced to do that in the concurrency-aware version of our proposal for a Rust aliasing model), but we still have violating writes to deal with and as long as those are immediate UB, the interaction of dereferenceable and noalias remains a problem (solved e.g. by saying that dereferenceable acts like a read at function entry whose value is then ignored). Immediate UB is the only consistent semantics I know for noalias violations on writes.

(And yeah there might be a similar problem with data races, where introducing a spurious read can cause writes elsewhere to be considered racy. AFAIK the official stance is that the LLVM model allows such writes but considers them to clobber the written-to memory with poison. That would still make introducing a spurious read a wrong optimization.)
(With data races, the situation is slightly different-- only write-write conflicts cannot be resolved by saying that the read returns poison. Any read-write conflict has the read and the write happen “at the same time” so we can say the read returns poison. At least that is the hope, to my knowledge this has not been very well-explored for its interactions with weak C11-style concurrency. But anyway, for noalias a read-write conflicts can have the read predate the write by a lot, and there can even be external events in the mean time that influence whether the write, and therefore the conflict, even happens. Therefore I don’t think one can just say that on each such conflict the read returns poison.)

Specifically, consider the case of adding a spurious load at the top of a function:

void test(noalias int *ptr1, dereferenceable(4) int *ptr2, bool b) {
  if (b) { *ptr1 = 42; }
  // ...
}

becomes

void test(noalias int *ptr1, dereferenceable(4) int *ptr2, bool b) {
  int val = *ptr2; // add spurious load
  if (b) { *ptr1 = 42; }
  // ...
}

Clearly this transformation should be allowed. But if b is true then the second program has a noalias conflict at the write to ptr1 – the load from ptr2 is fine and I don’t think there is a way for this load to return poison, since at this point in the execution one cannot even know yet whether this load will be part of a load-store noalias conflict.
Therefore the 2nd program has UB (if b is true), and therefore for the transformation to be correct the original program must have already had UB (if b is true).


EDIT: There also is a fundamental problem, I think, with the idea of making conflicting loads return poison instead of causing insta-UB. Consider something like

void test(noalias int *ptr1, noalias int *ptr2) {
  *ptr1 = 42;
  int val = *ptr2;
}

Let’s say the two pointers alias. Then val will be poison but that’s fine. Now let’s recorder the two operations:

void test(noalias int *ptr1, noalias int *ptr2) {
  int val = *ptr2;
  *ptr1 = 42;
}

Now again the load cannot really return poison since the conflicting write has not happened yet. Therefore we only get a conflict at the store, and that causes UB. Therefore the reordering turned a UB-free program into one with UB.

I just saw this comment by @efriedma-quic that seems related to my old question here

I think there’s a natural connection to dereferenceable here: “dereferenceable(N)” loads N bytes on entry, “deferenceable(N) writable” loads and stores N bytes on entry. So gluing them together makes sense to me.

If operationally, dereferenceable(N) implies a load of N bytes on function entry, then my original example has UB, right? (Using C-like syntax just as a short-hand here, this is representing LLVM IR.)

void test(noalias int *ptr1, dereferenceable(4) int *ptr2) {
  // int _val = *ptr2; // with this line the code obviously has UB for violating `noalias`
  *ptr1 = 0;
}

void main() {
  int x;
  test(&x, &x);
}

test would now do a load through ptr2 (due to the attribute) and a write through ptr1 (that’s explicit) and the two overlap and that violates the noalias requirements.

Is that the intended semantics of dereferenceable?

I don’t think it is, or at least we should not make it to be.
If we interpret it like this, we make code UB that was not a minute ago, and I doubt we have a good reason to do so.

I would read the entire “loads bytes at the entry” rather as “could load bytes at the entry w/o causing a problem due to the load, unrelated from other effects”, not that it “conceptually happens”.

Well, the issue is that to me at least it looks like the spurious load (if introduced e.g. by LICM) could cause a problem, since it still clearly conflicts with the noalias… it’s just not possible to say “the load will return poison if later operations on a noalias pointer don’t write to this memory”, since there could be some non-deterministic choice in between now and then such that you cannot even say whether the later write will happen or not. This approach only works for data races where the read and the write are not ordered.

IOW:

Code without UB?

void test(noalias int *ptr1, dereferenceable(4) int *ptr2, int n) {
  for (int i = 0; i < n; ++i) { int _val = *ptr2; }
  if (n == 0) { *ptr1 = 0; }
}

void main() {
  int x;
  test(&x, &x, 0);
}

But now with UB?

void test(noalias int *ptr1, dereferenceable(4) int *ptr2, int n) {
  int _val = *ptr2;
  for (int i = 0; i < n; ++i) { }
  if (n == 0) { *ptr1 = 0; }
}

This looks to me like the old problem we had violating other attributes (align, etc.). It used to be direct UB, that does not work well with optimizations, so we made it produce poison instead. noalias seems to be similar, albeit more complex. Loads from that object using different pointers should be poison, not UB.

That’s what I am saying is not possible, because the load can happen way before the store. Consider an example like this:

void test(noalias int *ptr1, dereferenceable(4) int *ptr2) {
  int b = freeze(poison);
  if (b) {
    *ptr1 = 0;
  } else {
    int x = *ptr2;
    printf("%d\n", x);
  }
}

At least without dereferenceable, this function is obviously fine to call as test(&x, &x), since it will use either ptr1 or ptr2, but never both, so there is no aliasing violation.

But now if the compiler exploits dereferenceable and hoists the load…

void test(noalias int *ptr1, dereferenceable(4) int *ptr2) {
  int x = *ptr2;
  int b = freeze(poison);
  if (b) {
    *ptr1 = 0;
  } else {
    printf("%d\n", x);
  }
}

What should the value of x after the first line be?

  • Either this is a regular successful load. But then if b ends up being 1 we get a conflicting store via ptr1, and hence the program has UB.
  • Or this load returns undef due to a potential future aliasing conflict. But then if b ends up being 0 we print an undef and that is UB.

So it seems like we have UB either way. My conclusion is that either the original program already had UB (e.g. because the dereferenceable attribute implies a read at the top of the function) or hoisting the load out of the conditional was wrong.

That would be unfortunate. The way we lower C/C++ right now, the original program should not be UB.
That said, not hoisting is also far from ideal. I was hoping to have the poison appear retroactively, but you are saying that’s not a thing, or at least we don’t want such behavior?
FWIW, I was hoping that: x was only poison if *ptr1 is written. If x, or any value dependent on it that would propagate the poison, has been used in a way that would introduce UB from poison before *ptr1 is written, UB is implied by the write. If not, we continue with the values now all being poison.