The more I dig into the interactions of poison and poison-related concepts, the weirder and scarier their implications seem to me. Maybe I'm misunderstanding something; if so, someone please help me work through it.
Here is what LangRef says about the noundef property of loads:
The optional !noundef metadata must reference a single metadata name <empty_node>
corresponding to a node with no entries. The existence of !noundef metadata on the instruction tells
the optimizer that the value loaded is known to be well defined. If the value isn’t well defined, the
behavior is undefined.
I read this as, “if a load with noundef metadata loads a value that is not well defined, it is immediate UB”. Now, let’s look at this example:
int a[10];
for (int i = 0; i < N; i++) {
  a[i] = 1;
}
for (int i = 1; i < N; i++) {
  x = a[0];
  a[i] += x;
}
We know that we will only enter the 2nd loop if N > 1. Under this condition we are also certain that a is initialized and doesn’t contain undef. It means that x = a[0] provably never reads undef (actually, it always reads 1). So I can legally mark it as noundef:
int a[10];
for (int i = 0; i < N; i++) {
  a[i] = 1;
}
for (int i = 1; i < N; i++) {
  x = load a[0] !noundef;
  a[i] += x;
}
Now, what’s the point of reading the same value over and over in the loop? a is known to be dereferenceable. We should be able to safely hoist the load out of the loop:
int a[10];
for (int i = 0; i < N; i++) {
  a[i] = 1;
}
x = load a[0] !noundef;
for (int i = 1; i < N; i++) {
  a[i] += x;
}
(Yeah, I know that real LICM would hoist to the preheader and not into a block before the i < N check, but let’s give imagination a chance and say that we have found a reason to hoist it right after the init loop. We are discussing legality, not profitability.)
Now, if N = 0, everything breaks. The original program entered neither loop, so it didn’t have any UB. The new program, according to the specification, has immediate UB on the load, because it will provably read an undef value from the uninitialized array.
Note that if there was no !noundef on load, such hoisting would be absolutely legit. The read value was undef but was never used.
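To make the N = 0 case concrete, here is a small C model of the two programs. It is only a sketch of my reading of LangRef, not how LLVM actually represents any of this: the Cell type, the ub_hit flag, and the load_noundef / load_plain helpers are all made up for illustration, and I assume 0 <= N <= 10. Each array slot carries a bit saying whether it holds a well-defined value. A noundef load of an undef slot records immediate UB, while a plain load just returns a value that is harmless as long as nobody uses it.

```c
#include <assert.h>
#include <stdbool.h>

#define LEN 10

/* Hypothetical model of one array slot: a value plus a "well defined" bit. */
typedef struct {
    int  value;
    bool defined;
} Cell;

/* Set when the modelled program executes immediate UB. */
static bool ub_hit;

/* Models a load with !noundef: loading an undef slot is immediate UB. */
static Cell load_noundef(const Cell *a, int idx) {
    assert(idx >= 0 && idx < LEN);
    if (!a[idx].defined)
        ub_hit = true;
    return a[idx];
}

/* Models a plain load: the result may be undef, which is fine if unused. */
static Cell load_plain(const Cell *a, int idx) {
    assert(idx >= 0 && idx < LEN);
    return a[idx];
}

/* Models `int a[10];` followed by the first loop. */
static void init(Cell *a, int n) {
    assert(n >= 0 && n <= LEN);
    for (int i = 0; i < LEN; i++)
        a[i] = (Cell){0, false};   /* uninitialized: every slot is undef */
    for (int i = 0; i < n; i++)
        a[i] = (Cell){1, true};    /* a[i] = 1 */
}

/* The load stays inside the second loop. Returns true if UB was hit. */
bool run_original(int n) {
    Cell a[LEN];
    ub_hit = false;
    init(a, n);
    for (int i = 1; i < n; i++) {
        Cell x = load_noundef(a, 0);
        a[i].value += x.value;
    }
    return ub_hit;
}

/* The load is hoisted to right after initialization. */
bool run_hoisted(int n, bool keep_noundef) {
    Cell a[LEN];
    ub_hit = false;
    init(a, n);
    Cell x = keep_noundef ? load_noundef(a, 0) : load_plain(a, 0);
    for (int i = 1; i < n; i++)
        a[i].value += x.value;
    return ub_hit;
}
```

In this model run_original(0) reports no UB, run_hoisted(0, true) reports UB, and run_hoisted(0, false) does not. For any N from 1 to 10 none of the three variants reports UB, so the only problematic input is the one where neither loop runs.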
You can say “OK, we should drop the !noundef metadata whenever we hoist a load”. Fine, but what if x was computed as x = call foo(arr, 0), where foo reads arr[i] and has the noundef attribute on its return value? We cannot simply go to the module and drop the attribute of another function while performing LICM.
I thought that the intention of !noundef was to give the optimizer more freedom (e.g. to drop freeze instructions), not to break optimizations. But in fact, the presence of noundef means that speculating such instructions introduces immediate UB in places where there used to be only deferred UB that could never lead to real UB.
Thoughts?
I vote for cleansing these attributes from loads and functions.