Potential issue with noalias @malloc and @realloc

Hi all,

I think I've spotted a semantic issue with marking @malloc and
@realloc as noalias. Say we have the program:

int f() {
int* p0 = malloc(size of(int));
free(p0);
int* p1 = malloc(sizeof(int));
if (!p1) return 20;
int value = 0;
for (int i = 0; i < 1; i++) {
*p1 = 20;
value = *p1;
if (false) // "false" is obscured in a way the compiler can't fathom
if (p0 == p1)
a();
else
b();
}
return result;
}

The program above is well defined, and will always return 20.
However, if we first unswitch loops:

int f() {
int* p0 = malloc(size of(int));
free(p0);
int* p1 = malloc(sizeof(int));
if (!p1) return 20;
int value = 0;
if (p0 == p1) {
for (int i = 0; i < 1; i++) {
*p1 = 20;
value = *p1;
if (false)
a();
}
} else {
// Other copy of the loop that calls b() in dead code
}
return result;
}

and then run GVN::propgateEquality that replaces one use of p1 with p0
but not the other (for some reason, say the other use was obscured
behind a function call):

int f() {
int* p0 = malloc(size of(int));
free(p0);
int* p1 = malloc(sizeof(int));
if (!p1) return 20;
int value = 0;
if (p0 == p1) {
for (int i = 0; i < 1; i++) {
*p0 = 20; // S0
value = *p1; // L0
if (false)
a();
}
} else {
// Other copy of the loop that calls b() in dead code
}
return result;
}

Now we have a problem -- since p0 NoAlias p1 (distinct @malloc()
calls), we can reorder S0 and L0 (or LICM L0):

int f() {
int* p0 = malloc(size of(int));
free(p0);
int* p1 = malloc(sizeof(int));
if (!p1) return 20;
int value = 0;
if (p0 == p1) {
for (int i = 0; i < 1; i++) {
value = *p1; // L0
*p0 = 20; // S0
if (false)
a();
}
} else {
// Other copy of the loop that calls b() in dead code
}
return result;
}

and we'll now return garbage if p0 == p1 (likely) and p1 is not null
(also likely).

I do not yet have a solution for this. Let me know what you think!

Thanks,
-- Sanjoy

I think you are spot-on!

From http://en.cppreference.com/w/c/memory/malloc,
A previous call to free or realloc that deallocates a region of memory synchronizes-with a call to mallocthat allocates the same or a part of the same region of memory. This synchronization occurs after any access to the memory by the deallocating function and before any access to the memory by malloc. There is a single total order of all allocation and deallocation functions operating on each particular region of memory.

So only “non-freed” malloc pointers are No-Alias which makes it flow-sensitive. There is no reason why malloc couldn’t return previously freed location.

Regards,
Kevin

Hi Kevin,

So only "non-freed" malloc pointers are No-Alias which makes it
flow-sensitive. There is no reason why malloc couldn't return previously
freed location.

Yes.

Talking to Nick Lewycky on IRC, I figured out a shorter way of saying
what I wanted to say. We know that programs like this are UB in C:

p0 = malloc();
free(p0);
p1 = malloc();
if (p0 == p1) {
int v = *p0; // Semantically free'ed but bitwise equal to an allocated value
}

and we relied on them having UB when marking malloc's return value as noalias.

However, we can end up in cases like the above by applying
loop-unswitch + GVN to well defined C programs.

-- Sanjoy

I don’t know when this was added on cppreference but

The behavior is undefined if after free() returns, an access is made through the pointer ptr (unless another allocation function happened to result in a pointer value equal to ptr)

This seems to suggest that there is no UB… However, I couldn’t find the corresponding line or relevant part on latest C std, http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf

Regards,
Kevin

I don't know when this was added on cppreference but

The behavior is undefined if after |free()| returns, an access is made

through the pointer |ptr| (unless another allocation function happened
to result in a pointer value equal to |ptr|)

This seems to suggest that there is no UB... However, I couldn't find
the corresponding line or relevant part on latest C
std, http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf

C99 says that p0 has indeterminate value after the free.

Jon

Sorry, forgot to source where I quoted: http://en.cppreference.com/w/c/memory/free p.5
Kevin

The corresponding line does not exist, and it is in fact, wrong :slight_smile:

C11 says nothing like that.
C++14 says:
“The lifetime of an object of type T begins when:
— storage with the proper alignment and size for type T is obtained, and — if the object has non-trivial initialization, its initialization is complete.
The lifetime of an object of type T ends when:
— if T is a class type with a non-trivial destructor (12.4), the destructor call starts, or
— the storage which the object occupies is reused or released.”

it also says:
“If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if:

a bunch of conditions”.

Which makes it even worse because they become aliasing again.

Note: This is a generic problem with any situation where noalias exists but the pointers are proven equal :slight_smile:
TBAA, for example, has the same generic issue, we just drop the tbaa metadata and declare it okay, even though it would have been UB at the source level.

We have no way of specifying the lifetime has ended, either.
You can make the problem as hard or as simple as you want above, but you
can't solve it in general.
Even if you wanted to, it is in fact, trivial to come up with cases where
basicaa will say no alias for the load/store, but not say noalias at the
propagation point.

In fact, you can make it worse than this.
You can get our AA to say may-alias/must-alias for the memory during gvn,
and say noalias for the same pair of pointers by the time licm rolls around.

The only real solution, IMHO, is to this is to have real object
lifetime/ordering chains and build an SSA-like form for them like memoryssa.
The languages generate the lifetime orderings.

(Giving up on noalias for this case only fixes this case, it still affects
other languages, etc)

Hi Daniel,

The corresponding line does not exist, and it is in fact, wrong :slight_smile:

C11 says nothing like that.
C++14 says:
"The lifetime of an object of type T begins when:
— storage with the proper alignment and size for type T is obtained, and —
if the object has non-trivial initialization, its initialization is
complete.
The lifetime of an object of type T ends when:
— if T is a class type with a non-trivial destructor (12.4), the destructor
call starts, or
— the storage which the object occupies is reused or released."

it also says:
"If, after the lifetime of an object has ended and before the storage which
the object occupied is reused or released, a new object is created at the
storage location which the original object occupied, a pointer that pointed
to the original object, a reference that referred to the original object,
or the name of the original object will automatically refer to the new
object and, once the lifetime of the new object has started, can be used to
manipulate the new object, if:
...
a bunch of conditions".

Quickly scanning the spec, it looks like it tries to differentiate
between the "lifetime" of an object (starts on construction, ends on
destruction) and the time during which the storage in which the object
resides has been acquired and not been released.

In particular, it says "If, after the lifetime of an object has ended
and **before** the storage which the object occupied is reused or
released", so I suspect that wording is there to allow:

X* val = new X;
val->~X();
new(val) X;
val->foo(); // valid use

Also, basic.stc.dynamic.deallocation says:

If the argument given to a deallocation function in the standard
library is a pointer that is not the null pointer value (4.10), the
deallocation function shall deallocate the storage referenced by the
pointer, rendering invalid all pointers referring to any part of the
deallocated storage. Indirection through an invalid pointer value and
passing an invalid pointer value to a deallocation function have
undefined behavior. *Any other use of an invalid pointer value has
implementation-defined behavior.*

-- Sanjoy

Hi Daniel,

Note: This is a generic problem with any situation where noalias exists but
the pointers are proven equal :slight_smile:

Yes.

TBAA, for example, has the same generic issue, we just drop the tbaa
metadata and declare it okay, even though it would have been UB at the
source level.

Yes. I noticed this when talking to Kostya on a thread about TBAA.
However, I wanted to reduce the "dependencies" of this behavior so I
did not mention TBAA here.

> Hi all,
>
> I think I've spotted a semantic issue with marking @malloc and
> @realloc as noalias. Say we have the program:
>
> int f() {
> int* p0 = malloc(sizeof(int));
> free(p0);
>

> int* p1 = malloc(sizeof(int));
> if (!p1) return 20;
> int value = 0;
> for (int i = 0; i < 1; i++) {
> *p1 = 20;
> value = *p1;
> if (false) // "false" is obscured in a way the compiler can't fathom
> if (p0 == p1)
> a();
> else
> b();
> }
> return result;
> }
>
> The program above is well defined, and will always return 20.
>

no it won't, you never initialized result :slight_smile: :slight_smile: :slight_smile:

Yes!

> However, if we first unswitch loops:
>
> int f() {
> int* p0 = malloc(size of(int));
> free(p0);
> int* p1 = malloc(sizeof(int));
> if (!p1) return 20;
> int value = 0;
> if (p0 == p1) {
> for (int i = 0; i < 1; i++) {
> *p1 = 20;
> value = *p1;
> if (false)
> a();
> }
> } else {
> // Other copy of the loop that calls b() in dead code
> }
> return result;
> }
>
> and then run GVN::propgateEquality that replaces one use of p1 with p0
> but not the other (for some reason, say the other use was obscured
> behind a function call):
>
> int f() {
> int* p0 = malloc(size of(int));
> free(p0);
> int* p1 = malloc(sizeof(int));
> if (!p1) return 20;
> int value = 0;
> if (p0 == p1) {
> for (int i = 0; i < 1; i++) {
> *p0 = 20; // S0
> value = *p1; // L0
> if (false)
> a();
> }
> } else {
> // Other copy of the loop that calls b() in dead code
> }
> return result;
> }
>
>
> Now we have a problem -- since p0 NoAlias p1 (distinct @malloc()
> calls), we can reorder S0 and L0 (or LICM L0):
>
>
> int f() {
> int* p0 = malloc(size of(int));
> free(p0);
> int* p1 = malloc(sizeof(int));
> if (!p1) return 20;
> int value = 0;
> if (p0 == p1) {
> for (int i = 0; i < 1; i++) {
> value = *p1; // L0
> *p0 = 20; // S0
> if (false)
> a();
> }
> } else {
> // Other copy of the loop that calls b() in dead code
> }
> return result;
> }
>
> and we'll now return garbage if p0 == p1 (likely) and p1 is not null
> (also likely).
>

The free changes p0.

Otherwise, yes, your fundamental problem is that you are playing games with
two different language level objects whose pointers are equal, but it says
they do not alias.
I don't remember who i tried to explain to on irc that this can happen, but
they didn't believe me :slight_smile:

We have no way of specifying the lifetime has ended, either.
You can make the problem as hard or as simple as you want above, but you
can't solve it in general.
Even if you wanted to, it is in fact, trivial to come up with cases where
basicaa will say no alias for the load/store, but not say noalias at the
propagation point.

Just put enough copies in between that it hits the various depth or
complexity limits and gives up and says mayalias :slight_smile:

As I was telling David Majnemer on IRC, I'm not annoyed by the fact
that there is a bug, but by the fact that there is no good way of
fixing the bug that I can think of.

It would require semantic changes to llvm ir to fix this to properly
express object lifetimes that is compatible with all the random babble
standards have written down :slight_smile:
For now, the only sane solution IMHO, is to say that no alias implies
pointer inequality, regardless of the standards. Because the above can
occur in any situation noalias exists but they are allowed to be pointer
equal (as mentioned, it's trivial to make this happen in TBAA).

Just to be clear, you're suggesting that we no longer mark malloc's
return value as noalias?

I say sane because you can't actually do anything else and get it right all
the time, other than disable noalias in pretty much every case.

FWIW outside of not propagating when tbaa sets differ, gcc does the same as
we do .
It definitely believes p0 and p1 do not alias in the example above, and
would have no trouble propagating them.
I can't quite get it to optimize it. but it does set the equality p0 == p1,
and staring at the pass, it just doesn't bother to simplify the expression,
it would, in fact, otherwise propagate.

I'll give writing an actual C++ program that demonstrates the problem
this weekend, but unfortunately these things tend to be time
consuming.

-- Sanjoy

For now, the only sane solution IMHO, is to say that no alias implies
pointer inequality, regardless of the standards. Because the above can
occur in any situation noalias exists but they are allowed to be pointer
equal (as mentioned, it’s trivial to make this happen in TBAA).

Just to be clear, you’re suggesting that we no longer mark malloc’s
return value as noalias?

CMIIW, I believe the suggestion was treat p0 as unequal to p1 even under runtime check if p0 has NoAlias relation with p1. This will at least make ie. GVN to treat as if p0 and p1 aren’t equal even if bits are, preventing transformation of p1 to p0, thus preventing reordering. I’m not a huge fan of this solution, and is quite hacky to GVN and pretty much to all the transformation passes :(.
The root cause is quite clear; you cannot assume malloc’d ptrs to have No-Alias without flow-sensitive, “lifetime” analysis.

Kevin

> It would require semantic changes to llvm ir to fix this to properly
> express object lifetimes that is compatible with all the random babble
> standards have written down :slight_smile:
> For now, the only sane solution IMHO, is to say that no alias implies
> pointer inequality, regardless of the standards. Because the above can
> occur in any situation noalias exists but they are allowed to be pointer
> equal (as mentioned, it's trivial to make this happen in TBAA).

Just to be clear, you're suggesting that we no longer mark malloc's
return value as noalias?

Actually, i'd suggest just letting it be a bug :slight_smile:
That's what other compilers seem to do.
You could hack around this in clang by using the !alias.scope and !noalias
form, and not attaching the scope past the lifetime end.

But we don't optimize that.

Apparently i was really unclear:
Making GVN do this would lose a tremendous amount of optimization.
You'd be better off turning off noalias than doing this.

I would actually just accept that this sucks and move on for now :stuck_out_tongue:
Or modify clang to using the scoped noalias and solve the perofrmance bugs
we have with that representation.

Long term, we need well defined and real lifetime start/end information
that is semantically relevant, not just optional metadata.

(Or we need standards to stop making the lifetimes and aliasing issues
ridiculously complex :P)

(by this i mean it's only used by memdep).

I believe hal also has a noalias instrinsic proposal that would have been
useful here.

But thinking harder, it has the same problem because there is no way to end
the lifetime of something in ssa, and thus, iif do
a1 = llvm.noalias(a)
b1 = llvm.noalias(b)
if (a1 == b1)
  oh crap

So we can't use it to specify the lifetimes here.

Yep; doesn’t help. I feel like this is the perfect place for a pithy quote on self deception. The core problem here is that we allow the optimizer to rely on a lie (that all memory from malloc, and other noalias functions, is distinct). This, of course, is not really true. The consequence seems to be that if you write code that can observe the fallacy of this assumption - even dead code - then the model breaks. Dead code can observe things, in this sense, I suppose, if there are no data dependencies or side effects that prevent the optimizer from making it speculatively live. -Hal

The corresponding line does not exist, and it is in fact, wrong :slight_smile:

C11 says nothing like that.
C++14 says:
"The lifetime of an object of type T begins when:
— storage with the proper alignment and size for type T is obtained, and —
if the object has non-trivial initialization, its initialization is
complete.
The lifetime of an object of type T ends when:
— if T is a class type with a non-trivial destructor (12.4), the
destructor call starts, or
— the storage which the object occupies is reused or released."

it also says:
"If, after the lifetime of an object has ended and before the storage
which the object occupied is reused or released, a new object is created at
the storage location which the original object occupied, a pointer that
pointed to the original object, a reference that referred to the original
object, or the name of the original object will automatically refer to the
new object and, once the lifetime of the new object has started, can be
used to manipulate the new object, if:

This wording describes what happens if you, for instance, placement new
over an existing object. It is not intended to cover the case where you
happen to reallocate storage you previously freed; see below for the rule
on that case.

...
a bunch of conditions".

Which makes it even worse because they become aliasing again.

I believe this is the portion of the C++ standard you're looking for:
[basic.stc]/4:

"When the end of the duration of a region of storage is reached, the values
of all pointers representing the address of any part of that region of
storage become invalid pointer values (6.9.2). Indirection through an
invalid pointer value and passing an invalid pointer value to a
deallocation function have undefined behavior. Any other use of an invalid
pointer value has implementation-defined behavior."

(In C++14 and before, this was [basic.dynamic.deallocation]p4 and only
applied to the effects of invoking 'operator delete', but we added the
above wording as a defect resolution so it's intended to apply
retroactively.)

So the comparison in the source program has UB, and the loop unswitching
transformation is therefore invalid as an operation on C++ programs. (I
don't think this observation helps, though, since we -- presumably -- want
the transformation to be valid as an operation on LLVM IR...)

It seems to me that there are two ways of thinking about this: either the
value of a pointer in IR is richer than its bit sequence, in which case
replacing p1 with p0 in a block predicated by p0 == p1 is an incorrect
transformation if you cannot prove that one pointer was based on the other,
or the value of a pointer in IR is exactly its bit sequence, in which case
the code performing the transformation incorrectly updated the IR and a
correct transformation would need to somehow remove the noalias from the
malloc calls. The C++ object model formally takes the former standpoint;
its pointers notionally point to objects, which are abstract entities
occupying storage, rather than pointing to the storage itself.

It seems to me that there are two ways of thinking about this: either the
value of a pointer in IR is richer than its bit sequence, in which case
replacing p1 with p0 in a block predicated by p0 == p1 is an incorrect
transformation if you cannot prove that one pointer was based on the other,

Which would be a non-starter just from the cost of doing so, not to mention
the allowable optimization you lose :slight_smile:

or the value of a pointer in IR is exactly its bit sequence, in which case
the code performing the transformation incorrectly updated the IR and a
correct transformation would need to somehow remove the noalias from the
malloc calls.

Sure, but noalias is just a symptom here. You can make the same thing occur
in other ways. It's fundamentally an issue of being able to express when
the abstract identity of a pointer has changed even when the bit-value has
not.

IE there are enough transformation updates that need to occur that we
probably need to do something different than try to band-aid/patch all the
places that will have this issue.

The C++ object model formally takes the former standpoint; its pointers

notionally point to objects, which are abstract entities occupying storage,
rather than pointing to the storage itself.

Which, i get why they do (in fact i would do the same), but saying the
abstract objects have an identity outside of the bit values of the pointers
means that the IR's need to be able to represent identity changes
separately from value changes (this is what i meant by "add support for
describing lifetimes that has semantic meaning").
I'm not aware of any compiler that does this effectively, and it's a fairly
large semantic change.

They all pretty much hack it and hope they don't break too much shit.

Separately, changing noalias would just band-aid it. You can make the same
thing occur with TBAA, placement new, or really, any way we have where the
abstract identity may change but llvm doesn't express it.

I think richard put it better.
The core of the lie is that if two pointers have the same bit value, they
are as good as each other, and must have the same abstract identity
This would in fact, be true, if we could *end* the identity of a pointer
every time a language-level "abstract identity" change occurred.
The problem is this lie is pretty core to the identity of llvm right now.

There is no way to end the lifetime of a scalar value in SSA (and doing so
would break pretty much every nice property it has).

So you can't say:
"lifetime.ended(foo)
<it's not valid to use foo anymore>"

We can use something like memoryssa (let's call it or orderedmemoryssa,
let's call it), mark the lifetime/identity changes as "ordering" defs, and
get that the two pointers are not noalias because they are not the same
abstract identity as the one marked noalias.

RIght now to hack up noalias we could use the noalias scoping information.
But it's not going to help the other cases we have that do the same thing.

The store and the load were never noalias in the first place, becuase they
were *always* talking about different abstract identities than the one that
the malloc calls had. It's just not expressed in the IR, and it was not
used to optimize until the other transformations occurred.

That is, with a smart enough optimization code, I could right now,
directly go from the start code to the end code with the IR as it appeared
in the first example.
We just don't happen to.

Given good enough optimizers, there is no transformation update point to
fix, there is only the missing semantic information that the abstract
identity of the memory being stored and loaded is not the same as the
abstract identity of the pointer returned by malloc.