Placement new and TBAA

I’ve heard from a few people that placement new is going to be handled by LLVM using the llvm.invariant.group.barrier intrinsic; however, the documentation for that intrinsic seems to indicate that it deals with only invariant.group metadata and not TBAA metadata.

I would like to understand how the solution would work in the context of https://llvm.org/bugs/show_bug.cgi?format=multiple&id=28767.

– HT

What is the purpose of the union there?
I ask because pretty much no compiler will respecting the unioning without visible accesses in all cases, because it would ruin most optimization[1]

But i’m also not sure it’s required in this testcase to make your testcase fail.

In practice, handling placement new properly in gcc required the equivalent of a new intrinsic (in gcc, it required adding CHANGE_DYNAMIC_TYPE_EXPR).

[1] For example, you can pretty much get every compiler out there to generate “wrong” code using unions and strict aliasing, see the last discussion we had about this on the mailing list (subject line had “wrong results with union and strict-aliasing”).

What is the purpose of the union there?

The purpose of the union is to increase portability by ensuring that the
placement new is not being performed on insufficiently sized or aligned
memory.

I ask because pretty much no compiler will respecting the unioning without
visible accesses in all cases, because it would ruin most optimization[1]

But i'm also not sure it's required in this testcase to make your testcase
fail.

It isn't. The program should be valid, without the union, on platforms
where int and float have the same size an alignment.

In practice, handling placement new properly in gcc required the
equivalent of a new intrinsic (in gcc, it required adding
CHANGE_DYNAMIC_TYPE_EXPR).

Sure; my question is whether or not there is already a solution in the
works for LLVM. If not, then I'll try to work with some folks to propose an
intrinsic.

-- HT

What is the purpose of the union there?

The purpose of the union is to increase portability by ensuring that the
placement new is not being performed on insufficiently sized or aligned
memory.

Gotcha

I ask because pretty much no compiler will respecting the unioning
without visible accesses in all cases, because it would ruin most
optimization[1]

But i'm also not sure it's required in this testcase to make your
testcase fail.

It isn't. The program should be valid, without the union, on platforms
where int and float have the same size an alignment.

Right, so you need to debug that first, and see what's going wrong.
Without TBAA info in the .ll file, this should just work.

In practice, handling placement new properly in gcc required the
equivalent of a new intrinsic (in gcc, it required adding
CHANGE_DYNAMIC_TYPE_EXPR).

Sure; my question is whether or not there is already a solution in the
works for LLVM. If not, then I'll try to work with some folks to propose an
intrinsic.

I would first focus on understanding why the testcase fails without any
TBAA info at all.
In that case, i would expect it to work.

The PR says "passes with -fno-strict-aliasing”, my understanding is that it is failing only with the TBAA indeed.

You don’t need the main and the union to reproduce, extracting foo() alone in its single own file is enough:

void *operator new(decltype(sizeof 0), void *) noexcept;
float *qq;
void foo(int *p, int *q, long unk) {
for (long i = 0; i < unk; ++i) {
++*p;
qq = new (static_cast<void *>(&q[i])) float(42);
}
}

LICM will get the store to p out of the loop, conceptually turning it into:

void foo(int *p, int *q, long unk) {
for (long i = 0; i < unk; ++i) {
qq = new (static_cast<void *>(&q[i])) float(42);
}
++*p;
}

Now I don’t know if the use of placement new in this example is legal in the first place. I thought calling delete before using placement new was mandatory.

CC Sanjoy, since he looked into TBAA recently and it reminds me a similar test case he mentioned, not with placement new but with a call to a function taking int * and float *, and passing the same address (call site was union).

Yes, I misread, because I didn’t think -O1 turned on strict aliasing and missed that part

Hi,

The PR says "passes with -fno-strict-aliasing”, my understanding is that it
is failing only with the TBAA indeed.

You don’t need the main and the union to reproduce, extracting foo() alone
in its single own file is enough:

void *operator new(decltype(sizeof 0), void *) noexcept;
float *qq;
void foo(int *p, int *q, long unk) {
   for (long i = 0; i < unk; ++i) {
      ++*p;
      qq = new (static_cast<void *>(&q[i])) float(42);
   }
}

LICM will get the store to p out of the loop, conceptually turning it into:

void foo(int *p, int *q, long unk) {
   for (long i = 0; i < unk; ++i) {
      qq = new (static_cast<void *>(&q[i])) float(42);
   }
   ++*p;
}

Now I don’t know if the use of placement new in this example is legal in the
first place. I thought calling delete before using placement new was
mandatory.

So if you:

1. override operator delete to do nothing for that type (so that the
    placement new actually has unfreed backing storage to re-use).
2. have an empty destructor.

and have the source program be

  void *operator new(decltype(sizeof 0), void *) noexcept;
  float *qq;
  void foo(int *p, int *q, long unk) {
     // unk = 1
     for (long i = 0; i < unk; ++i) {
        ++*p;
        delete &q[i]; // since i is only ever 0, this does not
              // delete a derived pointer
        qq = new (static_cast<void *>(&q[i])) float(42);
     }
  }

then I suspect we'll have the same problem after the destructor and
the operator delete for that type has been inlined away.

CC Sanjoy, since he looked into TBAA recently and it reminds me a similar
test case he mentioned, not with placement new but with a call to a function
taking int * and float *, and passing the same address (call site was
union).

The case I was talking about was something like:

// C11 source file.

union S {
  int i;
  float f;
};

void f(int* i, float *f) {
  *i = 20;
  *f = 40.0;
}

void g() {
  union S s;
  f(&s.i, &s.f);
}

At least cursorily this looked well defined in C++ to me -- f should
first write int(20) and then float(40.0) to the same location legally.
However clang emits tbaa such that LLVM concludes that in f, the store
to i and the store to f don't alias.

However, I'm not very confident in the "this looked well defined"
reading. I did not consult the standard, but instead relied on second
hand stackoverflow answers, so I won't be surprised to hear that I was
wrong about this. Actually I'm never surprised to hear that I'm
wrong, but I'll be especially less surprised in this case. :slight_smile:

Thanks!
-- Sanjoy

The position most compilers take is the the access must be visibly through a union if you want it to work. Otherwise, it’s not possible to use tbaa really at all. Move your function f to another file, for example, and you simply could never tell. Some will give you a little more, but you are pushing your luck

This was discussed in depth in the thread I quoted, and has GCC mailing list threads going back to before llvm existed :).
Note that there are rules about touching members in a union other than the first one you set, but pretty much everyone guarantees type punning through unions to work.

From: "Daniel Berlin via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Sanjoy Das" <sanjoy@playingwithpointers.com>, "Mehdi Amini" <mehdi.amini@apple.com>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Saturday, November 26, 2016 12:03:01 AM
Subject: Re: [llvm-dev] Placement new and TBAA

The position most compilers take is the the access must be visibly
through a union if you want it to work. Otherwise, it's not possible
to use tbaa really at all. Move your function f to another file, for
example, and you simply could never tell. Some will give you a
little more, but you are pushing your luck

This was discussed in depth in the thread I quoted, and has GCC
mailing list threads going back to before llvm existed :).
Note that there are rules about touching members in a union other
than the first one you set, but pretty much everyone guarantees type
punning through unions to work.

I think that the wording in 3.10 reflects this. It says that you need to access the data through a glvalue of the union type in this case.

-Hal

There's wording describing the reuse of storage and the situation of
neither calling the destructor nor using a *delete-expression*.

N4604 subclause 3.8 [basic.life] paragraph 5:

For an object of a class type with a non-trivial destructor, the program is
not required to call the destructor explicitly before the storage which the
object occupies is reused or released; however, if there is no explicit
call to the destructor or if a *delete-expression* is not used to release
the storage, the destructor shall not be implicitly called and any program
that depends on the side effects produced by the destructor has undefined
behavior.

I read the above wording as saying that, even if the storage is currently
occupied by an object of class type with a non-trivial destructor, the
storage can be "simply" reused (without undefined behaviour). The lifetime
of the previous object simply ends when the placement new occurs (more
wording below).

3.8 [basic.life] paragraph 1:

The lifetime of an object *o* of type T ends when [ ... ] the storage which
the object occupies is released, or is reused by an object that is not
nested within *o*.

So if you:

1. override operator delete to do nothing for that type (so that the
    placement new actually has unfreed backing storage to re-use).
2. have an empty destructor.

So,

void *operator new(decltype(sizeof 0), void *) noexcept;

struct MyInt {
  static void operator delete(void *) { }
  MyInt() { x = 0; }
  ~MyInt() { }
  int x;
};

float *qq;
void foo(void *p, void *q, long unk) {
  // unk = 1
  for (long i = 0; i < unk; ++i) {
    auto pp = new (p) MyInt;
    delete pp; // or pp->~MyInt();
    qq = new (static_cast<void *>(&reinterpret_cast<char *>(q)[i]))
float(42);
  }
}

then I suspect we'll have the same problem after the destructor and

the operator delete for that type has been inlined away.

Yes; the same problem occurs.

So, it sounds like a new intrinsic is in order?

What kind of intrinsic are you thinking about here?

What kind of intrinsic are you thinking about here?

I propose two intrinsics: llvm.lifetime.reuse and
llvm.tbaa.pointer_identity.barrier.
Both intrinsics take at least one argument: an address value to the
affected memory.
llvm.lifetime.reuse takes one other argument: the minimum size of the
affected memory.
Both intrinsics return an address value: some version of the value passed
in.

For llvm.tbaa.pointer_identity.barrier, the semantics are that the return
address value is distinct from the argument in such a way that it achieves
the necessary effect of std::launder with regards to TBAA.
Calls to llvm.tbaa.pointer_identity.barrier shall not be reordered with
respect to each other (and thus, also functions which may perform such a
call to llvm.tbaa.pointer_identity.barrier).
Note: it may be the case that, at this time,
llvm.tbaa.pointer_identity.barrier only maintains an ordering between uses
of its return value and the intrinsic itself.
Note: it may be possible to relax the reordering restriction based on the
storage reachable from the argument pointer.

For llvm.lifetime.reuse(p, sz) (placement new):
Acts as llvm.tbaa.pointer_identity.barrier(p), except that the reachable
storage contains at least [p, p + sz), with additional semantics as follows.

Conceptually, this kind of acts as (but is more prohibitive than) a store
of undef over at least sz bytes over the memory starting at p
(unfortunately, no TBAA information as to what it writes over is obvious
from the immediate context at the source level).

Assuming that the loads and stores discussed below are ones which may
access the affected memory:
This should mean that all loads observing previous stores should not,
relative to the intrinsic, move to occur later.
(All stores observed by said loads should occur before said load).

If an invented load of the affected memory immediately before the intrinsic
may observe a store, said store should should not, relative to the
intrinsic, move to occur later;
however, if the invented load is the only observation of said store, the
store should be removed as dead.

Thus, all accesses to a defined value of the old object must occur before
the llvm.lifetime.reuse intrinsic (even if it is the intrinsic that it
being moved).

By the language, either the old object type and the new object type already
alias, or all accesses to the new object must either use the return value
from llvm.lifetime.reuse, or the return value of a later
llvm.tbaa.pointer_identity_barrier.
Thus, llvm.lifetime.reuse acts to separate accesses of the old object from
accesses of the new object.