C++11 and enhanced devirtualization

>
> Hi everyone,
>
> C++11 added features that allow for certain parts of the class
hierarchy to be closed, specifically the 'final' keyword and the semantics
of anonymous namespaces, and I think we can take advantage of these to
enhance our ability to perform devirtualization. For example, given this
situation:
>
> struct Base {
> virtual void foo() = 0;
> };
>
> void external();
> struct Final final : Base {
> void foo() {
> external();
> }
> };
>
> void dispatch(Base *B) {
> B->foo();
> }
>
> void opportunity(Final *F) {
> dispatch(F);
> }
>
> When we optimize this code, we do the expected thing and inline
'dispatch' into 'opportunity' but we don't devirtualize the call to foo().
The fact that we know what the vtable of F is at that callsite is not
exploited. To a lesser extent, we can do similar things for final virtual
methods, and derived classes in anonymous namespaces (because Clang could
determine whether or not a class (or method) there is effectively final).
>
> One possibility might be to use @llvm.assume to say something about what
the vtable ptr of F might be/contain should it be needed later when we emit
the initial IR for 'opportunity' (and then teach the optimizer to use that
information), but I'm not at all sure that's the best solution. Thoughts?

The problem with any sort of @llvm.assume-encoded information about
memory contents is that C++ does actually allow you to replace objects in
memory, up to and including stuff like:

{
  MyClass c;

  // Reuse the storage temporarily. UB to access the object through ‘c’
now.
  c.~MyClass();
  auto c2 = new (&c) MyOtherClass();

  // The storage has to contain a ‘MyClass’ when it goes out of scope.
  c2->~MyOtherClass();
  new (&c) MyClass();
}
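Spelled out as a runnable sketch (a hedged illustration only: the `id()` methods and return values are added here so the behavior is observable; they are not part of the original snippet):

```cpp
#include <new>

struct MyClass {
    virtual int id() const { return 1; }
    virtual ~MyClass() {}
};
struct MyOtherClass {
    virtual int id() const { return 2; }
    virtual ~MyOtherClass() {}
};

// Legal storage reuse via placement new: the dynamic type of the buffer
// changes mid-function, so a compiler that cached the vtable pointer
// across the replacement would dispatch to the wrong method.
int reuse_storage() {
    alignas(MyOtherClass) unsigned char buf[sizeof(MyOtherClass)];
    MyClass *c = new (buf) MyClass();
    int before = c->id();                        // dispatches to MyClass::id
    c->~MyClass();
    MyOtherClass *c2 = new (buf) MyOtherClass(); // same storage, new type
    int after = c2->id();                        // dispatches to MyOtherClass::id
    c2->~MyOtherClass();
    return before * 10 + after;
}
```

This is exactly why an unconditional @llvm.assume about the vptr at a given address would be unsound: the assumption must somehow be invalidated at the point of replacement.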

The standard frontend devirtualization optimizations are permitted under
a couple of different language rules, specifically that:
1. If you access an object through an l-value of a type, it has to
dynamically be an object of that type (potentially a subobject).
2. Object replacement as above only “forwards” existing formal references
under specific conditions, e.g. the dynamic type has to be the same,
‘const’ members have to have the same value, etc. Using an unforwarded
reference (like the name of the local variable ‘c’ above) doesn’t formally
refer to a valid object and thus has undefined behavior.

You can apply those rules much more broadly than the frontend does, of
course; but those are the language tools you get.

Right. Our current plan for modelling this is:

1) Change the meaning of the existing !invariant.load metadata (or add
another parallel metadata kind) so that it allows load-load forwarding
(even if the memory is not known to be unmodified between the loads) if:
  a) both loads have !invariant.load metadata with the same operand, and
  b) the pointer operands are the same SSA value (being must-alias is not
sufficient)
2) Add a new intrinsic "i8* @llvm.invariant.barrier(i8*)" that produces a
new pointer that is different for the purpose of !invariant.load. (Some
other optimizations are permitted to look through the barrier.)

invariant.load currently allows the load to be reordered pretty
aggressively, so I think you need a new metadata.

Our thoughts were:
1) The existing !invariant.load is redundant because it's exactly
equivalent to a call to @llvm.invariant.start and a load.
2) The new semantics are a more strict form of the old semantics, so no
special action is required to upgrade old IR.
... so changing the meaning of the existing metadata seemed preferable to
adding a new, similar-but-not-quite-identical, form of the metadata. But
either way seems fine.

In particular, "new (&c) MyOtherClass()" would be emitted as something
like this:

  %1 = call @operator new(size, %c)
  %2 = call @llvm.invariant.barrier(%1)
  call @MyOtherClass::MyOtherClass(%2)
  %vptr = load %2, !invariant.load !MyBaseClass.vptr
  %known.vptr = icmp eq %vptr, @MyOtherClass::vptr
  call @llvm.assume(%known.vptr)

Hmm. And all v-table loads have this invariant metadata?

That's the idea (but it's not essential that they do, we just lose
optimization power if not).

I am concerned about mixing files with and without barriers.

I think we'd need to always generate the barrier (even at -O0, to support
LTO between non-optimized and optimized code). I don't think we can support
LTO between IR using the metadata and old IR that didn't contain the
relevant barriers. How important is that use case? We were probably going
to put this behind a -fstrict-something flag, at least to start off with,
so we can create a transition period where we generate the barrier by
default but don't generate the metadata if necessary.

No, that would not be arbitrarily hoistable.

Well, all current IR does not contain the barrier, so this would be a statement that current C++ IR will never be correctly LTO-able with future C++ IR. That is generally something we try to avoid, yes.

If this goes into the function flags, perhaps there’s a way to prevent LTO between functions that disagree about the flag.

John.

From: "Nico Weber" <thakis@chromium.org>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "cfe-dev@cs.uiuc.edu Developers" <cfe-dev@cs.uiuc.edu>, "Richard Smith" <richard@metafoo.co.uk>
Sent: Thursday, July 16, 2015 2:52:46 PM
Subject: Re: [cfe-dev] C++11 and enhanced devirtualization


Out of interest, is there data to suggest that this type of
optimization will happen often in practice?

Speaking only for myself, most of my users are just now beginning the transition to C++11 and beyond, and have started asking questions about how various new language features help, or not, with optimization. The answers to these questions will determine how these features will be adopted going forward. In short, I don't have a good answer to your question, but if adding 'final' will help with optimization, then going forward, I'll see a lot more of it.

-Hal



Out of interest, is there data to suggest that this type of optimization
will happen often in practice?

To follow up on Hal's response, one company I know of, with about 20 C++
developers, has been using C++11 for the past couple of years and upgraded
its compiler in the past 10 months for optimization purposes. The 'final'
keyword was used a lot, and was requested in code reviews, in anticipation
of getting this optimization for free when upgrading compiler versions. A
bit of a small sample size, but some other financial companies I know of
also value people with knowledge of C++11/14 and of optimizations like
this.
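For concreteness, this is the source-level transformation that 'final' licenses in the example above. A sketch only: the int return value and the explicitly qualified call are mine (the original foo() called external() and returned void), standing in for what the optimizer would do in IR.

```cpp
struct Base {
    virtual int foo() = 0;
    virtual ~Base() {}
};
// 'final' closes the hierarchy: no class can derive from Final, so no
// further override of foo() can exist.
struct Final final : Base {
    int foo() override { return 42; }
};

// After inlining dispatch(F), the compiler sees a virtual call through a
// pointer whose static type is Final*...
int opportunity_virtual(Final *f) {
    Base *b = f;              // what inlining dispatch(F) produces
    return b->foo();          // still a vtable load + indirect call today
}
// ...which could legally become this direct, further-inlinable call:
int opportunity_devirtualized(Final *f) {
    return f->Final::foo();   // no vtable load needed
}
```

Both functions must return the same value for every valid Final object; the second form is what the thread is asking the optimizer to produce automatically.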

I’m going to argue pretty strongly in favour of the new form of metadata.
We’ve spent a lot of time getting !invariant.load working well for use
cases like the “length” field in a Java array and I’d really hate to give
that up.

(One way of framing this is that the current !invariant.load gives a
guarantee that there can’t be a @llvm.invariant.end call anywhere in the
program and that any @llvm.invariant.start occurs outside the visible scope
of the compilation unit (Module, LTO, what have you) and must have executed
before any code contained in said module which can describe the memory
location can execute. FYI, that last bit of strange wording is to allow
initialization inside a malloc-like function which returns a noalias
pointer.)

I’m definitely open to working together on a revised version of a more
general invariant mechanism. In particular, we don’t have a good way of
modelling Java’s “final” fields* in the IR today since the initialization
logic may be visible to the compiler. Coming up with something which
supports both use cases would be really useful.

* Let’s ignore the fact that few Java final fields are actually final. That
part of the problem is decidedly out of scope for LLVM. :)


I'm going to argue pretty strongly in favour of the new form of metadata.
We've spent a lot of time getting !invariant.load working well for use
cases like the "length" field in a Java array and I'd really hate to give
that up.

(One way of framing this is that the current !invariant.load gives a
guarantee that there can't be a @llvm.invariant.end call anywhere in the
program and that any @llvm.invariant.start occurs outside the visible scope
of the compilation unit (Module, LTO, what have you) and must have executed
before any code contained in said module which can describe the memory
location can execute. FYI, that last bit of strange wording is to allow
initialization inside a malloc-like function which returns a noalias
pointer.)

I had overlooked that !invariant.load also applies for loads /before/ the
invariant load. I agree that this is different both from what we're
proposing and from what you can achieve with @llvm.invariant.start. I would
expect that you can use our metadata for the length in a Java array -- it
seems like it'd be straightforward for you to arrange that all loads of the
array field have the metadata (and that you use the same operand on all of
them) -- but there's no real motivation behind reusing the existing
metadata besides simplicity and cleanliness.

I'm definitely open to working together on a revised version of a more
general invariant mechanism. In particular, we don't have a good way of
modelling Java's "final" fields* in the IR today since the initialization
logic may be visible to the compiler. Coming up with something which
supports both use cases would be really useful.

This seems like something that our proposed mechanism may be able to
support; we intend to use it for const and reference data members in C++,
though the semantics of those are not quite the same.

* Let's ignore the fact that few Java final fields are actually final.
That part of the problem is decidedly out of scope for LLVM. :)

ObjC (and Swift, and probably a number of other languages) has an optimization opportunity where there’s a global variable that’s known to be constant after its initialization. (For the initiated, I’m talking here primarily about ivar offset variables.) However, that initialization is run lazily, and it’s only at specific points within the program that we can guarantee that it’s already been performed. (Namely, before ivar accesses or after message sends to the class (but not to instances, because of nil).) Those points usually guarantee the initialization of more than one variable, and contrariwise, there are often several such points that would each individually suffice to establish the guarantee for a particular load, allowing it to be hoisted/reordered/combined at will.

So e.g.

if (cond) {
// Here there’s an operation that proves to us that A, B, and C are initialized.
} else {
// Here there’s an operation that proves it for just A and B.
}

for (;;) {
// Here we load A. This should be hoistable out of this loop, independently of whatever else happens in this loop.
}

This is actually the situation where ObjC currently uses !invariant.load, except that we can only safely use it in specific functions (ObjC method implementations) that guarantee initialization before entry and which can never be inlined.

Now, I think something like invariant.start would help with this, except that I’m concerned that we’d have to eagerly emit what might be dozens of invariant.starts at every point that established the guarantee, which would be pretty wasteful even for optimized builds. If we’re designing new metadata anyway, or generalizing existing metadata, can we try to make this more scalable, so that e.g. I can use a single intrinsic with a list of the invariants it establishes, ideally in a way that’s sharable between calls?
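A minimal C++ analogue of the ObjC situation John describes (all names here are invented for illustration; in ObjC the initialization and the "establishing points" are provided by the runtime, not written out like this):

```cpp
// A global, like an ivar-offset variable: written once by lazy
// initialization, constant for the rest of the program.
static long ivar_offset = 0;
static bool offset_known = false;

// An "establishing point": after any call to this returns, ivar_offset
// is initialized and will never change again. In ObjC this guarantee
// comes implicitly from, e.g., a message send to the class.
void ensure_offsets_initialized() {
    if (!offset_known) {
        ivar_offset = 16;   // stand-in for the runtime's computation
        offset_known = true;
    }
}

long load_offset() {
    // Every execution of this load that is dominated by an
    // ensure_offsets_initialized() call sees the same value, so such
    // loads could be hoisted out of loops and combined -- which is the
    // guarantee John wants to express without emitting dozens of
    // @llvm.invariant.start calls at each establishing point.
    return ivar_offset;
}
```

The difficulty is that the establishing point is a dynamic property of control flow, not visible from the SSA value of the pointer, which is why a per-load metadata scheme keyed on pointer identity doesn't directly fit.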

John.

It seems we have three different use cases:

1) This invariant applies to this load and all future loads of this pointer
(ObjC / Swift constants, Java final members)
2) This invariant applies to this load and all past and future loads of
this pointer (Java array length)
3) This invariant applies to this load and all (past and) future loads of
this pointer that also have metadata with the same operand (C++ vptr, const
members, reference members)

We could use (1) or (2) for C++, but that would require us to insert a lot
more invariant barriers (for all dynamic type changes, not just for a
change to a type with a constant member), and would be much less forgiving
of user errors. How general a system are you imagining?
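Case (3) at the source level, for a const data member (a hedged sketch; all names are invented for illustration):

```cpp
// A const member is immutable for the lifetime of the containing object.
struct Widget {
    const int tag;
    explicit Widget(int t) : tag(t) {}
};

// The two loads of w->tag go through the same pointer value, and 'tag'
// cannot change for the lifetime of *w, so the second load could be
// forwarded from the first even across the opaque call -- unless the
// object was replaced, which under the proposal requires the new pointer
// to pass through @llvm.invariant.barrier, breaking the forwarding.
int tag_twice(const Widget *w, void (*opaque)()) {
    int a = w->tag;
    opaque();        // may write arbitrary memory, but not w->tag
    int b = w->tag;  // forwardable from 'a' under rule (3)
    return a + b;
}
```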

Slight tweak here: it’s not “all past” loads. It’s all (past, present, future) instruction positions at which this location was/is dereferenceable. There doesn’t actually have to be a load there (yet).

It’s specifically that property which allows aggressive hoisting (e.g. in LICM).

(I’m assuming that by “this pointer” you actually mean “this abstract memory location”.)

For my use cases, I can replace (1) with (3) as long as I give all such loads the same operand; I still need (2) as a distinct notion, though. Definitions (1) and (3) don’t include the ability to do aggressive hoisting. If we could define it in a way that did, we might be able to combine all three.

(2) today works specifically because there is no such thing as a start/end for the invariantness. If there were, we’d have to solve a potentially hard IPO problem for basic hoisting in LICM.

I feel we’re going to end up needing something which supports the following combinations:

  • No start, no end (2 above)
  • Possible start, no end (1 above)
  • Possible start, possible end, possible start, possible end, … (what @llvm.invariant.start/end try to be and fail at)

(Note that for all of these, dereferenceability is orthogonal to the invariantness of the memory location.)

Whether those are all the same mechanism or not, who knows.

To throw yet another wrinkle in, the backend has a separate notion of invariantness. It assumes (2), plus dereferenceability within the entire function. It’s for this reason that many !invariant.loads aren’t marked invariant in the backend.

p.s. Before this goes too much further, we should move this to its own thread, and CC llvmdev rather than just cfe-dev. Many interested folks (e.g. other frontends) might not be subscribed to cfe-dev.

Philip



(I'm assuming that by "this pointer" you actually mean "this abstract
memory location".)

No; that would remove the ability to have something like
@llvm.invariant.barrier. This is a property of the pointer value, not a
property of the storage location. In this regard, it seems fundamentally
different to the existing !invariant.load.


A proper proposal is on the way (which will get its own thread on llvmdev);
we were only responding here because Hal happened to bring up the topic
before we were ready with our proposal :slight_smile:

Hmm. I’m not really seeing what you’re saying about past and future. The difference is that the invariant holds, but only after a certain point; reordering the load earlier across side-effects etc. is fine (great, even), it just can’t cross a particular line.

I assume the only reason that bounding the invariant isn’t important for Philip’s array-length is that the initialization is done opaquely, so that the optimizer can’t see the pointer until it’s been properly initialized. That is, the bound is enforced by SSA.
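The "bound enforced by SSA" point can be made concrete (a sketch under invented names; the real Java case lives in a JIT's IR, not in source):

```cpp
// Java-array-like object: 'length' is set during allocation and is
// immutable from the moment the object becomes visible.
struct JavaArray {
    const int length;   // set once, before any caller can observe it
    int elems[8];       // payload (size fixed here for simplicity)
};

JavaArray *allocate_array(int n) {
    // No caller can see the storage before this returns, so every load
    // of 'length' is dominated by its initialization: SSA dominance
    // itself supplies the "start" of the invariant, and no explicit
    // @llvm.invariant.start is needed.
    return new JavaArray{n, {}};
}
```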

I think the real difference here is whether SSA value identity can give us sufficient information or not.

For C++ vtables and const/reference members, it does, because the semantics are dependent on “blessed” object references (the result of the new-expression, the name of the local variable, etc.) which mostly have undefined behavior if re-written. Maybe you need to make some effort to not have GEPs down to base classes break the optimization, but that’s probably easy.

For the Java cases, it still does, because the object becomes immutable past a certain point (the return from the new operation), which also narrows most/all of the subsequent accesses to a single value. So you can bless the object at that point; sure, you theoretically give up some optimization opportunities if ‘this’ was stored aside during construction, but it’s not a big deal.

For my ObjC/Swift cases, it’s not good enough, because the addresses that become immutable are global. Each load is going to re-derive the address from the global, so no amount of SSA value propagation is going to help; thus the barrier has to be more like a memory barrier than just an SSA barrier. (The biggest distinguishing feature from actual atomic memory barriers is that dominating barriers trivialize later barriers, regardless of what happens in between.)

(Of course, Swift has situations more like the C++ and Java opportunities, too.)

So maybe these are really different problems and there’s no useful shared generalization.

John.

I'm increasingly thinking that's the case; we now intend to propose adding
new metadata rather than extending / generalizing !invariant.load.

Just to not lose track, I started new thread with our proposal:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2015-July/044246.html

Piotr