RFC: Strong GC References in LLVM

This is a proposal to add strong GC reference types to LLVM.

We have some local (downstream) patches that are needed to prevent
LLVM's optimizer from making transforms that are problematic in the
presence of a precise relocating GC. Adding a notion of a strong GC
reference to LLVM will let us upstream these patches in a principled
manner, and will act as a measure to avoid new problematic transforms.

The word "strong" here is used as opposed to the weaker pointers that
LLVM currently has which can be converted to and from integers without
repercussions (i.e. inttoptr and ptrtoint are nop-like, non-side-effecting
instructions).

Let me emphasize first that this email is intended to *start* a
conversation; and all of the points made here are open for discussion
and debate. This proposal is informed only by _our_ GC and runtime;
other runtimes may have different requirements.

# Preamble & "Problem Statement"

A quick introduction to some GC basics: a precise relocating garbage
collector needs co-operation from the code generator / compiler to be
able to scan and relocate objects pointed to from frame local state
like CPU registers and stack slots. Specifically, it needs a list of
the locations of all the GC references present in a stack frame; and
at "safepoints" it stops "mutator" threads (via some mechanism not
pertinent here), traverses all of the stack frames and uses the
aforementioned list to scan and potentially rewrite those references
to point to the new locations for the same objects. In addition, some
garbage collectors need read and write barriers -- these are small
bits of code that run when reading and writing GC references (distinct
from memory reordering barriers). For more details on GC specific
stuff, I suggest referring to http://www.memorymanagement.org/ and our
2014 LLVM-Dev talk http://llvm.org/devmtg/2014-10/#talk4.

Getting LLVM ToT working with a precise relocating GC involves taking
the following steps:

    Note: GCREF is the type representing a pointer that the GC needs
      to know about and potentially relocate. Currently it is `<Ty>
      addrspace(1)*`.

1. Generate IR, and represent GC pointer SSA values as values of type
    GCREF.
2. Run opt -O3.
3. Run RewriteStatepointsForGC. It rewrites IR values of type GCREF
    to "normal" pointers that the backend can lower (note -- we don't
    physically rewrite the llvm::Types to keep things simple and fast). It
    does this by making all of the data flow required to express
    relocation semantics explicit in the IR.
4. Rewrite loads and stores of GCREF type to execute load and store
    barriers (logically can be part of (3)).
5. Lower into machine code.
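
As a concrete (hedged) illustration of what the IR might look like
between steps (2) and (3), here is a minimal function in this
convention, assuming GCREF == `i8 addrspace(1)*` and the
"statepoint-example" GC strategy; the call to @foo is a potential
safepoint that step (3) has to account for, and %derived is a derived
pointer off %obj:

    declare void @foo()

    define i8 addrspace(1)* @example(i8 addrspace(1)* %obj) gc "statepoint-example" {
    entry:
      ; %obj is a GCREF live across the potential safepoint below, so step (3)
      ; must make its relocation explicit.
      call void @foo()
      %derived = getelementptr i8, i8 addrspace(1)* %obj, i64 16
      ret i8 addrspace(1)* %derived
    }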

For this discussion let's treat the internals of (3), (4) and (5) as
black boxes, and center our discussion on what properties we require
of the IR after step (2) so that step (3), (4) and (5) can do their
job. However, if you think the overall design can be simplified by
changing (3), (4) or (5), please feel free to point that out. :)

After (2) and before (3), we want to have the following invariant:

'''
Given any location "L" in the CFG of a function, let the set of all
values of type GCREF whose defs dominate "L" be "G":

  A. "G" is a superset of the references that need to be reported to
     the GC if we put a safepoint at "L". References need to be
     reported to the GC if the GC may want to scan through them or
     rewrite them to point to the new locations of the objects they
     used to point to.

  B. Everything in "G" is a valid GC reference. This directly implies
     that all values of type GCREF hold a valid GC reference (since
     "L" could be the location immediately following the def).
'''

A few important things to note about "valid GC reference":

1. There is no fixed notion of a valid GC reference. What
    constitutes a valid GC reference can change at arbitrary points in
    time.

2. There may not be a runtime bitwise predicate that can reliably
    distinguish a "valid" GC reference from an invalid one. The
    validity of a GC reference is a function of its provenance as well
    as its bitwise content.

3. (Related to the previous point) Arbitrary GEPs off of "valid GC
    references" are also "valid GC references". This includes out of
    bounds GEPs (whose bitwise representation can have any value, for
    all practical intents and purposes).
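
To make point (3) concrete, here is a small hedged sketch (again
assuming GCREF == `i8 addrspace(1)*`): the out-of-bounds derived
pointer below has no meaningful bitwise value of its own, yet it must
still be reported, as a pointer derived from %base, at any safepoint
it is live across:

    declare void @do_call()

    define i8 addrspace(1)* @oob_example(i8 addrspace(1)* %base) gc "statepoint-example" {
    entry:
      ; Out-of-bounds derived pointer: bitwise it may point anywhere, but it
      ; is still a "valid GC reference" tied to %base.
      %oob = getelementptr i8, i8 addrspace(1)* %base, i64 -8
      ; Potential safepoint; %oob is live across it and must be reported
      ; (and relocated) as derived from %base.
      call void @do_call()
      ret i8 addrspace(1)* %oob
    }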

[Aside: We could phrase the invariant in terms of fixed safepoint
locations, but I think focusing on this (stronger) invariant actually
makes the problem simpler (and no harder).]

The invariant is somewhat vague and relies on white-box knowledge of
the GC to define what values "need to be reported to the GC" (this
will become clearer as we see some examples); and the "problem
statement" is to come up with a precise set of rules governing the
GCREF type so that the invariants we need fall out. Today we enforce
the invariants downstream by `#if 0`-ing out code that breaks them.
However that's fragile; and more importantly, it does not help other
people interested in precise GC and LLVM.

# Examples of Bad Transforms

Let's look at some examples of bad transforms that are legal
today (in upstream LLVM) with GCREF == "addrspace(1)*". Right now
we'll justify their "badness" by invoking ad-hoc GC specific rules on
a case by case basis:

## Integer -> GCREF Conversion

(%value is unused)

%value = load i64, i64* %ptr
== Transformed to ==>
%value = load GCREF, GCREF* (bitcast %ptr)

This is a problem since we've introduced a value of type GCREF that
may not be a valid GC reference at runtime. If the compiler can prove
that %value is actually a valid GC reference then this transform is
valid, but typically it won't be able to, except in cases like "%value
is provably 0", in which case it is better to just fold the load away.

%value = load i64, i64* %ptr
== Transformed to ==>
%value = load i64, i64* %ptr
%value_gc = inttoptr %value to GCREF (%value_gc unused)

is invalid for the same reason.

## GCREF -> Integer conversion

Converting a GCREF to an integer is fine if the integer is unused.
However, cases like these are problematic:

%v0 = load GCREF, GCREF* %ptr_0
%v1 = load GCREF, GCREF* %ptr_1
%cmp = icmp eq %v0, %v1
== Transformed to ==>
%v0 = load i64, i64* %ptr_0
%v1 = load i64, i64* %ptr_1
%cmp = icmp eq %v0, %v1

Since in the second version if "L" is, say, between %v0 and %v1, "G"
would not include "v0" which contains a GC reference that the GC may
want to scan or relocate[0].

For similar reasons

%v0 = load GCREF, GCREF* %ptr_0
%v1 = load GCREF, GCREF* %ptr_0
%cmp = icmp eq %v0, %v1
== Transformed to ==>
%v0 = load GCREF, GCREF* %ptr_0
%v0.i = ptrtoint %v0
%v1 = load GCREF, GCREF* %ptr_0
%v1.i = ptrtoint %v1
%cmp = icmp eq %v0.i, %v1.i

is also invalid.

## Round tripping GCREF's through Integers

The following transform is invalid:

%v0 = load GCREF, GCREF* %loc0
store GCREF %v0, GCREF* %loc1
== Transformed to ==>
%v0 = load i64, i64* (bitcast %loc0)
store i64 %v0, i64* (bitcast %loc1)

because if "L" in the second version is between the load and the
store, "G" will not include `%v0` even though we need to report it to
the GC (otherwise we could propagate a stale bitwise representation of
the GC reference we loaded in `%v0`).

## Bad types due to hoisting loads out of control flow

This is bad

if (is_reference) {
  %val = load GCREF, GCREF* %ptr
}
== Transformed to ==>
%val = load GCREF, GCREF* %ptr
if (is_reference) {
}

unless the compiler can prove that %val is a GC reference at its new
location. Downstream we model the Java type system in LLVM IR to try
to prove that %val is a GC reference in its new location, but for
starters we will have to be conservative upstream.

# Proposed Solution:

We introduce a "new" LLVM type. I will still refer to it as GCREF
here, but it may actually still be "<ty> addrspace(k)*" where k is
specially noted in the datalayout.

Semantics:

1. GCREF represents an equivalence class of values (equivalence
    relation being "points to a fixed semantic object"). The bitwise
    representation fluctuates constantly outside the compiler's
    control (the dual of `undef`), but may have invariants (in
    particular, we'd like to be able to specify alignment, nonnull
    etc.). At any given point in time all GCREF instances pointing to
    the same object have the same bitwise representation (we need this
    to make `icmp eq` well-defined).

2. GCREF instances can only contain a valid gc reference (otherwise
    they can't meaningfully "fluctuate" among the various possible
    bitwise representations of a reference).

3. Converting GCREF to integers is fine in general, but you'll get an
    arbitrary "snapshot" of the bitwise value that will generally not
    be meaningful (unless you are colluding with the GC in
    implementation defined ways).

4. Converting integers to GCREF is allowed only if the source integer is
    a bitwise representation of a valid GC reference that is not an
    out of bounds derived reference. However, this is difficult for
    the compiler to infer since it typically will have no fundamental
    knowledge of what bitwise representation can be a valid GC
    reference.

5. Operations that use a GCREF-typed value are "atomic" in using the
    bitwise representation, i.e., loading from a GCREF typed value
    does not "internally" convert the GCREF to a normal
    integer-pointer and then use the integer-pointer, since that would
    mean there is a window in which the integer-pointer can become
    stale[1].

6. A GCREF stored to a location in the heap continues to fluctuate,
    and keeps itself in sync with the right bitwise representation.
    In a way, there isn't a large distinction between the GC and the
    heap -- the heap is part of (or managed by) the GC.

I think (6) is the most controversial of the semantics above, but it
isn't very different from how `undef` stored to the heap remains
`undef` (i.e. a non-deterministic N-bit value) and a later load can
recover `undef` instead of getting a normal N-bit value.

## How do we fare with the examples?

Let's take a look at some of the examples, and see how the semantics
specified above help us:

### Demoting loads / stores from GCREF to integers is invalid (case 1):

%v0 = load GCREF, GCREF* %ptr_0
%v1 = load GCREF, GCREF* %ptr_1
%cmp = icmp eq %v0, %v1
==>
%v0 = load i64, i64* %ptr_0
%v1 = load i64, i64* %ptr_1
%cmp = icmp eq %v0, %v1

Invalid because the initial IR is comparing two GC fluctuating
references, while the transformed IR will compare their bitwise
snapshot taken at an arbitrary point in time.

### Demoting loads / stores from GCREF to integers is invalid (case 2):

%v0 = load GCREF, GCREF* %loc0
store GCREF %v0, GCREF* %loc1
==>
%v0 = load i64, i64* (bitcast %loc0)
store i64 %v0, i64* (bitcast %loc1)

Invalid because there are two semantic differences between the initial
and final IR:

1. In the initial IR we store a fluctuating and up-to-date GCREF,
    whereas in the final IR we store a potentially stale bitwise
    snapshot of the value.
2. In the initial IR, after the store *%loc1 has a fluctuating GC
    reference, while in the final IR *%loc1 only has a once-valid
    snapshot of a GC reference (that will no longer fluctuate).

### Basic store forwarding is valid:

store GCREF %v0, GCREF* %loc0
%v1 = load GCREF, GCREF* %loc0
==>
store GCREF %v0, GCREF* %loc0
%v1 = %v0

The store forwarding is valid since we stored a GCREF that
semantically points to object O (say), and loaded back a GCREF
pointing to the same object O.

### Hoisting inttoptr is (generally) invalid:

  if (<condition>) {
    %val_gc = ptrtoint %val to GCREF
  }
==>
  %val_gc = ptrtoint %val to GCREF
  if (<condition>) {}

is invalid since we do not know that `%val` is a valid bitwise
representation of a GCREF at the point in time we do the `ptrtoint`
(i.e. <condition> could be controlling whether `%val` is a valid
bitwise representation of a GCREF).

In theory this isn't different from loads, divisions or any other
potentially trapping operation, but I think a lot of LLVM directly
encodes the fact that `CastInst`s are obviously speculatable, so
maybe Integer -> GCREF casts should be outlawed at the type system
level, and be done via an intrinsic?

### Store forwarding between integers and GCREFs is valid

store i64 %val, i64* %loc
%val_gc = load GCREF, GCREF* (bitcast %loc)
==>
store i64 %val, i64* %loc
%val_gc = inttoptr %val to GCREF

is valid.

Both the initial and the final program assume that %val is a valid
bitwise representation of a GC reference.

(Note: it may not be valid to speculate the inttoptr instruction above
control flow).

### Load forwarding between integers and GCREFs is invalid

%val_i = load i64, i64* %loc
%val_p = load GCREF, GCREF* (bitcast %loc)
==>
%val_i = load i64, i64* %loc
%val_p = inttoptr %val_i to GCREF

is invalid if *%loc contains a GC reference. Given the model that we
have, the first program loads some bitwise representation of
a fluctuating GC reference, and then loads the same GC reference as a
GCREF; whereas the second program tries to convert a potentially stale
bitwise representation of a GC reference into a fluctuating GCREF
(which is not allowed).

## Implementation Issues & Impact to LLVM

This is a proposed path forward to implementing GCREF in LLVM. The
caveat mentioned at the start of the email (everything is open for
discussion) especially applies here.

### Changes to IR to represent GCREF

We change `datalayout` to denote pointers of a specific address space
as GCREFs. We add verifier rules forbidding `inttoptr` conversion of
integers into GCREFs and add an intrinsic (that can be control
dependent) that converts integers to GCREFs.
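
For concreteness, a hedged sketch of the shape this could take; the
datalayout marker is only described in the comment (no such syntax
exists today), and the function name below is purely illustrative
rather than an existing intrinsic:

    ; Hypothetical: suppose the datalayout grew a marker declaring address
    ; space 1 to be the GCREF address space. The integer-to-GCREF conversion
    ; mentioned above would then be a declared function/intrinsic instead of
    ; an `inttoptr`, so it is not freely speculatable and can remain control
    ; dependent.
    declare i8 addrspace(1)* @gcref.from.i64(i64)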

### Changes to the optimizer

We change the optimizer to

- Not speculate loads of GCREF type
- Not change uses of GCREF types to uses of non-GCREF types

Locally we already have changes fixing most of the places that'd need
to be fixed for the above to hold.
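
As a rough sketch of the kind of guard these changes amount to (in
C++, using a hypothetical `isGCReferenceType` helper that does not
exist in LLVM today and would presumably be keyed off the datalayout):

    #include "llvm/IR/DataLayout.h"
    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    // Hypothetical helper (not an existing LLVM API): is Ty a pointer into
    // the GCREF address space named by the datalayout?
    static bool isGCReferenceType(const DataLayout &DL, Type *Ty);

    // Sketch: refuse to speculate/hoist a load whose result is a GCREF, since
    // the loaded value is only known to be a valid GC reference under its
    // original control dependence.
    static bool isSafeToSpeculateLoad(LoadInst *LI, const DataLayout &DL) {
      if (isGCReferenceType(DL, LI->getType()))
        return false;
      // ... the usual dereferenceability / alignment checks go here ...
      return true;
    }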

### Lowering

We do not teach LLC about GCREF's at all (in fact, we can change it to
assert that the datalayout does not contain a `gc` directive). We
make the RewriteStatepointsForGC pass rewrite the IR to make
relocations explicit (so that no more GCREF-like semantics are
necessary) and remove the GC directive from the Module's datalayout.

Thoughts?

-- Sanjoy

Footnotes:

[0]:
To see why this is bad assume to begin with %ptr_0 and %ptr_1 both
contain a GC reference pointing to object X, currently at address
0x500.

Now the following happens

%v0 = load i64, i64* %ptr_0
%v0 is 0x500
<< safepoint >> GC moves X to 0x600, and updates *%ptr_1
%v1 = load i64, i64* %ptr_1
%v1 = 0x600
%cmp = icmp eq %v0, %v1
%cmp = false, when it should be true

If %v0 was "reported to the GC" in <<safepoint>> then it would have
been updated in place to 0x600, and %cmp would evaluate to true, as
expected.

[1]:
In reality, this is handled by _not_ putting safepoints in problematic
locations. For instance, if lowering a CompareAndSwap in some weird
architecture would involve doing:

  // CAS(From, To, Location)
  Location ^= << Constant >>

  // The GC cannot parse the state at this location "L", since we've
  // done some arbitrary arithmetic on Location.

  ... Do some magic with Location, To, From

then we _cannot_ put a safepoint at location "L".

ping!

Sanjoy Das wrote:

My high-level comment is that I’m not really sure we need a new type here. I’m curious whether we can make non-zero-address-space pointers have the semantics you need in a conservative model.

The reason I say this is that the requirements you have here actually are very similar to what I would expect other users of custom address spaces to want. While they (and you probably) would have domain specific optimizations you would like to perform, the generic optimizer needs to keep its hands off to avoid breaking invariants. Specifically looking at your examples:

Examples of Bad Transforms

Integer → GCREF Conversion

(%value is unused)

%value = load i64, i64* %ptr
== Transformed to ==>
%value = load GCREF, GCREF* (bitcast %ptr)

I think this should definitely be invalid for non-zero address spaces. Getting a non-zero-address-space pointer out of thin air is nuts (provided it's not a CSE-based transform that proves equivalence, as you indicate in your more detailed writeup).

%value = load i64, i64* %ptr
== Transformed to ==>
%value = load i64, i64* %ptr
%value_gc = inttoptr %value to GCREF (%value_gc unused)

And here as well. We shouldn’t (IMO) be adding out-of-thin-air address spaces, and definitely not inttoptr operations on them!!!

GCREF → Integer conversion

I thought non-zero address spaces already made this flat out impossible? This is really not OK for many users of address spaces…

Round tripping GCREF’s through Integers

Same as above. Many non-zero address spaces don’t have integer representations that are meaningful, I thought? This doesn’t seem an unreasonable requirement to me honestly…

Bad types due to hoisting loads out of control flow

This is bad

if (is_reference) {
%val = load GCREF, GCREF* %ptr
}
== Transformed to ==>
%val = load GCREF, GCREF* %ptr
if (is_reference) {
}

unless the compiler can prove that %val is a GC reference at its new
location. Downstream we model the Java type system in LLVM IR to try
to prove that %val is a GC reference in its new location, but for
starters we will have to be conservative upstream.

I think this is the interesting thing. It is essentially saying that loads of a non-zero address space pointer are control dependent which I don’t think is necessarily true with our current definitions of address spaces, but I think this might be a very reasonable desire.

If anything, we might want something in the datalayout that identifies whether particular address spaces can be speculatively loaded from. This is not terribly dissimilar from the memory scopes proposal which essentially wants to provide some encoding of basic transformations to non-zero address spaces that are allowed while keeping everything else conservative by default.

Personally, I’d be happy to make the default even more conservative to address the GC use case and add a mechanism to opt out of that for GPUs and other users that want very generic transforms to continue applying.

So I’m curious whether this seems reasonable to you (and others). To summarize: make address spaces sufficiently conservative by default to satisfy the requirements for GC pointers, and add “opt-in” mechanisms for generic transforms that existing users of non-zero address spaces actually desire.

Alternatively, I’m interested in any examples that more firmly illustrate the reasons why address spaces are fundamentally the wrong model for GC pointers.

-Chandler

ping!

Sanjoy Das wrote:

# Proposed Solution:

We introduce a "new" LLVM type. I will still refer to it as GCREF
here, but it may actually still be "<ty> addrspace(k)*" where k is
specially noted in the datalayout.

Semantics:

  1. GCREF represents an equivalence class of values (equivalence
     relation being "points to a fixed semantic object"). The bitwise
     representation fluctuates constantly outside the compiler's
     control (the dual of `undef`), but may have invariants (in
     particular, we'd like to be able to specify alignment, nonnull
     etc.). At any given point in time all GCREF instances pointing to
     the same object have the same bitwise representation (we need this
     to make `icmp eq` well-defined).

  2. GCREF instances can only contain a valid gc reference (otherwise
     they can't meaningfully "fluctuate" among the various possible
     bitwise representations of a reference).

  3. Converting GCREF to integers is fine in general, but you'll get an
     arbitrary "snapshot" of the bitwise value that will generally not
     be meaningful (unless you are colluding with the GC in
     implementation defined ways).

  4. Converting integers to GCREF is allowed only if the source integer is
     a bitwise representation of a valid GC reference that is not an
     out of bounds derived reference. However, this is difficult for
     the compiler to infer since it typically will have no fundamental
     knowledge of what bitwise representation can be a valid GC
     reference.

  5. Operations that use a GCREF-typed value are "atomic" in using the
     bitwise representation, i.e., loading from a GCREF typed value
     does not "internally" convert the GCREF to a normal
     integer-pointer and then use the integer-pointer, since that would
     mean there is a window in which the integer-pointer can become
     stale[1].

  6. A GCREF stored to a location in the heap continues to fluctuate,
     and keeps itself in sync with the right bitwise representation.
     In a way, there isn't a large distinction between the GC and the
     heap -- the heap is part of (or managed by) the GC.

I think (6) is the most controversial of the semantics above, but it
isn't very different from how `undef` stored to the heap remains
`undef` (i.e. a non-deterministic N-bit value) and a later load can
recover `undef` instead of getting a normal N-bit value.

I'm not really convinced that the GCREF type is really necessary...
consider an alternate model:

1. A GCREF is never loaded into a register; it's either on the heap, or in
an alloca.
2. Add an intrinsic gcref.copy which copies a gcref between two allocas.
3. Add intrinsics gcref.load_gcref(GCREF*, GCREF*, offset) and
gcref.store_gcref(GCREF*, GCREF*, offset, value) which load and store a
gcref through a gcref.
4. Add intrinsics gcref.load_value(GCREF*, offset) and
gcref.store_value(GCREF*, offset, value) which load and store normal values
through a gcref.
5. The statepoint lowering pass gets rid of the allocas.

Keeping GCREFs exclusively in memory means the LLVM optimizer will handle
them conservatively, but correctly.
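
For concreteness, a hedged sketch of what declarations for these
intrinsics might look like, assuming GCREF slots are allocas of
`i8 addrspace(1)*`; the names follow the list above, but the exact
signatures are illustrative only, not an agreed-upon API:

    ; dst/src/base/result arguments are pointers to GCREF slots (allocas).
    declare void @gcref.copy(i8 addrspace(1)** %dst, i8 addrspace(1)** %src)
    declare void @gcref.load_gcref(i8 addrspace(1)** %result, i8 addrspace(1)** %base, i64 %offset)
    declare void @gcref.store_gcref(i8 addrspace(1)** %base, i64 %offset, i8 addrspace(1)** %value)
    declare i64 @gcref.load_value(i8 addrspace(1)** %base, i64 %offset)
    declare void @gcref.store_value(i8 addrspace(1)** %base, i64 %offset, i64 %value)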

I guess the problem with this is precisely that the LLVM optimizer will
handle them conservatively... but on the flip side, I think you're going to
end up chasing down weird problems forever if a "load" from an alloca has
side-effects.

-Eli

[+CC Jingyue Wu, John Criswell]

Updating with some informal discussion on IRC.

Chandler Carruth wrote:
> My high-level comment is that I'm not really sure we need a new type
> here. I'm curious whether we can make non-zero-address-space pointers
> have the semantics you need in a conservative model.

That's close to what I suggested in the 'Proposed Solution' section,
but I didn't explicitly mention anything about being conservative for
unknown address spaces, and being precise/more aggressive for
explicitly enumerated address spaces. That's a reasonable stance, but
I'd like to hear from other users of exotic address spaces (CC'ed).

[snip]

> I think this is the interesting thing. It is essentially saying that
> loads of a non-zero address space pointer are control dependent which I
> don't think is necessarily true with our current definitions of address
> spaces, but I think this might be a very reasonable desire.
>
> If anything, we might want something in the datalayout that identifies
> whether particular address spaces can be speculatively loaded from. This

This is an interesting point worth stressing -- the new thing here is
that we're making the semantics of stores and loads a function of the
pointee *type*, i.e. the address space of the pointer value we are loading or
storing, not just the address space of the pointer type we're loading
*from* or storing *to*. This is new because so far (at least
informally) they've been a function of just the pointee *size*, and
the address space of the location being stored to / loaded from.

[snip]

> So I'm curious whether this seems reasonable to you (and others). To
> summarize: make address spaces sufficiently conservative by default to
> satisfy the requirements for GC pointers, and add "opt-in" mechanisms
> for generic transforms that existing users of non-zero address spaces
> actually desire.

This sounds good.

> Alternatively, I'm interested in any examples that more firmly
> illustrate the reasons why address spaces are fundamentally the wrong
> *model* for GC pointers.

As discussed on IRC, that's what I had in mind initially.

-- Sanjoy

I think everything but this last weird aspect we already get from address spaces.

I misread the proposal originally and didn’t understand that the problem was loading from an alloca holding the GC pointer, and thus it was a normal and boring load that somehow has to have side-effects.

I fundamentally think that we can’t do that. I can see several ways to make the result work without that.

  • Teach the statepoint rewriting to handle hoisted loads in some way (haven’t thought too much about how feasible this is)
  • Tell LLVM that the load has this weird control dependence with some mechanism (make it a special gc load intrinsic, or a volatile load, or …)

-Chandler

Hi Eli,

Eli Friedman wrote:
>
> I'm not really convinced that the GCREF type is really necessary...
> consider an alternate model:
>
> 1. A GCREF is never loaded into a register; it's either on the heap, or
> in an alloca.
> 2. Add an intrinsic gcref.copy which copies a gcref between two allocas.
> 3. Add intrinsics gcref.load_gcref(GCREF*, GCREF*, offset) and
> gcref.store_gcref(GCREF*, GCREF*, offset, value) which load and store a
> gcref through a gcref.
> 4. Add intrinsics gcref.load_value(GCREF*, offset) and
> gcref.store_value(GCREF*, offset, value) which load and store normal
> values through a gcref.
> 5. The statepoint lowering pass gets rid of the allocas.

This is similar to what we used to call "early safepoint insertion",
but ...

>
> Keeping GCREFs exclusively in memory means the LLVM optimizer will
> handle them conservatively, but correctly.
>
> I guess the problem with this is precisely that the LLVM optimizer will
> handle them conservatively...

... yup. We _want_ LLVM to optimize well.

> but on the flip side, I think you're going
> to end up chasing down weird problems forever if a "load" from an alloca
> has side-effects.

I need to carefully think through this before committing to it, but I
think we're generally okay with completely disallowing allocas of
GCREF type (like we do for token types today). If we could do that,
is your concern addressed?

-- Sanjoy

Hi Chandler,

Chandler Carruth wrote:
> I think everything but this last weird aspect we already get from
> address spaces.
>
> I misread the proposal originally and didn't understand that the problem
> was loading from an alloca *holding* the GC pointer, and thus it was a
> normal and boring load that somehow has to have side-effects.
>
> I fundamentally think that we can't do that. I can see several ways to
> make the result work without that.
>
> - Teach the statepoint rewriting to handle hoisted loads in some way
> (haven't thought too much about how feasible this is)

I think that is impossible in general, given LLVM's IR. If we start
with:

// IR_1
for (;;) {
   if (condition) {
     use(this->field); // static type of &(this->field) == GCREF*
   }
}

and LICM it to:

// IR_2
GCREF val = this->field
for (;;) {
   if (condition) {
     use(val);
   }
}

then when rewriting it is incorrect to go from IR_2 to IR_1, assuming
that IR_2 is all we see, since in IR_2 use sees a single unique value
for every iteration whereas in IR_1 it may see different values in
different iterations.

> - Tell LLVM that the load has this weird control dependence with some
> mechanism (make it a special gc load intrinsic, or a volatile load, or ...)

A gc_load intrinsic is not an unreasonable idea actually, but I think
it will be worse in the long run.

To start off, we'd need to at least add gc_load-to-gc_load and
store-to-gc_load forwarding (the former for control-equivalent gc_loads).
We have a set of analyses locally (downstream) that can infer Java
types, which can be used to justify hoisting gc_loads as well (by
proving that the gc_load will result in a GC reference in the new
location) that we'd like to eventually upstream. Once all the dust
settles, I think we'll end up with too many cases of:

   if (auto *LI = dyn_cast<LoadInst>(V)) { process LI }
   if (auto *II = dyn_cast<IntrinsicInst>(V))
     if (II->getIntrinsicID() == gc_load) {
       process II; // almost duplicate of process LI
     }

whereas a GCREF type (however it is represented, either as <ty>
addrspace(k)* or a new type) will let us fold most of the interesting
logic into the safety checks we already do for load instructions.

Slightly unrelated to the above, but here is another example (apart
from the LICM example above) of why doing "repair" in the rewrite pass
is difficult, after letting the optimizer run its course:

If we have:

   if (always_false) {
     GCREF val = this->field;
     use(val);
   }
   safepoint();
   if (always_false) {
     GCREF val = this->field;
     use(val);
   }
   clobber_this_field();

and we hoist and CSE this to:

   GCREF val = this->field;
   if (always_false) {
     use(val);
   }
   safepoint();
   clobber_this_field();
   if (always_false) {
     use(val);
   }

then we have "val" live over the safepoint(), but there is no
guarantee that "val" actually contains a GC reference. Moreover, we
can't sink the load into the two blocks that use it, since
clobber_this_field() has clobbered this->field in a way that precludes that.

-- Sanjoy

Hi Eli,

Eli Friedman wrote:
>
> I'm not really convinced that the GCREF type is really necessary...
> consider an alternate model:
>
> 1. A GCREF is never loaded into a register; it's either on the heap, or
> in an alloca.
> 2. Add an intrinsic gcref.copy which copies a gcref between two allocas.
> 3. Add intrinsics gcref.load_gcref(GCREF*, GCREF*, offset) and
> gcref.store_gcref(GCREF*, GCREF*, offset, value) which load and store a
> gcref through a gcref.
> 4. Add intrinsics gcref.load_value(GCREF*, offset) and
> gcref.store_value(GCREF*, offset, value) which load and store normal
> values through a gcref.
> 5. The statepoint lowering pass gets rid of the allocas.

This is similar to what we used to call "early safepoint insertion",
but ...

>
> Keeping GCREFs exclusively in memory means the LLVM optimizer will
> handle them conservatively, but correctly.
>
> I guess the problem with this is precisely that the LLVM optimizer will
> handle them conservatively...

... yup. We _want_ LLVM to optimize well.

What optimizations are you missing? The very fact that it isn't in SSA
form hurts to some extent... and I guess you want to leverage existing
optimizations for loads and stores?

> but on the flip side, I think you're going
> to end up chasing down weird problems forever if a "load" from an alloca
> has side-effects.

I need to carefully think through this before committing to it, but I
think we're generally okay with completely disallowing allocas of
GCREF type (like we do for token types today). If we could do that,
is your concern addressed?

In general, if I think of a GCREF as an unsized handle to an object rather
than a pointer with weird properties, it's somewhat less scary. Also, it's
more clear how to deal with a GCREF if you explicitly list out operations
which are legal rather than which operations are illegal. For example "A
GCREF is a new type of value. It does not have a representation in
memory. You can use GEP on a GCREF. You can load or store through a
GCREF; this resolves the GCREF to a specific address in memory, then acts
like a load/store from that address. GCREFs have these aliasing rules:
[...]. There are GCREF intrinsics for constructing and manipulating
GCREFs. No other operations on GCREFs are allowed."

I guess the bit I'm most worried about is using plain loads to create
GCREFs, and plain stores to store them. As long as we don't have that,
existing optimization passes will essentially just work, I think; we won't
try to duplicate or hoist loads in situations where that isn't allowed, and
we won't try to perform weird value manipulations like trying to construct
one using an inttoptr. (I guess there are some passes that assume the
operand of a load is specifically a PointerType, but that's not really a
semantic difference.)

-Eli

Sanjoy,

This looks very close to my understanding of the statepoint design trajectory when you first introduced it. It’s great that you followed through and took the time to formalize the IR semantics. It’s been a couple years since I’ve thought about it so I may ask some obtuse questions.

I think the subject line is wrong though! Did you actually say what a “strong GC reference” is? I think you mean "precise GC reference", since weak references have the same requirements for being tracked at statepoints.

Getting LLVM ToT working with a precise relocating GC involves taking
the following steps:

   Note: GCREF is the type representing a pointer that the GC needs
     to know about and potentially relocate. Currently it is `<Ty>
     addrspace(1)*`.

1. Generate IR, and represent GC pointer SSA values as values of type
   GCREF.
2. Run opt -O3.
3. Run RewriteStatepointsForGC. It rewrites IR values of type GCREF
   to "normal" pointers that the backend can lower (note -- we don't
   physically rewrite the llvm::Types to keep things simple and fast). It
   does this by making all of the data flow required to express
   relocation semantics explicit in the IR.
4. Rewrite loads and stores of GCREF type to execute load and store
   barriers (logically can be part of (3)).
5. Lower into machine code.

For this discussion let's treat the internals of (3), (4) and (5) as
black boxes, and center our discussion on what properties we require
of the IR after step (2) so that step (3), (4) and (5) can do their
job. However, if you think the overall design can be simplified by
changing (3), (4) or (5), please feel free to point that out. :)

Great.

After (2) and before (3), we want to have the following invariant:

'''
Given any location "L" in the CFG of a function, let the set of all
values of type GCREF whose defs dominate "L" be "G":

A. "G" is a superset of the references that need to be reported to
    the GC if we put a safepoint at "L". References need to be
    reported to the GC if the GC may want to scan through them or
    rewrite them to point to the new locations of the objects they
    used to point to.

B. Everything in "G" is a valid GC reference. This directly implies
    that all values of type GCREF hold a valid GC reference (since
    "L" could be the location immediately following the def).
'''

Makes sense. The frontend may want to control the lifetime of references, but that can be done with some CFG constructs.

A few important things to note about "valid GC reference":

1. There is no fixed notion of a valid GC reference. What
   constitutes a valid GC reference can change at arbitrary points in
   time.

2. There may not be a runtime bitwise predicate that can reliably
   distinguish a "valid" GC reference from an invalid one. The
   validity of a GC reference is a function of its provenance as well
   as its bitwise content.

I’m not sure what’s being said here, but a GC reference is valid when you load it and remains valid as long as the compiler feels like keeping it valid. That decision will be made in step #3, statepoint rewriting.

3. (Related to the previous point) Arbitrary GEPs off of "valid GC
   references" are also "valid GC references". This includes out of
   bounds GEPs (whose bitwise representation can have any value, for
   all practical intents and purposes).

Fair enough, and important to point out.

[Aside: We could phrase the invariant in terms of fixed safepoint
locations, but I think focusing on this (stronger) invariant actually
makes the problem simpler (and no harder).]

The invariant is somewhat vague and relies on white-box knowledge of
the GC to define what values "need to be reported to the GC" (this
will become clearer as we see some examples); and the "problem
statement" is to come up with a precise set of rules governing the
GCREF type so that the invariants we need fall out. Today we enforce
the invariants downstream by `#if 0`-ing out code that breaks them.
However that's fragile; and more importantly, it does not help other
people interested in precise GC and LLVM.

Good motivation for getting this proposal into LangRef.

# Examples of Bad Transforms

Let's look at some examples of bad transforms that are legal
today (in upstream LLVM) with GCREF == "addrspace(1)*". Right now
we'll justify their "badness" by invoking ad-hoc GC specific rules on
a case by case basis:

I know non-zero address space has “target semantics”, but I have no idea what that means to the optimizer today. Is there some class of IR level optimization that is prohibited at address space > 0?

## Integer -> GCREF Conversion

(%value is unused)

%value = load i64, i64* %ptr
== Transformed to ==>
%value = load GCREF, GCREF* (bitcast %ptr)

This is a problem since we've introduced a value of type GCREF that
may not be a valid GC reference at runtime. If the compiler can prove
that %value is actually a valid GC reference then this transform is
valid, but typically it won't be able to except in cases like "%value
is provably 0" in which case it is better to just fold the load away.

%value = load i64, i64* %ptr
== Transformed to ==>
%value = load i64, i64* %ptr
%value_gc = inttoptr %value to GCREF (%value_gc unused)

is invalid for the same reason.

Ok. I know some passes may think it’s fine to “pass values through memory” and bitcast the source and dest. Are there specific passes in-tree that you’ve seen do this? Are they just passing values through alloca? We shouldn’t need to worry about general heap access doing this. Is there any other reason a pass would load an address as i64?

## GCREF -> Integer conversion

Converting a GCREF to an integer is fine if the integer is unused.
However, cases like these are problematic:

%v0 = load GCREF, GCREF* %ptr_0
%v1 = load GCREF, GCREF* %ptr_1
%cmp = icmp eq %v0, %v1
== Transformed to ==>
%v0 = load i64, i64* %ptr_0
%v1 = load i64, i64* %ptr_1
%cmp = icmp eq %v0, %v1

Since in the second version if "L" is, say, between %v0 and %v1, "G"
would not include "v0" which contains a GC reference that the GC may
want to scan or relocate[0].

For similar reasons

%v0 = load GCREF, GCREF* %ptr_0
%v1 = load GCREF, GCREF* %ptr_0
%cmp = icmp eq %v0, %v1
== Transformed to ==>
%v0 = load GCREF, GCREF* %ptr_0
%v0.i = ptrtoint %v0
%v1 = load GCREF, GCREF* %ptr_0
%v1.i = ptrtoint %v1
%cmp = icmp eq %v0.i, %v1.i

is also invalid.

Converting any GCREF to an integer is BAD, period. A single benign use could float below a safepoint. Imagine if you expanded card marks early before step #3!

Again, is this real or hypothetical? Who is doing this and why?

## Round tripping GCREF's through Integers

The following transform is invalid:

%v0 = load GCREF, GCREF* %loc0
store GCREF %v0, GCREF* %loc1
== Transformed to ==>
%v0 = load i64, i64* (bitcast %loc0)
store i64 %v0, i64* (bitcast %loc1)

because if "L" in the second version is between the load and the
store, "G" will not include `%v0` even though we need to report it to
the GC (otherwise we could propagate a stale bitwise representation of
the GC reference we loaded in `%v0`).

Just like the last two cases, converting a GCREF to an integer is bad.

Real or hypothetical?

## Bad types due to hoisting loads out of control flow

This is bad

if (is_reference) {
%val = load GCREF, GCREF* %ptr
}
== Transformed to ==>
%val = load GCREF, GCREF* %ptr
if (is_reference) {
}

unless the compiler can prove that %val is a GC reference at its new
location. Downstream we model the Java type system in LLVM IR to try
to prove that %val is a GC reference in its new location, but for
starters we will have to be conservative upstream.

This is really abstract. Why would the load be hoisted if %ptr is not dereferenceable? Why would it be dereferenceable if the GCRef is not valid?

I think this has to do with type checks. You could have a valid object, but the GCRef field may only be valid under some type check.

It is definitely important to speculate these loads, but I agree that it needs to be done with special logic that understands GCRef.
In particular, you can speculate one such load, but never a dependent load. You could solve the dominance problem by introducing some garbage value on the ‘else’ path. But then your simple invariants don’t hold. This optimization is probably best done in step #3, rewriting.

Alternatively, you somehow make the dereferenceable attribute apply to individual fields. Hasn't some sort of “assume dereferenceable” intrinsic been proposed? You would still be left with the problem that you want to speculate past one level of type check.

# Proposed Solution:

We introduce a "new" LLVM type. I will still refer to it as GCREF
here, but it may actually still be "<ty> addrspace(k)*" where k is
specially noted in the datalayout.

Sounds good to make this target-configurable.

Semantics:

1. GCREF represents an equivalence class of values (equivalence
   relation being "points to a fixed semantic object"). The bitwise
   representation fluctuates constantly outside the compiler's
   control (the dual of `undef`), but may have invariants (in
   particular, we'd like to be able to specify alignment, nonnull
   etc.). At any given point in time all GCREF instances pointing to
   the same object have the same bitwise representation (we need this
   to make `icmp eq` well-defined).

OK.

2. GCREF instances can only contain a valid gc reference (otherwise
   they can't meaningfully "fluctuate" among the various possible
   bitwise representations of a reference).

Lost me.

3. Converting GCREF to integers is fine in general, but you'll get an
   arbitrary "snapshot" of the bitwise value that will generally not
   be meaningful (unless you are colluding with the GC in
   implementation defined ways).

I see what you’re getting at but don’t like the wording. Converting GCRef to integers is not “fine”. It’s not even meaningful. It just isn’t undefined behavior at the point of conversion.

4. Converting integers to GCREF is allowed only if the source integer is
   a bitwise representation of a valid GC reference that is not an
   out of bounds derived reference. However, this is difficult for
   the compiler to infer since it typically will have no fundamental
   knowledge of what bitwise representation can be a valid GC
   reference.

I don’t know about this. It seems like the integer’s lifetime could be extended across a safepoint.

5. Operations that use a GCREF-typed value are "atomic" in using the
   bitwise representation, i.e., loading from a GCREF typed value
   does not "internally" convert the GCREF to a normal
   integer-pointer and then use the integer-pointer, since that would
   mean there is a window in which the integer-pointer can become
   stale[1].

Yes, of course. It’s a good general rule for the optimizer. I wish any pass that violates atomic reads/writes of pointers had a good reason and some general way to disable it.

6. A GCREF stored to a location in the heap continues to fluctuate,
   and keeps itself in sync with the right bitwise representation.
   In a way, there isn't a large distinction between the GC and the
   heap -- the heap is part of (or managed by) the GC.

I think (6) is the most controversial of the semantics above, but it
isn't very different from how `undef` stored to the heap remains
`undef` (i.e. a non-deterministic N-bit value) and a later load can
recover `undef` instead of getting a normal N-bit value.

I think I know what you mean.

## How do we fare with the examples?

Let's take a look at some of the examples, and see how the semantics
specified above help us:

Is case #0, Integer -> GCREF Conversion, missing or covered below?

### Demoting loads / stores from GCREF to integers is invalid (case 1):

%v0 = load GCREF, GCREF* %ptr_0
%v1 = load GCREF, GCREF* %ptr_1
%cmp = icmp eq %v0, %v1
==>
%v0 = load i64, i64* %ptr_0
%v1 = load i64, i64* %ptr_1
%cmp = icmp eq %v0, %v1

Invalid because the initial IR is comparing two GC fluctuating
references, while the transformed IR will compare their bitwise
snapshot taken at an arbitrary point in time.

This seems unnecessary to reason about since any use of a GCRef’s bits is not meaningful at any point in the program later than its immediate use.

### Demoting loads / stores from GCREF to integers is invalid (case 2):

%v0 = load GCREF, GCREF* %loc0
store GCREF %v0, GCREF* %loc1
==>
%v0 = load i64, i64* (bitcast %loc0)
store i64 %v0, i64* (bitcast %loc1)

Invalid because there are two semantic differences between the initial
and final IR:

1. In the initial IR we store a fluctuating and up-to-date GCREF,
   whereas in the final IR we store a potentially stale bitwise
   snapshot of the value.
2. In the initial IR, after the store *%loc1 has a fluctuating GC
   reference, while in the final IR *%loc1 only has a once-valid
   snapshot of a GC reference (that will no longer fluctuate).

This is interesting to discuss, but it still just falls under the rule that a GCRef cannot be converted to and used as an integer.

### Basic store forwarding is valid:

store GCREF %v0, GCREF* %loc0
%v1 = load GCREF, GCREF* %loc0
==>
store GCREF %v0, GCREF* %loc0
%v1 = %v0

The store forwarding is valid since we stored a GCREF that
semantically points to object O (say), and loaded back a GCREF
pointing to the same object O.

Good.

### Hoisting inttoptr is (generally) invalid:

if (<condition>) {
   %val_gc = ptrtoint %val to GCREF
}
==>
%val_gc = ptrtoint %val to GCREF
if (<condition>) {}

is invalid since we do not know that `%val` is a valid bitwise
representation of a GCREF at the point in time we do the `ptrtoint`
(i.e. <<condition>> could be controlling whether `%val` is a valid
bitwise representation of a GCREF).

In theory this isn't different from loads, divisions or any other
potentially trapping operation, but I think a lot of LLVM directly
encodes the fact that `CastInst`s are obviously speculatable, so
maybe Integer -> GCREF casts should be outlawed at the type system
level, and be done via an intrinsic?

Is this supposed to be 'inttoptr’? I don’t see how this can work at all, since the integer’s lifetime can extend beyond a safepoint. Do you really need to support constant GCRefs? Cliff Click said it was a bad decision.

### Store forwarding between integers and GCREFs is valid

store i64 %val, i64* %loc
%val_gc = load GCREF, GCREF* (bitcast %loc)
==>
store i64 %val, i64* %loc
%val_gc = inttoptr %val to GCREF

is valid.

Both the initial and the final program assume that %val is a valid
bitwise representation of a GC reference.

(Note: it may not be valid to speculate the inttoptr instruction above
control flow).

Why do you need this? As in the previous case, it doesn’t seem like a sound representation.

### Load forwarding between integers and GCREFs is invalid

%val_i = load i64, i64* %loc
%val_p = load GCREF, GCREF* (bitcast %loc)
==>
%val_i = load i64, i64* %loc
%val_p = inttoptr %val_i to GCREF

is invalid if *%loc contains a GC reference. Given the model that we
have, the first program loads some bitwise representation of
a fluctuating GC reference, and then loads the same GC reference as a
GCREF; whereas the second program tries to convert a potentially stale
bitwise representation of a GC reference into a fluctuating GCREF
(which is not allowed).

I think the original code is unspecified, maybe undefined at the point of %val_i’s use.

## Implementation Issues & Impact to LLVM

This is a proposed path forward to implementing GCREF in LLVM. The
caveat mentioned at the start of the email (everything is open for
discussion) especially applies here.

### Changes to IR to represent GCREF

We change `datalayout` to denote pointers of a specific address space
as GCREFs. We add verifier rules forbidding `inttoptr` conversion of
integers into GCREFs and add an intrinsic (that can be control
dependent) that converts integers to GCREFs.

Sounds good to me. If you really need Int->GCRef conversion, then it seems like it needs to be an intrinsic with side effects, contrary to many of your examples. Or are you thinking that this intrinsic is only generated after lowering GCRef stores in step #4?

What about store barriers? Do they need an intrinsic or just `ptrtoint %gcref to i64`?

This is probably covered somewhere else, but what are the semantics of your statepoints with regard to reading/writing memory?

### Changes to the optimizer

We change the optimizer to

- Not speculate loads of GCREF type
- Not change uses of GCREF types to uses of non-GCREF types

Locally we already have changes fixing most of the places that'd need
to be fixed for the above to hold.

You may be answering this elsewhere in the thread, but are we sure the GC load shouldn’t be an intrinsic?

### Lowering

We do not teach LLC about GCREF's at all (in fact, we can change it to
assert that the datalayout does not contain a `gc` directive). We
make the RewriteStatepointsForGC pass rewrite the IR to make
relocations explicit (so that no more GCREF-like semantics are
necessary) and remove the GC directive from the Module's datalayout.

Ok. Is datalayout normally supposed to be used to convey IR lowering semantics? Or should it be immutable per target?

During statepoint lowering, I guess the IR does not change in any locally recognizable way. Pointer types are all still addrspace(1) right? Changing the datalayout to change the meaning of those types makes sense to me if it’s ok with everyone else.

Thoughts?

-- Sanjoy

Footnotes:

[0]:
To see why this is bad assume to begin with %ptr_0 and %ptr_1 both
contain a GC reference pointing to object X, currently at address
0x500.

Now the following happens

%v0 = load i64, i64* %ptr_0
%v0 is 0x500
<< safepoint >> GC moves X to 0x600, and updates *%ptr_1
%v1 = load i64, i64* %ptr_1
%v1 = 0x600
%cmp = icmp eq %v0, %v1
%cmp = false, when it should be true

If %v0 was "reported to the GC" in <<safepoint>> then it would have
been updated in place to 0x600, and %cmp would evaluate to true, as
expected.

[1]:
In reality, this is handled by _not_ putting safepoints in problematic
locations. For instance, if lowering a CompareAndSwap in some weird
architecture would involve doing:

// CAS(From, To, Location)
Location ^= << Constant >>

// The GC cannot parse the state at this location "L", since we've
// done some arbitrary arithmetic on Location.

... Do some magic with Location, To, From

then we _cannot_ put a safepoint at location "L".

I honestly don’t know what this footnote means at the LLVM IR level. I don’t think GCRefs should exist as integer-typed values prior to statepoint lowering, or you would have to convince me of why it’s necessary and safe.

-Andy

Hi Eli,

Eli Friedman wrote:
> What optimizations are you missing? The very fact that it isn't in SSA
> form hurts to some extent... and I guess you want to leverage existing
> optimizations for loads and stores?

Yes, we'd have to "reimplement" basic things like

   gcref_store(location, offset, value)
   gcref_copy(location_2, location)
   value_2 = gcref_load(location_2, offset)

=>

   gcref_store(location, offset, value)
   gcref_copy(location_2, location)
   value_2 = value

etc.

> > but on the flip side, I think you're going
> > to end up chasing down weird problems forever if a "load" from an
> alloca
> > has side-effects.
>
> I need to carefully think through this before committing to it, but I
> think we're generally okay with completely disallowing allocas of
> GCREF type (like we do for token types today). If we could do that,
> is your concern addressed?
>
> In general, if I think of a GCREF as an unsized handle to an object
> rather than a pointer with weird properties, it's somewhat less scary.

Sure, but we also want to give it some pointer-like attributes, like
alignment, dereferenceable_or_null etc.

> Also, it's more clear how to deal with a GCREF if you explicitly list
> out operations which are legal rather than which operations are
> illegal.

Agreed.

> For example "A GCREF is a new type of value. It does not have
> a representation in memory. You can use GEP on a GCREF. You can load
> or store through a GCREF; this resolves the GCREF to a specific address
> in memory, then acts like a load/store from that address. GCREFs have
> these aliasing rules: [...]. There are GCREF intrinsics for
> constructing and manipulating GCREFs. No other operations on GCREFs are
> allowed."
>
> I guess the bit I'm most worried about is using plain loads to create
> GCREFs, and plain stores to store them. As long as we don't have that,
> existing optimization passes will essentially just work, I think; we

Using a gc_load and gc_store intrinsic would do the job for
correctness, but as I've said before I think we'll end up with some
incidental complexity since almost every place that handles loads or
stores will also have to handle these intrinsics. However, I
personally don't mind doing the work too much if the community is okay
with the (hopefully mostly mechanical) code duplication.

-- Sanjoy

Hi Andy,

Andrew Trick wrote:
> Sanjoy,
>
> This looks very close to my understanding of the statepoint design trajectory when you first introduced it. It’s great that you followed through and took the time to formalize the IR semantics. It’s been a couple years since I’ve thought about it so I may ask some obtuse questions.
>
> I think the subject line is wrong though! Did you actually say what a “strong GC reference” is? I think you mean "precise GC reference", since weak references have the same requirements for being tracked at statepoints.

I meant "strong" as opposed to the "weaker" pointer type that LLVM
currently has (which can be freely converted to and from integers).
But "precise GC reference" is better, so I'll steal that. :slight_smile:

>>
>> Getting LLVM ToT working with a precise relocating GC involves taking
>> the following steps:
>>
>> Note: GCREF is the type representing a pointer that the GC needs
>> to know about and potentially relocate. Currently it is `<Ty>
>> addrspace(1)*`.
>>
>> 1. Generate IR, and represent GC pointer SSA values as values of type
>> GCREF.
>> 2. Run opt -O3.
>> 3. Run RewriteStatepointsForGC. It rewrites IR values of type GCREF
>> to "normal" pointers that the backend can lower (note -- we don't
>> physically rewrite the llvm::Types to keep things simple and fast). It
>> does this by making all of the data flow required to express
>> relocation semantics explicit in the IR.
>> 4. Rewrite loads and stores of GCREF type to execute load and store
>> barriers (logically can be part of (3)).
>> 5. Lower into machine code.
>>
>> For this discussion let's treat the internals of (3), (4) and (5) as
>> black boxes, and center our discussion on what properties we require
>> of the IR after step (2) so that step (3), (4) and (5) can do their
>> job. However, if you think the overall design can be simplified by
>> changing (3), (4) or (5), please feel free to point that out. :)
>
> Great.
>
>> After (2) and before (3), we want to have the following invariant:
>>
>> '''
>> Given any location "L" in the CFG of a function, let the set of all
>> values of type GCREF whose defs dominate "L" be "G":
>>
>> A. "G" is a superset of the references that need to be reported to
>> the GC if we put a safepoint at "L". References need to be
>> reported to the GC if the GC may want to scan through them or
>> rewrite them to point to the new locations of the objects they
>> used to point to.
>>
>> B. Everything in "G" is a valid GC reference. This directly implies
>> that all values of type GCREF hold a valid GC reference (since
>> "L" could be the location immediately following the def).
>> '''
>
> Makes sense. The frontend may want to control the lifetime of references, but that can be done with some CFG constructs.
>
>> A few important things to note about "valid GC reference":
>>
>> 1. There is no fixed notion of a valid GC reference. What
>> constitutes a valid GC reference can change at arbitrary points in
>> time.
>>
>> 2. There may not be a runtime bitwise predicate that can reliably
>> distinguish a "valid" GC reference from an invalid one. The
>> validity of a GC reference is a function of its provenance as well
>> as its bitwise content.
>
> I’m not sure what’s being said here, but a GC reference is valid when you load it and remains valid as long as the compiler feels like keeping it valid. That decision will be made in step #3, statepoint rewriting.

I meant there is no way to look solely at the bitwise representation
of a value and decide whether it is a GC reference or not, since it
could be an integer that just happens to look like a GC reference
bitwise.
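
As a minimal illustration (an IR sketch of my own, using the GCREF ==
`<ty> addrspace(1)*` spelling from above; the surrounding values are
assumed, not shown):

   %i = ptrtoint i8 addrspace(1)* %obj to i64   ; bitwise snapshot of a GC reference
   %j = add i64 %x, %y                          ; plain arithmetic that may happen to
                                                ; produce the same bit pattern at runtime

Nothing about the bits of %i and %j tells them apart; only their
provenance does.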

>> 3. (Related to the previous point) Arbitrary GEPs off of "valid GC
>> references" are also "valid GC references". This includes out of
>> bounds GEPs (whose bitwise representation can have any value, for
>> all practical intents and purposes).
>
> Fair enough, and important to point out.
>
>> [Aside: We could phrase the invariant in terms of fixed safepoint
>> locations, but I think focusing on this (stronger) invariant actually
>> makes the problem simpler (and no harder).]
>>
>> The invariant is somewhat vague and relies on white-box knowledge of
>> the GC to define what values "need to be reported to the GC" (this
>> will become clearer as we see some examples); and the "problem
>> statement" is to come up with a precise set of rules governing the
>> GCREF type so that the invariants we need fall out. Today we enforce
>> the invariants downstream by `#if 0`-ing out code that breaks them.
>> However that's fragile; and more importantly, it does not help other
>> people interested in precise GC and LLVM.
>
> Good motivation for getting this proposal in langref,
>
>> # Examples of Bad Transforms
>>
>> Let's look at some examples of bad transforms that are legal
>> today (in upstream LLVM) with GCREF == "addrspace(1)*". Right now
>> we'll justify their "badness" by invoking ad-hoc GC specific rules on
>> a case by case basis:
>
> I know non-zero address space has “target semantics", but I have no idea what that means to the optimizer today. Is there some class of IR level optimization that is prohibited at address space > 0?

Not that I'm aware of.

>
>> ## Integer -> GCREF Conversion
>>
>> (%value is unused)
>>
>> %value = load i64, i64* %ptr
>> == Transformed to ==>
>> %value = load GCREF, GCREF* (bitcast %ptr)
>>
>> This is a problem since we've introduced a value of type GCREF that
>> may not be a valid GC reference at runtime. If the compiler can prove
>> that %value is actually a valid GC reference then this transform is
>> valid, but typically it won't be able to except in cases like "%value
>> is provably 0" in which case it is better to just fold the load away.
>>
>> %value = load i64, i64* %ptr
>> == Transformed to ==>
>> %value = load i64, i64* %ptr
>> %value_gc = inttoptr %value to GCREF (%value_gc unused)
>>
>> is invalid for the same reason.
>
> Ok. I know some passes may think it’s fine to “pass values through memory" and bitcast the source and dest. Are there specific passes in-tree that you’ve seen do this? Are they just passing values through alloca? We shouldn’t need to worry about general heap access doing this. Is there any other reason a pass would load an address as i64?

InstCombine today will transform

   %v = load i32*, i32** %ptr0
   store i32* %v, i32** %ptr1

into

   %1 = bitcast i32** %ptr0 to i64*
   %v1 = load i64, i64* %1, align 8
   %2 = bitcast i32** %ptr1 to i64*
   store i64 %v1, i64* %2, align 8

>
>> ## GCREF -> Integer conversion
>>
>> Converting a GCREF to an integer is fine if the integer is unused.
>> However, cases like these are problematic:
>>
>> %v0 = load GCREF, GCREF* %ptr_0
>> %v1 = load GCREF, GCREF* %ptr_1
>> %cmp = icmp eq %v0, %v1
>> == Transformed to ==>
>> %v0 = load i64, i64* %ptr_0
>> %v1 = load i64, i64* %ptr_1
>> %cmp = icmp eq %v0, %v1
>>
>> Since in the second version if "L" is, say, between %v0 and %v1, "G"
>> would not include "v0" which contains a GC reference that the GC may
>> want to scan or relocate[0].
>>
>> For similar reasons
>>
>> %v0 = load GCREF, GCREF* %ptr_0
>> %v1 = load GCREF, GCREF* %ptr_0
>> %cmp = icmp eq %v0, %v1
>> == Transformed to ==>
>> %v0 = load GCREF, GCREF* %ptr_0
>> %v0.i = ptrtoint %v0
>> %v1 = load GCREF, GCREF* %ptr_0
>> %v1.i = ptrtoint %v1
>> %cmp = icmp eq %v0.i, %v1.i
>>
>> is also invalid.
>
> Converting any GCREF to an integer is BAD, period. A single benign use could float below a safepoint. Imagine if you expanded card marks early before step #3!

Here I'm trying to constrain the optimizer -- if the frontend wants to
insert ptrtoints and knows what it is doing (because it has GC
specific knowledge), then that should be fine.

> Again, is this real or hypothetical? Who is doing this and why?

I haven't seen this happen organically (so this is likely
hypothetical), but I want to prevent it from happening even if there
is a reason to do it.

>> ## Round tripping GCREF's through Integers
>>
>> The following transform is invalid:
>>
>> %v0 = load GCREF, GCREF* %loc0
>> store GCREF %v0, GCREF* %loc1
>> == Transformed to ==>
>> %v0 = load i64, i64* (bitcast %loc0)
>> store i64 %v0, i64* (bitcast %loc1)
>>
>> because if "L" in the second version is between the load and the
>> store, "G" will not include `%v0` even though we need to report it to
>> the GC (otherwise we could propagate a stale bitwise representation of
>> the GC reference we loaded in `%v0`).
>
> Just like the last two cases, converting a GCREF to an integer is bad.
>
> Real or hypothetical?

This is in instcombine.

>> ## Bad types due to hoisting loads out of control flow
>>
>> This is bad
>>
>> if (is_reference) {
>> %val = load GCREF, GCREF* %ptr
>> }
>> == Transformed to ==>
>> %val = load GCREF, GCREF* %ptr
>> if (is_reference) {
>> }
>>
>> unless the compiler can prove that %val is a GC reference at its new
>> location. Downstream we model the Java type system in LLVM IR to try
>> to prove that %val is a GC reference in its new location, but for
>> starters we will have to be conservative upstream.
>
> This is really abstract. Why would the load be hoisted if %ptr is not dereferenceable? Why would it be dereferenceable if GCRef is not valid?
>
> I think this has to do with type checks. You could have a valid object, but the GCRef field may only be valid under some type check.

Yes, in practice the "is_reference" check will be a type check. You
can end up with this from Java code that looks like:

class A {
   long longField; // Offset 16, say
}

class B {
   Object objField; // Offset 16 also
}

void f(A a) {
   Object o = a;

   // Since a is of type A, we know that it is dereferenceable up to
   // 16+8 == 24 bytes, but it does not mean that we can speculate
   // the load of objField.

   // This is kind of silly as written, but this does happen after
   // inlining.

   if (o instanceof B) {
     // This is dead code, and we will often DCE it away, but we're
     // not allowed to rely on that happening (before hoisting) for
     // correctness.
     Object obj = ((B) o).objField;
     print(obj.hashCode());
   }
}
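
In IR, the guarded access looks roughly like the sketch below (a
hand-written sketch using GCREF == `<ty> addrspace(1)*`; the type-check
helper, the field offset and the use are illustrative, not our actual
lowering):

   declare i1 @is_instance_of_B(i8 addrspace(1)*)  ; illustrative type check
   declare void @use(i8 addrspace(1)*)             ; stands in for print(obj.hashCode())

   define void @f(i8 addrspace(1)* %a) {
   entry:
     %is_b = call i1 @is_instance_of_B(i8 addrspace(1)* %a)
     br i1 %is_b, label %then, label %done

   then:                                           ; only reached when %a really is a B
     %f.raw = getelementptr i8, i8 addrspace(1)* %a, i64 16
     %f.ptr = bitcast i8 addrspace(1)* %f.raw to i8 addrspace(1)* addrspace(1)*
     %obj   = load i8 addrspace(1)*, i8 addrspace(1)* addrspace(1)* %f.ptr
     call void @use(i8 addrspace(1)* %obj)
     br label %done

   done:
     ret void
   }

Hoisting the %obj load into %entry is exactly the bad transform above:
it manufactures a GCREF-typed value that need not be a valid GC
reference when %a is an A.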

> It is definitely important to speculate these loads, but I agree that it needs to be done with special logic that understands GCRef.

> In particular, you can speculate one such load, but never a dependent load.

We can't typically speculate dependent loads, but that's because we
can't (typically) prove that the dependent loads are from non-null
locations. E.g. in

class A {
   Object o;
}

class B {
   A a;
   void g() {
     if (cond) {
       this.a.o.hashCode();
     }
   }
}

We could speculate the load of .o if this.a could be proved to be
non-null.

> You could solve the dominance problem by introducing some garbage value on the ‘else’ path. But then your simple invariants don’t hold.

Not sure what you mean by "simple invariants don’t hold", but in the
example above "void f(A a) {", putting junk on the else path does not
help.

> This optimization is probably best contained in step #3, rewriting.

I don't think that's a good idea -- we'll have to replicate LICM
inside RewriteStatepointsForGC, for instance, when we really want it
to run early. Without LICM of gc references we can't get rid of range
checks in simple loops like

   for (int i = 0; i < this.a.length; i++)
     this.a[i] = 0;
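
Concretely, the transform we want out of LICM is roughly this (a loose
sketch in the same style as the earlier examples, with illustrative
field offsets and the `(bitcast ...)` shorthand):

   loop:
     %a.addr = getelementptr i8, i8 addrspace(1)* %this, i64 16
     %arr    = load i8 addrspace(1)*, i8 addrspace(1)* addrspace(1)* (bitcast %a.addr)
     ; ... bounds check against %arr's length, then store to %arr[i] ...
   == LICM ==>
   preheader:
     %a.addr = getelementptr i8, i8 addrspace(1)* %this, i64 16
     %arr    = load i8 addrspace(1)*, i8 addrspace(1)* addrspace(1)* (bitcast %a.addr)
   loop:
     ; ... the bounds check is now loop-invariant and can be eliminated ...

i.e. the loop-invariant GCREF load of this.a moves into the preheader,
which is what makes the range check removable.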

> Alternatively, you somehow make the dereferenceable attribute apply to individual fields. Hasn't some sort of “assume dereferenceable” intrinsic been proposed? You would still be left with the problem that you want to speculate past one level of type check.

That's not sufficient for the "void f(A a) {" example -- the field at
offset 16 is dereferenceable for "a", but it can't be loaded as a
GCREF.

>> # Proposed Solution:
>>
>> We introduce a "new" LLVM type. I will still refer to it as GCREF
>> here, but it may actually still be "<ty> addrspace(k)*" where k is
>> specially noted in the datalayout.
>
> Sounds good to make this target-configurable.
>
>> Semantics:
>>
>> 1. GCREF represents an equivalence class of values (equivalence
>> relation being "points to a fixed semantic object"). The bitwise
>> representation fluctuates constantly outside the compiler's
>> control (the dual of `undef`), but may have invariants (in
>> particular, we'd like to be able to specify alignment, nonnull
>> etc.). At any given point in time all GCREF instances pointing to
>> the same object have the same bitwise representation (we need this
>> to make `icmp eq` well-defined).
>
> OK.
>
>> 2. GCREF instances can only contain a valid gc reference (otherwise
>> they can't meaningfully "fluctuate" among the various possible
>> bitwise representations of a reference).
>
> Lost me.

The "runtime" comes in and modifies values of type GCREF to contain
different bitwise representations pointing to the same logical object.
It can't do this rewriting if a GCREF value contains some random
bitwise pattern, since that won't point to a valid object on the heap.

>
>> 3. Converting GCREF to integers is fine in general, but you'll get an
>> arbitrary "snapshot" of the bitwise value that will generally not
>> be meaningful (unless you are colluding with the GC in
>> implementation defined ways).
>

> I see what you’re getting at but don’t like the wording. Converting GCRef to integers is not “fine”. It’s not even meaningful. It just isn’t undefined behavior at the point of conversion.

This is really intended to support "object pinning", where the
frontend converts a GC reference to an integer and uses it as such.
The key phrase above is "colluding with the GC" -- you need whitebox
knowledge of how the GC works to be able to convert GCREFs to
integers reliably.
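
A sketch of what that frontend-emitted collusion might look like,
assuming a runtime that supports pinning; @pin_object and @unpin_object
below are hypothetical runtime helpers, not proposed intrinsics:

   declare i64  @pin_object(i8 addrspace(1)*)    ; hypothetical: pins the object, returns its address
   declare void @unpin_object(i8 addrspace(1)*)  ; hypothetical
   declare void @native_code(i64)

     %raw = call i64 @pin_object(i8 addrspace(1)* %obj)
     call void @native_code(i64 %raw)            ; %raw is only meaningful while %obj stays pinned
     call void @unpin_object(i8 addrspace(1)* %obj)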

>
>> 4. Converting integers to GCREF is allowed only if the source integer is
>> a bitwise representation of a valid GC reference that is not an
>> out of bounds derived reference. However, this is difficult for
>> the compiler to infer since it typically will have no fundamental
>> knowledge of what bitwise representation can be a valid GC
>> reference.
>
> I don’t know about this. It seems like the integer’s lifetime could be extended across a safepoint.

I've explicitly not talked about safepoints at all here -- GCREF
values always have to hold "valid" references, and what a valid
reference is changes at arbitrary points in time (and at this layer
there are no "safepoints" denoting the only places where the
definition of a valid reference can change). This means, in general,
you can not for sure say that an integer contains a bitwise value that
is a valid GC reference, and even if you have one at a certain point
in time that it will stay valid in the next instant.

>> 5. Operations that use a GCREF-typed value are "atomic" in using the
>> bitwise representation, i.e., loading from a GCREF typed value
>> does not "internally" convert the GCREF to a normal
>> integer-pointer and then use the integer-pointer, since that would
>> mean there is a window in which the integer-pointer can become
>> stale[1].
>
> Yes, of course. It’s a good general rule for the optimizer. I wish any pass that violates atomic reads/writes of pointers had a good reason and some general way to disable it.
>
>> 6. A GCREF stored to a location in the heap continues to fluctuate,
>> and keeps itself in sync with the right bitwise representation.
>> In a way, there isn't a large distinction between the GC and the
>> heap -- the heap is part of (or managed by) the GC.
>>
>> I think (6) is the most controversial of the semantics above, but it
>> isn't very different from how `undef` stored to the heap remains
>> `undef` (i.e. a non-deterministic N-bit value) and a later load can
>> recover `undef` instead of getting a normal N-bit value.
>
> I think I know what you mean.
>
>> ## How do we fare with the examples?
>>
>> Let's take a look some of the examples, and see how the semantics
>> specified above help us:
>
> Is case #0, Integer -> GCREF Conversion, missing or covered below?

That's basically (4).

>> ### Demoting loads / stores from GCREF to integers is invalid (case 1):
>>
>> %v0 = load GCREF, GCREF* %ptr_0
>> %v1 = load GCREF, GCREF* %ptr_1
>> %cmp = icmp eq %v0, %v1
>> ==>
>> %v0 = load i64, i64* %ptr_0
>> %v1 = load i64, i64* %ptr_1
>> %cmp = icmp eq %v0, %v1
>>
>> Invalid because the initial IR is comparing two GC fluctuating
>> references, while the transformed IR will compare their bitwise
>> snapshot taken at an arbitrary point in time.
>
> This seems unnecessary to reason about since any use of a GCRef’s bits is not meaningful at any point in the program later than its immediate use.

Yes, that's what I'm trying to say above. :slight_smile:

>> ### Demoting loads / stores from GCREF to integers is invalid (case 2):
>>
>> %v0 = load GCREF, GCREF* %loc0
>> store GCREF %v0, GCREF* %loc1
>> ==>
>> %v0 = load i64, i64* (bitcast %loc0)
>> store i64 %v0, i64* (bitcast %loc1)
>>
>> Invalid because there are two semantic differences between the initial
>> and final IR:
>>
>> 1. In the initial IR we store a fluctuating and up-to-date GCREF,
>> whereas in the final IR we store a potentially stale bitwise
>> snapshot of the value.
>> 2. In the initial IR, after the store *%loc1 has a fluctuating GC
>> reference, while in the final IR *%loc1 only has a once-valid
>> snapshot of a GC reference (that will no longer fluctuate).
>
> This is interesting to discuss, but still just falls under the GCRef cannot be converted to and used as an integer rule.

Ok.

>> ### Basic store forwarding is valid:
>>
>> store GCREF %v0, GCREF* %loc0
>> %v1 = load GCREF, GCREF* %loc0
>> ==>
>> store GCREF %v0, GCREF* %loc0
>> %v1 = %v0
>>
>> The store forwarding is valid since we stored a GCREF that
>> semantically points to object O (say), and loaded back a GCREF
>> pointing to the same object O.
>
> Good.
>
>> ### Hoisting inttoptr is (generally) invalid:
>>
>> if (<condition>) {
>> %val_gc = ptrtoint %val to GCREF
>> }
>> ==>
>> %val_gc = ptrtoint %val to GCREF
>> if (<condition>) {}
>>
>> is invalid since we do not know that `%val` is a valid bitwise
>> representation of a GCREF at the point in time we do the `ptrtoint`
>> (i.e. <<condition>> could be controlling whether `%val` is a valid
>> bitwise representation of a GCREF).
>>
>> In theory this isn't different from loads, divisions or any other
>> potentially trapping operation, but I think a lot of LLVM directly
>> encodes the fact that `CastInst` s are obviously speculatable, so
>> maybe Integer -> GCREF casts should be outlawed at the type system
>> level, and be done via an intrinsic?
>

> Is this supposed to be 'inttoptr’? I don’t see how this can work at all, since the integer’s lifetime can extend beyond a safepoint.

Do you mean the initial example? That is fine if <condition> is
always false at runtime.

> Do you really need to support constant GCRefs? Cliff Click said it was a bad decision.

Typically we don't need inttoptr to GCREFs, but other runtimes might.
I'm happy to not support it for now.

>> ### Store forwarding between integers and GCREFs is valid
>>
>> store i64 %val, i64* %loc
>> %val = load GCREF, GCREF* (bitcast %loc)
>> ==>
>> store i64 %val, i64* %loc
>> %val = inttoptr %val to GCREF
>>
>> is valid.
>>
>> Both the initial and the final program assume that %val is a valid
>> bitwise representation of a GC reference.
>>
>> (Note: it may not be valid to speculate the inttoptr instruction above
>> control flow).
>
> Why do you need this? As in the previous case, it doesn’t seem like a sound representation.

We don't need this -- the only realistic place this can happen is
in dead code.

>> ### Load forwarding between integers and GCREFs is invalid
>>
>> %val_i = load i64, i64* %loc
>> %val_p = load GCREF, GCREF* (bitcast %loc)
>> ==>
>> %val_i = load i64, i64* %loc
>> %val_p = inttoptr %val_i to GCREF
>>
>> is invalid if *%loc contains a GC reference. Given the model that we
>> have, the first program loads some bitwise representation of
>> fluctuating a GC reference, and then loads the same GC reference as a
>> GCREF; whereas the second program tries to convert a potentially stale
>> bitwise representation of a GC reference into a fluctuating GCREF
>> (which is not allowed).
>
> I think the original code is unspecified, maybe undefined at the point of %val_i’s use.

I'd say that would depend on what use. For instance, say we have:

class A {
   long lField; // Offset 16
}

class B {
   Object oField; // Offset 16
}

void h(B b) {
   Object o = b;
   int hC = b.oField.hashCode();
   if (o instanceof A) {
     print(((A)o).lField + 20);
   }
}

it is reasonable to optimize "h" to:

void h(i8* b) {
   GCREF o = *(b + 16)
   long l = *(b + 16)
   long l2 = l + 20;
   if (is_instance_of(b, A_class)) {
     print(l2)
   }
}

Above the use of l in "l + 20" has to be fine, and not UB.

>> ## Implementation Issues & Impact to LLVM
>>
>> This is a proposed path forward to implementing GCREF in LLVM. The
>> caveat mentioned at the start of the email (everything is open for
>> discussion) especially applies here.
>>
>> ### Changes to IR to represent GCREF
>>
>> We change `datalayout` to denote pointers of a specific address space
>> as GCREFs. We add verifier rules forbidding `inttoptr` conversion of
>> integers into GCREFs and add an intrinsic (that can be control
>> dependent) that converts integers to GCREFs.
>

> Sounds good to me. If you really need Int->GCRef conversion, then it seems like it needs to be an intrinsic with side effects, contrary to many of your examples.

Yes. The other solution is to make `inttoptr` a side effect depending
on the result type, but that sounds like a bad idea.

> Or are you thinking that this intrinsic is only generated after lowering GCRef stores in step #4?

> What about store barriers? Do they need an intrinsic or just `ptrtoint %gcref to i64`?

As long as stores are correctly typed (i.e. a store of GCREF type has
not been converted to a store of type i64, like instcombine does
today), I think it is sufficient to late-rewrite stores into using
a store barrier based on the `gc_ref_to_int` intrinsic that we'll add.
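
For example, the late rewrite might turn

   store i8 addrspace(1)* %v, i8 addrspace(1)* addrspace(1)* %loc

into something like

   call void @gc_write_barrier(i8 addrspace(1)* addrspace(1)* %loc, i8 addrspace(1)* %v)
   store i8 addrspace(1)* %v, i8 addrspace(1)* addrspace(1)* %loc

where @gc_write_barrier is just a placeholder for whatever barrier the
runtime actually wants -- the real sequence, and whether it uses
gc_ref_to_int at all, is runtime specific.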

> This is probably covered somewhere else, but what are the semantics of your statepoints with regard to reading/writing memory?

This is what I have in mind: gc.statepoint only exists after we've
gotten rid of GCREFs -- they model the behavior of GCREF values in
terms of normal pointers that LLC knows how to codegen. The semantics
of gc.statepoint are specified (roughly) as:

(r0, r1, ... rn) = gc.statepoint(p0, p1, ... pn)

is equivalent to

   1. for each object o in the heap:
        o_new = malloc(size of o)
        *o_new = *o
        free(o)
        rewrite all refs to o to point to o_new

   2. for each object p_i in (p0, p1, ... pn):
        set r_i to the location p_i was relocated to
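
In IR terms, the relocation data flow after RewriteStatepointsForGC
looks roughly like this (abbreviated; the exact intrinsic signatures,
the meaning of the extra integer arguments, and the relocate indices
are described in the in-tree statepoint documentation):

   %tok = call token (i64, i32, void ()*, i32, i32, ...)
            @llvm.experimental.gc.statepoint.p0f_isVoidf(
              i64 0, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0,
              i8 addrspace(1)* %p)
   ; %p must not be used below the statepoint; only the relocated copy may be:
   %p.relocated = call i8 addrspace(1)*
            @llvm.experimental.gc.relocate.p1i8(token %tok, i32 7, i32 7)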

>> ### Changes to the optimizer
>>
>> We change the optimizer to
>>
>> - Not speculate loads of GCREF type
>> - Not change uses of GCREF types to uses of non-GCREF types
>>
>> Locally we already have changes fixing most of the places that'd need
>> to be fixed for the above to hold.
>
> You may be answering this elsewhere in the thread, but are we sure the GC load shouldn’t be an intrinsic?

That's what others are leaning towards. I'm somewhat hesitant to do
that since it will involve changing almost every place that handles
LoadInst to also handle the gc_load intrinsic (and would be a bad
thing for the codebase long term), but I'm not 100% opposed to it if
that's what we need to move forward.
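
For concreteness, that alternative would replace ordinary GCREF loads
with calls along these lines (the intrinsic name is the hypothetical
one floated in this thread; it does not exist today):

   ; today:
   ;   %v = load i8 addrspace(1)*, i8 addrspace(1)* addrspace(1)* %p
   ; with the intrinsic:
   %v = call i8 addrspace(1)* @llvm.gc_load(i8 addrspace(1)* addrspace(1)* %p)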

>> ### Lowering
>>
>> We do not teach LLC about GCREF's at all (in fact, we can change it to
>> assert that the datalayout does not contain a `gc` directive). We
>> make the RewriteStatepointForGC pass rewrite the IR to make
>> relocations explicit (so that no more GCREF-like semantics are
>> necessary) and remove the GC directive from the Module's datalayout.
>
> Ok. Is datalayout normally supposed to be used to convey IR lowering semantics? Or should it be immutable per target?
>
> During statepoint lowering, I guess the IR does not change in any locally recognizable way. Pointer types are all still addrspace(1) right? Changing the datalayout to change the meaning of those types makes sense to me if it's ok with everyone else.

I see what you're getting at here -- changing the datalayout does seem
a little hacky. Maybe there is an efficient way to remap types
(without deleting and re-creating every instruction def'ing and using
GCREFs)?

>> [1]:
>> In reality, this is handled by _not_ putting safepoints in problematic
>> locations. For instance, if lowering a CompareAndSwap in some weird
>> architecture would involve doing:
>>
>> // CAS(From, To, Location)
>> Location ^= <<Constant>>
>>
>> // The GC cannot parse the state at this location "L", since we've
>> // done some arbitrary arithmetic on Location.
>>
>> ... Do some magic with Location, To, From
>>
>> then we _cannot_ put a safepoint at location "L”.
>
> I honestly don’t know what this footnote means at the LLVM IR level. I don’t think GCRefs should exist as integer-typed values prior to statepoint lowering, or you would have to convince me of why it’s necessary and safe.

It isn't necessary for us, and is safe only for runtimes that
support object pinning.

InstCombine today will transform

%v = load i32*, i32** %ptr0
store i32* %v, i32** %ptr1

into

%1 = bitcast i32** %ptr0 to i64*
%v1 = load i64, i64* %1, align 8
%2 = bitcast i32** %ptr1 to i64*
store i64 %v1, i64* %2, align 8

I don’t remember why this is important. I often don’t know with InstCombine whether something is needed to expose IR optimization or an ISEL pattern (which should really be somehow denoted and only run in a later pass).

GCREF → Integer conversion

Converting a GCREF to an integer is fine if the integer is unused.
However, cases like these are problematic:

%v0 = load GCREF, GCREF* %ptr_0
%v1 = load GCREF, GCREF* %ptr_1
%cmp = icmp eq %v0, %v1
== Transformed to ==>
%v0 = load i64, i64* %ptr_0
%v1 = load i64, i64* %ptr_1
%cmp = icmp eq %v0, %v1

Since in the second version if “L” is, say, between %v0 and %v1, “G”
would not include “v0” which contains a GC reference that the GC may
want to scan or relocate[0].

For similar reasons

%v0 = load GCREF, GCREF* %ptr_0
%v1 = load GCREF, GCREF* %ptr_0
%cmp = icmp eq %v0, %v1
== Transformed to ==>
%v0 = load GCREF, GCREF* %ptr_0
%v0.i = ptrtoint %v0
%v1 = load GCREF, GCREF* %ptr_0
%v1.i = ptrtoint %v1
%cmp = icmp eq %v0.i, %v1.i

is also invalid.

Converting any GCREF to an integer is BAD, period. A single benign use could float below a safepoint. Imagine if you expanded card marks early before step #3!

Here I’m trying to constrain the optimizer – if the frontend wants to
insert ptrtoints and knows what it is doing (because it has GC
specific knowledge), then that should be fine.

Sure, if the frontend has some way to guarantee that statepoint insertion can’t happen between ptrtoint and its uses. But IR passes should never generate a ptrtoint from a GCRef, regardless of how it is used. You want very simple rules for pass writers that are also verifiable. If the frontend wants to use ptrtoint on a GCRef it could use an intrinsic for that purpose that is exempt from the GCRef verifier.

Bad types due to hoisting loads out of control flow

This is bad

if (is_reference) {
%val = load GCREF, GCREF* %ptr
}
== Transformed to ==>
%val = load GCREF, GCREF* %ptr
if (is_reference) {
}

unless the compiler can prove that %val is a GC reference at its new
location. Downstream we model the Java type system in LLVM IR to try
to prove that %val is a GC reference in its new location, but for
starters we will have to be conservative upstream.

This is really abstract. Why would the load be hoisted if %ptr is not dereferenceable? Why would it be dereferenceable if GCRef is not valid?

I think this has to do with type checks. You could have a valid object, but the GCRef field may only be valid under some type check.

Yes, in practice the “is_reference” check will be a type check. You
can end up with this from Java code that looks like:

class A {
long longField; // Offset 16, say
}

class B {
Object objField; // Offset 16 also
}

void f(A a) {
Object o = a;

// Since a is of type A, we know that it is dereferenceable up to
// 16+8 == 24 bytes, but it does not mean that we can speculate
// the load of objField.

// This is kind of silly as written, but this does happen after
// inlining.

if (o instanceof B) {
// This is dead code, and we will often DCE it away, but we’re
// not allowed to rely on that happening (before hoisting) for
// correctness.
Object obj = ((B) o).objField;
print(obj.hashCode());
}
}

It is definitely important to speculate these loads, but I agree that it needs to be done with special logic that understands GCRef.

In particular, you can speculate one such load, but never a dependent load.

We can’t typically speculate dependent loads, but that’s because we
can’t (typically) prove dependent loads as loading from non-null
locations. E.g. in

class A {
Object o;
}

class B {
A a;
void g() {
if (cond) {
this.a.o.hashCode();
}
}
}

We could speculate the load of .o if this.a could be proved to be
non-null.

You could solve the dominance problem by introducing some garbage value on the ‘else’ path. But then your simple invariants don’t hold.

Not sure what you mean by “simple invariants don’t hold”, but in the
example above “void f(A a) {”, putting junk on the else path does not
help.

For that to work, you would have to know where potential statepoints might be, which is why it breaks the invariant that GCRefs are always valid, and why I proposed only doing this within the safepoint insertion pass.

This optimization is probably best contained in step #3, rewriting.

I don’t think that’s a good idea – we’ll have to replicate LICM
inside RewriteStatepointsForGC, for instance, when we really want it
to run early. Without LICM of gc references we can’t get rid of range
checks in simple loops like

for (int i = 0; i < this.a.length; i++)
this.a[i] = 0;

LICM isn’t speculation, unless it’s looking at the dereferenceable attribute and the loop has conditions.

I thought your proposed solution was to never speculate loads of GCRefs, which is strictly worse than what I’m suggesting.

Alternatively, you somehow make the dereferenceable attribute apply to individual fields. Hasn’t some sort of “assume dereferenceable” intrinsic been proposed? You would still be left with the problem that you want to speculate past one level of type check.

That’s not sufficient for the “void f(A a) {” example – the field at
offset 16 is dereferenceable for “a”, but it can’t be loaded as a
GCREF.

I think there is an important design question here, because I’ve also heard grumbling about LLVM’s lack of condition dependence tracking outside of the GCRef context.

+CC Daniel Berlin

If you can come up with an IR design for tracking your GCRef load dependencies (usually on specific null and type checks), then that could solve some of the other difficult problems that others are wrestling with (sorry I don’t have a pointer to the discussions). This probably means inserting some cast-to-dereferenceable instruction on the address generation def-use chain.

Otherwise, you have to either impose some subtle rules on IR optimization passes, or make all GCRef loads intrinsics, which is also a maintenance burden.

Proposed Solution:

We introduce a “new” LLVM type. I will still refer to it as GCREF
here, but it may actually still be “<ty> addrspace(k)*” where k is
specially noted in the datalayout.

Sounds good to make this target-configurable.

Semantics:

  1. GCREF represents an equivalence class of values (equivalence
    relation being “points to a fixed semantic object”). The bitwise
    representation fluctuates constantly outside the compiler’s
    control (the dual of undef), but may have invariants (in
    particular, we’d like to be able to specify alignment, nonnull
    etc.). At any given point in time all GCREF instances pointing to
    the same object have the same bitwise representation (we need this
    to make icmp eq well-defined).

OK.

  2. GCREF instances can only contain a valid gc reference (otherwise
    they can’t meaningfully “fluctuate” among the various possible
    bitwise representations of a reference).

Lost me.

The “runtime” comes in and modifies values of type GCREF to contain
different bitwise representations pointing to the same logical object.
It can’t do this rewriting if a GCREF value contains some random
bitwise pattern, since that won’t point to a valid object on the heap.

Ok, but you also said the derived GCRef can be any bit pattern due to out-of-bounds addressing. I think there is a constraint that the offset from the GC root can be determined via a straightforward analysis.

  4. Converting integers to GCREF is allowed only if the source integer is
    a bitwise representation of a valid GC reference that is not an
    out of bounds derived reference. However, this is difficult for
    the compiler to infer since it typically will have no fundamental
    knowledge of what bitwise representation can be a valid GC
    reference.

I don’t know about this. It seems like the integer’s lifetime could be extended across a safepoint.

I’ve explicitly not talked about safepoints at all here – GCREF
values always have to hold “valid” references, and what a valid
reference is changes at arbitrary points in time (and at this layer
there are no “safepoints” denoting the only places where the
definition of a valid reference can change). This means, in general,
you can not for sure say that an integer contains a bitwise value that
is a valid GC reference, and even if you have one at a certain point
in time that it will stay valid in the next instant.

Well, it would be much much easier to understand and verify if inttoptr were simply illegal. It’s confusing because the integer is defined somewhere else, yet it needs to be a valid GCRef at the point of use and you don’t want to allow GCRefs to live in integer values.

As with ptrtoint, this seems like it would be better as a separate intrinsic if some hypothetical frontend needs it.

If this is allowed, doesn’t the integer need to be a GCRoot so a simple analysis can determine the offset?

Hoisting inttoptr is (generally) invalid:

if (<condition>) {
%val_gc = ptrtoint %val to GCREF
}
==>
%val_gc = ptrtoint %val to GCREF
if (<condition>) {}

is invalid since we do not know that %val is a valid bitwise
representation of a GCREF at the point in time we do the ptrtoint
(i.e. <condition> could be controlling whether %val is a valid
bitwise representation of a GCREF).

In theory this isn’t different from loads, divisions or any other
potentially trapping operation, but I think a lot of LLVM directly
encodes the fact that CastInst s are obviously speculatable, so
maybe Integer → GCREF casts should be outlawed at the type system
level, and be done via an intrinsic?

Is this supposed to be 'inttoptr’? I don’t see how this can work at all, since the integer’s lifetime can extend beyond a safepoint.

Do you mean the initial example? That is fine if <condition> is
always false at runtime.

The title of the example says “inttoptr”…

Do you really need to support constant GCRefs? Cliff Click said it was a bad decision.

Typically we don’t need inttoptr to GCREFs, but other runtimes might.
I’m happy to not support it for now.

Again, this may be better served with a gcref_to_int intrinsic that cannot be hoisted.

Load forwarding between integers and GCREFs is invalid

%val_i = load i64, i64* %loc
%val_p = load GCREF, GCREF* (bitcast %loc)
==>
%val_i = load i64, i64* %loc
%val_p = inttoptr %val_i to GCREF

is invalid if *%loc contains a GC reference. Given the model that we
have, the first program loads some bitwise representation of a
fluctuating GC reference, and then loads the same GC reference as a
GCREF; whereas the second program tries to convert a potentially stale
bitwise representation of a GC reference into a fluctuating GCREF
(which is not allowed).

I think the original code is unspecified, maybe undefined at the point of %val_i’s use.

I’d say that would depend on what use. For instance, say we have:

class A {
long lField; // Offset 16
}

class B {
Object oField; // Offset 16
}

void h(B b) {
Object o = b;
int hC = b.oField.hashCode();
if (o instanceof A) {
print(((A)o).lField + 20);
}
}

it is reasonable to optimize “h” to:

void h(i8* b) {
GCREF o = *(b + 16)
long l = *(b + 16)
long l2 = l + 20;
if (is_instance_of(b, A_class)) {
print(l2)
}
}

Above the use of l in “l + 20” has to be fine, and not UB.

Good example.

Of course, it could be just another instance of a “no inttoptr/ptrtoint” rule.

Implementation Issues & Impact to LLVM

This is a proposed path forward to implementing GCREF in LLVM. The
caveat mentioned at the start of the email (everything is open for
discussion) especially applies here.

Changes to IR to represent GCREF

We change datalayout to denote pointers of a specific address space
as GCREFs. We add verifier rules forbidding inttoptr conversion of
integers into GCREFs and add an intrinsic (that can be control
dependent) that converts integers to GCREFs.

This intrinsic sounds like what I’m suggesting, except that I’m suggesting that the frontend can hypothetically generate the intrinsic if needed, and rather than prohibiting inttoptr/ptrtoint conversion, you outright inhibit its existence, which is trivial to verify.

Sounds good to me. If you really need Int->GCRef conversion, then it seems like it needs to be an intrinsic with side effects, contrary to many of your examples.

Yes. The other solution is to make inttoptr a side effect depending
on the result type, but that sounds like a bad idea.

I think you want side effects on the Int->GCRef intrinsic (hoisting it is illegal, sinking it below calls is illegal), and don’t want to use inttoptr at all.

Or are you thinking that this intrinsic is only generated after lowering GCRef stores in step #4?

What about store barriers? Do they need an intrinsic or just ptrtoint %gcref to i64?

As long as stores are correctly typed (i.e. a store of GCREF type has
not been converted to a store of type i64, like instcombine does
today), I think it is sufficient to late-rewrite stores into using
a store barrier based on the gc_ref_to_int intrinsic that we’ll add.

Changes to the optimizer

We change the optimizer to

  • Not speculate loads of GCREF type
  • Not change uses of GCREF types to uses of non-GCREF types

Locally we already have changes fixing most of the places that’d need
to be fixed for the above to hold.

You may be answering this elsewhere in the thread, but are we sure the GC load shouldn’t be an intrinsic?

That’s what others are leaning towards. I’m somewhat hesitant to do
that since it will involve changing almost every place that handles
LoadInst to also handle the gc_load intrinsic (and would be a bad
thing for the codebase long term), but I’m not 100% opposed to it if
that’s what we need to move forward.

I think this would be just another workaround for LLVM’s lack of dependence tracking on loads, which is not to say it’s wrong thing to do, but just becomes a matter of evaluating the engineering/maintenance burden.

Lowering

We do not teach LLC about GCREF’s at all (in fact, we can change it to
assert that the datalayout does not contain a gc directive). We
make the RewriteStatepointForGC pass rewrite the IR to make
relocations explicit (so that no more GCREF-like semantics are
necessary) and remove the GC directive from the Module’s datalayout.

Ok. Is datalayout normally supposed to be used to convey IR lowering semantics? Or should it be immutable per target?

During statepoint lowering, I guess the IR does not change in any locally recognizable way. Pointer types are all still addrspace(1) right? Changing the datalayout to change the meaning of those types makes sense to me if it’s ok with everyone else.

I see what you’re getting at here – changing the datalayout does seem
a little hacky. Maybe there is an efficient way to remap types
(without deleting and re-creating every instruction def’ing and using
GCREFs)?

I really don’t know what the “right thing to do” is. I suppose if no one objects to changing datalayout then it’s the simplest option.

-Andy

Hi Andy,

Andrew Trick wrote:
>
> I don’t remember why this is important. I often don't know with
> InstCombine whether something is needed to expose IR optimization or an
> ISEL pattern (which should really be somehow denoted and only run in a
> later pass).

But for the purposes of this discussion, only the legality (or lack
thereof) of the above transform matters, not whether it is profitable
or not.

>> > Converting any GCREF to an integer is BAD, period. A single benign
>> use could float below a safepoint. Imagine if you expanded card marks
>> early before step #3!
>>
>> Here I'm trying to constrain the optimizer -- if the frontend wants to
>> insert ptrtoints and knows what it is doing (because it has GC
>> specific knowledge), then that should be fine.
>
> Sure, if the frontend has some way to guarantee that statepoint
> insertion can’t happen between ptrtoint and its uses. But IR passes
> should never generate a ptrtoint from a GCRef, regardless of how it is
> used.
> You want very simple rules for pass writers that are also
> verifiable.
> If the frontend wants to use ptrtoint on a GCRef it could
> use an intrinsic for that purpose that is exempt from the GCRef verifier.

Agree on all three counts.

>> We could speculate the load of .o if this.a could be proved to be
>> non-null.
>>
>> > You could solve the dominance problem by introducing some garbage
>> value on the ‘else’ path. But then your simple invariants don’t hold.
>>
>> Not sure what you mean by "simple invariants don’t hold", but in the
>> example above "void f(A a) {", putting junk on the else path does not
>> help.
>
> For that to work, you would have to know where potential statepoints
> might be, which is why it breaks the invariant that GCRefs are always
> valid, and why I proposed only doing this within the safepoint insertion
> pass.
>
>> > This optimization is probably best contained in step #3,
>> rewriting.
>>
>> I don't think that's a good idea -- we'll have to replicate LICM
>> inside RewriteStatepointsForGC, for instance, when we really want it
>> to run early. Without LICM of gc references we can't get rid of range
>> checks in simple loops like
>>
>> for (int i = 0; i < this.a.length; i++)
>> this.a[i] = 0;
>
> LICM isn’t speculation, unless it’s looking at the dereferenceable
> attribute and the loop has conditions.

Yes, that was a poor example, a better example would be one with
conditions, as you suggest:

   for (int i = 0; i < this.a.length; i++)
     if (cond)
       this.a[i] = 0;

> I thought your proposed solution was to *never* speculate loads of
> GCRefs, which is strictly worse than what I’m suggesting.

My proposal to never speculate loads of GCRefs is only a starting
point. Downstream we have analyses that use Java-specific
knowledge to infer type information (on a best-effort basis, but this
works fairly well in practice), and can use that to prove e.g. that a
certain value is of type java.lang.Foo in the loop preheader, so a
GCREF-load from offset 128 (say) can be legally issued in the loop
preheader.

>> > Alternatively, you somehow make the dereferenceable attribute apply
>> to individual fields. Hasn't some sort of “assume dereferenceable”
>> intrinsic been proposed? You would still be left with the problem that
>> you want to speculate past one level of type check.
>>
>> That's not sufficient for the "void f(A a) {" example -- the field at
>> offset 16 is dereferenceable for "a", but it can't be loaded as a
>> GCREF.
>
> I think there is an important design question here, because I’ve also
> heard grumbling about LLVM’s lack of condition dependence tracking
> outside of the GCRef context.
>
> +CC Daniel Berlin
>
> If you can come up with an IR design for tracking your GCRef load
> dependencies (usually on specific null and type checks), then that could

That sounds orthogonal to this proposal -- as a starting point we need
a simple way to mark a GCRef load as control dependent on every
condition executed since the start of the program. Something like
cast-to-dereferenceable can possibly be added once we start
upstreaming analyses that prove that a GCRef load is safe to
speculate, but that's for better performance not correctness.

Having said that, I'll keep the "explicit control dependence" idea in
mind -- maybe there is a way to find "a simple way to mark a GCRef load
as control dependent on every condition executed since the start of
the program" that can be generalized to remember more precise control
dependence later.

> solve some of the other difficult problems that others are wrestling
> with (sorry I don’t have a pointer to the discussions). This probably
> means inserting some cast-to-dereferenceable instruction on the address
> generation def-use chain.
>
> Otherwise, you have to either impose some subtle rules on IR
> optimization passes, or make all GCRef loads intrinsics, which is also a
> maintenance burden.

>> The "runtime" comes in and modifies values of type GCREF to contain
>> different bitwise representations pointing to the same logical object.
>> It can't do this rewriting if a GCREF value contains some random
>> bitwise pattern, since that won't point to a valid object on the heap.
>
> Ok, but you also said the derived GCRef can be any bit pattern due to
> out-of-bounds addressing. I think there is a constraint that the offset
> from the GC root can be determined via a straightforward analysis.

You're right -- that's missing in the specification (derived pointer
provenance), I'll add that in the next iteration.

>> I've explicitly not talked about safepoints at all here -- GCREF
>> values always have to hold "valid" references, and what a valid
>> reference is changes at arbitrary points in time (and at this layer
>> there are no "safepoints" denoting the only places where the
>> definition of a valid reference can change). This means, in general,
>> you can not for sure say that an integer contains a bitwise value that
>> is a valid GC reference, and even if you have one at a certain point
>> in time that it will stay valid in the next instant.
>
> Well, it would be much much easier to understand and verify if inttoptr
> were simply illegal. It’s confusing because the integer is defined
> somewhere else, yet it needs to be a valid GCRef at the point of use and
> you don’t want to allow GCRefs to live in integer values.
>
> As with ptrtoint, this seems like it would be better as a separate
> intrinsic if some hypothetical frontend needs it.

I think this is the most straightforward path forward ^.

> If this is allowed, doesn’t the integer need to be a GCRoot so a simple
> analysis can determine the offset?

>> > Is this supposed to be 'inttoptr’? I don’t see how this can work at
>> all, since the integer’s lifetime can extend beyond a safepoint.
>>
>> Do you mean the initial example? That is fine if <condition> is
>> always false at runtime.
>
> The title of the example says “inttoptr”…

Yes, that's just a typo in the code example; sorry.

>> > Do you really need to support constant GCRefs? Cliff Click said it
>> was a bad decision.
>>
>> Typically we don't need inttoptr to GCREFs, but other runtimes might.
>> I'm happy to not support it for now.
>
> Again, this may be better served with a gcref_to_int intrinsic that
> cannot be hoisted.

Yes.

>>
>> Above the use of l in "l + 20" has to be fine, and not UB.
>
> Good example.
>
> Of course, it could be just another instance of a “no inttoptr/ptrtoint”
> rule.

That should work.

>
>> >> ## Implementation Issues & Impact to LLVM
>> >>
>> >> This is a proposed path forward to implementing GCREF in LLVM. The
>> >> caveat mentioned at the start of the email (everything is open for
>> >> discussion) especially applies here.
>> >>
>> >> ### Changes to IR to represent GCREF
>> >>
>> >> We change `datalayout` to denote pointers of a specific address space
>> >> as GCREFs. We add verifier rules forbidding `inttoptr` conversion of
>> >> integers into GCREFs and add an intrinsic (that can be control
>> >> dependent) that converts integers to GCREFs.
>
> This intrinsic sounds like what I’m suggesting, except that I’m
> suggesting that the frontend can hypothetically generate the intrinsic
> if needed, and rather than prohibiting `inttoptr/ptrtoint` conversion,
> you outright inhibit its existence, which is trivial to verify.

Sounds good.

>> > Sounds good to me. If you really need Int->GCRef conversion, then it
>> seems like it needs to be an intrinsic with side effects, contrary to
>> many of your examples.
>>
>> Yes. The other solution is to make `inttoptr` a side effect depending
>> on the result type, but that sounds like a bad idea.
>
> I think you want side effects on the Int->GCRef intrinsic (hoisting it
> is illegal, sinking it below calls is illegal), and don’t want to use
> inttoptr at all.

Yes.

[snip]

-- Sanjoy

Hi Andy,

Andrew Trick wrote:
>
> I don’t remember why this is important. I often don't know with
> InstCombine whether something is needed to expose IR optimization or an
> ISEL pattern (which should really be somehow denoted and only run in a
> later pass).

But for the purposes of this discussion, only the legality (or lack
thereof) of the above transform matters, not whether it is profitable
or not.

Given that you will need to disable the transform for GCRefs, it’s interesting that, if it’s only something that needs to run before ISEL, you’re not actually losing any optimization.

> If you can come up with an IR design for tracking your GCRef load
> dependencies (usually on specific null and type checks), then that could

That sounds orthogonal to this proposal -- as a starting point we need
a simple way to mark a GCRef load as control dependent on every
condition executed since the start of the program. Something like
cast-to-dereferenceable can possibly be added once we start
upstreaming analyses that prove that a GCRef load is safe to
speculate, but that's for better performance not correctness.

I thought your frontend would know which conditions guard each field access.

Having said that, I'll keep the "explicit control dependence" idea in
mind -- maybe there is a way to find "a simple way to mark a GCRef load
as control dependent on every condition executed since the start of
the program" that can be generalized to remember more precise control
dependence later.

I only suggested it at this point because you mentioned how much work implementing optimizations on a gc_load intrinsic would be, and that time could otherwise be directed to a general LLVM improvement.

If it’s reasonable to prohibit speculation on normal loads where the loaded value happens to be a GCRef, then of course that’s easiest.

-Andy

Hi Andy,

Andrew Trick wrote:
>> But for the purposes of this discussion, only the legality (or lack
>> thereof) of the above transform matters, not whether it is profitable
>> or not.
>
> Given that you will need to disable the transform for GCRefs, it’s interesting that, if it’s only something that needs to run before ISEL, you’re not actually losing any optimization.

Ah, okay; that makes sense.

>>> If you can come up with an IR design for tracking your GCRef load
>>> dependencies (usually on specific null and type checks), then that could
>> That sounds orthogonal to this proposal -- as a starting point we need
>> a simple way to mark a GCRef load as control dependent on every
>> condition executed since the start of the program. Something like
>> cast-to-dereferenceable can possibly be added once we start
>> upstreaming analyses that prove that a GCRef load is safe to
>> speculate, but that's for better performance not correctness.
>
> I thought your frontend would know which conditions guard each field access.

At a theoretical level it does, but in practice that kind of control
dependence can be difficult to split apart from "normal" control
dependence.

For instance, if we have:

   interface I {
     void func();
   }

   class F {
     Object o;
     void func() {
       Object val = this.o;
     }
   }

   class G {
     public static void f(I instance) {
       instance.func();
     }
   }

then when compiling func() separately there is nothing the load of val
is control dependent on, but in G::f if we do some form of predicated
devirtualization and then inline the body of F::func then the safety
of the load is predicated on the devirtualization predicate passing
too. If the predicated devirt. was itself conditional on some control
flow based type refinement, then the safety of the load is control
dependent on that control flow as well, and so on.

[snip]

-- Sanjoy

Yeah, the devirtualizer needs to insert a type cast, then inlining the load just inherits it. Isn’t that all done before LLVM IR?
-Andy

Hi all,

It looks like the key controversial point is the bit about the extra
control dependence on loads and stores[0]. Generally the consensus is
that (please chime in if you think otherwise) it is not reasonable to
make the safety (or semantics) of a load instruction depend on the
type it is loading. Introducing such a thing would involve changing
the IR semantics in a fundamental way, and therefore has a high bar
for acceptance.

Here is a summary of the alternative solutions that were proposed here
and on IRC (thanks Chandler, Andy, Eli!):

  1. Model loads and stores of GC references as intrinsic calls: add
     llvm.gc_load, llvm.gc_store intrinsics, and optimize them as loads
     and stores whenever appropriate and legal.

  2. Introduce a flag on load and stores that either
       a. Denotes a "gc_safety" control dependence.
       b. Denotes a "blackbox_safety" control dependence. In this case
          we will probably have some optional metadata on loads and
          stores to indicate that the control dependence is actually on
          GC safety.

     As a starting point, LLVM will conservatively not speculate such
     loads and stores; and will leave open the potential to upstream
     logic that will have a more precise sense of when these loads and
     stores are safe to speculate.

  3. Introduce a general way to denote control dependence on loads and
     stores. This can be helpful to LLVM in general, and will let us
     basically implement a more precise version of (2).

# Tradeoffs

(1) is the easiest to implement initially, but I think it will be bad
for LLVM in the long term -- every place that looks at loads and
stores will have to now look at this additional set of intrinsics.
This won't be terribly complex, but it will be noisy.

I personally like (2) the most. It fits in well with the current
framework: since most (all?) load/store speculation has to do some
sort of safety check anyway we can fold the "gc_safety" or
"blackbox_safety" logic into those safety checks. In practice maybe
we can even factor this into one of the `isSimple` or
`isUnordered` -like checks.
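
To make (2) a bit more concrete, the surface syntax might end up
looking something like this (the keyword and metadata names below are
placeholders, not a worked-out proposal):

   ; (2a): a dedicated flag meaning "control dependent for GC safety"
   %v = load gc_safety i8 addrspace(1)*, i8 addrspace(1)* addrspace(1)* %p
   ; (2b): a generic "not safe to speculate" flag, refined by optional metadata
   %w = load blackbox_safety i8 addrspace(1)*, i8 addrspace(1)* addrspace(1)* %p, !safety !0
   ; where, say, !0 = !{!"gc"}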

(3) is probably the cleanest solution, but I'm worried about its scope
-- given that this will be large investment, I'm worried I'll spin my
wheels on this for a long time and ultimately realize that it isn't
really the right fix.

Thoughts?
-- Sanjoy

[0]: I may have (incorrectly) mentioned otherwise on IRC, but we need
to model safety properties of stores as well, to avoid transforms
like:

   %x = malloc() ;; known thread local
   if (cond_0) {
     store GCREF %val to %x
   }
   if (cond_1) {
     store i64 %val to %x
   }

to

   %x = malloc() ;; known thread local
   if (cond_0 || cond_1) {
     store GCREF %val to %x ;; This "speculative" store is bad
     if (cond_1) {
       store i64 %val to %x
     }
   }

Hi Andy,

Andrew Trick wrote:
>
>>
>> interface I {
>> void func();
>> }
>>
>> class F {
>> Object o;
>> void func() {
>> Object val = this.o;
>> }
>>
>> class G {
>> public static void f(I instance) {
>> instance.func();
>> }
>>
>> then when compiling func() separately there is nothing the load of val
>> is control dependent on, but in G::f if we do some form of predicated
>> devirtualization and then inline the body of F::func then the safety
>> of the load is predicated on the devirtualization predicate passing
>> too. If the predicated devirt. was itself conditional on some control
>> flow based type refinement, then the safety of the load is control
>> dependent on that control flow as well, and so on.
>
> Yeah, the devirtualizer needs to insert a type cast, then inlining the
> load just inherits it. Isn’t that all done before LLVM IR?

We do some devirtualization "incrementally" (i.e. while inlining /
optimizing the IR), but I think I was over-complicating -- this looks
like it is doable. Intuitively, since Java's type system is
verifiable, we should be able to model it directly in the IR without
too many complications.

However, I'm still worried about the scope. Going by your suggestion,
IIUC this is how the above devirtualization transform will look:

void G::f(I instance) {
   invoke_interface(I::func, instance);
}

== predicated devirt ==>

void G::f(I instance) {
   if (get_klass(instance) == F.klass) {
     instance_casted = cast_to_type(instance, !type)
     F::func(instance_casted);
   } else {
     blah();
   }
}

!type = !descriptor_for_F

== inline F::func etc ==>

...

and to get back to our "baseline" performance (i.e. hoist loads of
non-GCREF type as usual, but don't touch GCREF loads), we'd have to
teach LLVM that it is okay to speculate non-GCREF loads through
cast_to_type, but not GCREF loads. We will also have to teach LLVM
about aliasing properties of these intrinsics.

I think cast_to_type is a good second step we can take when we start
modeling Java's type system in LLVM IR, but it looks too complex to do
as an initial step.

-- Sanjoy

Hi all,

It looks like the key controversial point is the bit about the extra
control dependence on loads and stores[0]. Generally the consensus is
that (please chime in if you think otherwise) it is not reasonable to
make the safety (or semantics) of a load instruction depend on the
type it is loading. Introducing such a thing would involve changing
the IR semantics in a fundamental way, and therefore has a high bar
for acceptance.

Here is a summary of the alternative solutions that were proposed here
and on IRC (thanks Chandler, Andy, Eli!):

Thanks for writing up the summary, sorry I didn’t summarize #2 for you as I had promised. ;]

  2. Introduce a flag on load and stores that either
    a. Denotes a “gc_safety” control dependence.
    b. Denotes a “blackbox_safety” control dependence. In this case
    we will probably have some optional metadata on loads and
    stores to indicate that the control dependence is actually on
    GC safety.

Suggested name for 2b: “unsafe” or “predicated”. Essentially, that the load is not generally safe to do even if the memory is generally safe to load from because of some external effect.

As a starting point, LLVM will conservatively not speculate such
loads and stores; and will leave open the potential to upstream
logic that will have a more precise sense of when these loads and
stores are safe to speculate.

  3. Introduce a general way to denote control dependence on loads and
    stores. This can be helpful to LLVM in general, and will let us
    basically implement a more precise version of (2).

I actually think there is a pretty clean incremental path from 2a to 3 here. We can start with just the basic “there is something control dependent about this” and use metadata to experiment with communicating it to the optimizer. If we come up with something really general and precise, we can fold it into the blanket thing, but until then we don’t scope creep or put something into the IR.

One thing you didn’t mention is that this requires a matching function attribute too so that we can distinguish between a call site that reads memory and a call site that reads memory in a way that is potentially unsafe.