RFC: Strong GC References in LLVM

LLVM's design decision is one where everything has to explicitly care
about implicit early exits to get correct answers (and not to harp too
much, but "not everything does", years later). If they don't, they will
get wrong answers.

So, ironically, while looking at this, i noticed it turns out LLVM's PRE in
GVN is another place that does not do this correctly either.

It will insert and hoist loads past may-throw calls depending on whether it
thinks the call aliases the pointer or not (IE depending on what memdep
tells it, and memdep only cares about aliasing here when coming up with
deps), regardless of whether the call is nounwind or not. This is rare but
can happen.

This is because memdep does this:
       // If the call has no effect on the queried pointer, just ignore it.
So it does not give a dep, and PRE then never does anything else to check
whether there is a may-throw call in the way of the hoist.

Testcase and patch coming.

Even more ironically, in gcc land, the non-local case would have been
prevented by this code GVN tries to use:

     // If any of these blocks has more than one successor (i.e. if the
edge we
     // just traversed was critical), then there are other paths through
this
     // block along which the load may not be anticipated. Hoisting the
load
     // above this block would be adding the load to execution paths along
     // which it was not previously executed.
     if (TmpBB->getTerminator()->getNumSuccessors() != 1)
       return false;

Since it would have had edges to the exit block in any predecessor with a
may-throw call, it would have gotten the right answer.

Anyway, since i still don't plan on proposing changes here, i'm going to
stop harping on this for a while.

Let me rephrase: It didn’t seem to me like the fundamental problem we were up against in this discussion, and it’s definitely very difficult to change given the burden it would place on intrinsics.

FWIW, for a long time I was a very strong proponent of explicit control flow because I like making it easy to reason about CFG transforms and code motion. But I gradually came around to realize it’s a legitimate design either way. In some ways it works well not to have a CFG edge for exits where it’s illegal to insert code. I think LLVM passes have also been biased toward algorithms that scale in the number of blocks.

That’s all I really have to say about it.

-Andy

At some point I stopped thinking about this as a bug and realized that you just need to think of LLVM as modeling speculative code barriers as memory dependence. In LLVM, it makes no sense to have a readonly may-throw call.

Andy

Hi Andy,

Andrew Trick wrote:
> At some point I stopped thinking about this as a bug and realized that
> you just need to think of LLVM as modeling speculative code barriers as
> memory dependence. In LLVM, it makes no sense to have a readonly
> may-throw call.

The problem is that that model breaks down with aggressive aliasing
like:

void foo(int* restrict ptr) {
   *ptr = 40;
   may_throw(); // read/write call
   *ptr = 50;
}

Now it is tempting to CSE the store of 40 to *ptr. If we can't do
that then what does restrict/noalias even mean?

-- Sanjoy

I thought it meant ‘ptr’ doesn’t alias with other ‘restrict’ pointer args. Not that it’s an exclusive way to access the memory. I could be wrong though...

In the same way you can’t have readonly-maythrow, you wouldn’t have TBAA+maythrow. Yeah, it’s not great.

-Andy

The call does not have to be read only, it just has to no alias the load being pre’d.
The call may in fact be read/ write of some other memory

IE:

int *A;
int *D;

int bar()
{
*D = 5;
if (im_bored)
throw();
return *D;

}

void foo()
{
int z = 0;

if (?)

{
z = *A + 5;
}
bar()
y = *A + 5;

}

or whatever.
bar will be noalias of *A, but still validly throws, and is even read-write.

Why? A decision was made to give pointers types, and we've decided to
change that. It is not clear to me that the decision to allow implicit
early exits was, in retrospect, optimal. I think it is completely healthy
for the project to reevaluate these kinds of decisions. We now have many
years of experience, bug reports, and we should have a good ability to
evaluate the compile-time impact of a potential change.

Let me rephrase: It didn’t seem to me like the fundamental problem we were
up against in this discussion, and it’s definitely very difficult to change
given the burden it would place on intrinsics.

Yes, this discussion has gone a bit off track, which i will apologize for.

FWIW, for a long time I was a very strong proponent of explicit control
flow because I like making it easy to reason about CFG transforms and code
motion. But I gradually came around to realize it’s a legitimate design
either way. In some ways it works well not to have a CFG edge for exits
where it’s illegal to insert code.

I think LLVM passes have also been biased toward algorithms that scale in
the number of blocks.

If this scaling is a design goal that's really interesting. Most of the
memory opt and code motion algorithms i've worked on in LLVM are
block/instruction-capped in the same way (and most have lower caps than GCC
due to compile time impact on llvm).
There certainly was a time when that wasn't true (i remember back in the
day :P), but it's no longer true.

Of course, I don't actually think either design necessarily makes scaling
easier or harder, that's just "the state of the world".

Not to argue this, since I completely agree philosophically. But my mental model for LLVM is that the “throw()” itself writes unknown memory…

Incidentally, it seems arguable to me to turn “maythrow” calls into invokes. The thing I don’t like about invokes is the shared landing pads.

What I was saying that started this tangent is that the notion of early exits would be hard to get rid of entirely. I at least wanted a ‘noexit/terminate’ attribute, but I guess no one needed it badly enough yet to add it.

-Andy

From: "Andrew Trick" <atrick@apple.com>
To: "Sanjoy Das" <sanjoy@playingwithpointers.com>
Cc: "Daniel Berlin" <dberlin@dberlin.org>, "llvm-dev" <llvm-dev@lists.llvm.org>, "Joseph Tremoulet"
<jotrem@microsoft.com>, "Oscar Blumberg" <oscar.blumberg@normalesup.org>, "Chandler Carruth" <chandlerc@gmail.com>,
"Nick Lewycky" <nlewycky@google.com>, "Hal Finkel" <hfinkel@anl.gov>, "Philip Reames" <listmail@philipreames.com>,
"Manuel Jacob" <me@manueljacob.de>, "Eli Friedman" <eli.friedman@gmail.com>, "David Majnemer"
<david.majnemer@gmail.com>
Sent: Friday, July 15, 2016 7:40:51 PM
Subject: Re: RFC: Strong GC References in LLVM

>
> Hi Andy,
>
> Andrew Trick wrote:
> > At some point I stopped thinking about this as a bug and realized
> > that
> > you just need to think of LLVM as modeling speculative code
> > barriers as
> > memory dependence. In LLVM, it makes no sense to have a readonly
> > may-throw call.
>
> The problem is that that model breaks down with aggressive aliasing
> like:
>
> void foo(int* restrict ptr) {
> *ptr = 40;
> may_throw(); // read/write call
> *ptr = 50;
> }
>
> Now it is tempting to CSE the store of 40 to *ptr. If we can't do
> that then what does restrict/noalias even mean?

I thought it meant ‘ptr’ doesn’t alias with other ‘restrict’ pointer
args. Not that it’s an exclusive way to access the memory. I could
be wrong though...

It means that, within the scope of ptr, any object accessed via a pointer based on ptr is not accessed via a pointer not based on ptr.

-Hal

Hi all,

I think it is time to start getting more concrete here. As a starting
point, I want to send out for review (roughly) the following changes:

  - Add a "gc" address space to the datalayout string
  - Start implementing the non-controversial rules (i.e. everything
    except the bits that initiated the "nospeculate" attribute
    discussion):
     - No pointer <-> integer casts for GC address spaces to begin with
     - Add an intrinsic (with control dependence) to
       convert GCrefs -> integers (we need this for GC load/store
       barriers)
     - Disable some of the problematic "cast by round tripping through
       memory" type optimizations for loads and stores that are of GC
       ref type

The things above are things we know we need, and even if all we do is
implement those, we will be in a better position overall.

One thing I want a design opinion on (already discussed on IRC): I'm
planning to phrase RewriteStatepointsForGC (a ModulePass) that
"implements" GC references "in terms of" normal pointers. One way to
do this is to rewrite each def and user of GC refs to use a normal
pointer, but that's unnecessary data structure churn, so I was
wondering if instead we can flip the meaning of what a GC ref is by
modifying the datalayout instead? RewriteStatepointsForGC can then be
seen as changing IR that can be lowered to run on only a "machine"
that directly supports GC pointers to IR that can be lowered to run on
machines that don't. That is RewriteStatepointsForGC will change IR
from

"No explicit relocations, addrspace(k) is marked as 'gc' in the
datalayout" to "All relocations explicit, addrspace(k) is not marked
specially in the datalayout"

However, Chandler had some (strong?) reservations on IRC about
modifying datalayout in an optimization, in the face of which I have a
couple of alternatives:

  - Have RewriteStatepointsForGC rewrite defs and users of GC
    references to use a "normal" pointer type. I'm a little hesitant
    to to do this since it seems wasteful (no evidence yet that it will
    matter), and may complicate keeping side data structures correct in
    the face of mass invalidations.

  - Represent the gc address space in something other than the
    datalayout that we all can agree is fair game to be modified by a
    ModulePass. Not a great option since datalayout seems the most
    natural place to put the "gc-ref-addrspace" information.

  - Don't do anything, i.e. RewriteStatepointsForGC does what it does
    today: it rewrites pointers of addrspace(1) (or addrspace(k) for
    some k) to be explicit but does not change the meaning of
    "addrspace(k)". I'm hesitant to do this because then I can't
    concisely answer "what does RewriteStatepointsForGC do?".

I want to see what others think about this, but in the absence of any
specific opinion here I'll go with the first option (and consider
using mutateType if things turn out to be too slow).

In parallel with all this, I'll try to come up with a concrete notion
of how the nospeculate attributes on loads and function calls will
look like, how it would interact with optimizations like mem2reg etc.
I'll consider potential interactions with
https://reviews.llvm.org/D20116 "Add speculatable function attribute"
and generally just kick it around to see if the idea holds up and
gives us all of the constraints we need.

Sounds good?
-- Sanjoy

+1

Sounds like a good plan to me

Sorry, I missed this at first but I have one issue here:

Hi all,

I think it is time to start getting more concrete here. As a starting
point, I want to send out for review (roughly) the following changes:

  • Add a “gc” address space to the datalayout string

I don’t really understand the need for this yet, because the following point:

  • Start implementing the non-controversial rules (i.e. everything
    except the bits that initiated the “nospeculate” attribute
    discussion):

I think everything here should apply to all non-zero address spaces. I think the thing we would want is for a tagged address space to opt out of this conservative behavior, not the other way around.

So I don’t think you need a tagged address space to implement everything here, and I’d like to avoid tagging the address space until the last possible second to make sure that this is implemented as generically as possible. I’m actually hopeful that the tagging isn’t necessary at all.

-Chandler

Hi Chandler,

Sorry, I missed this at first but I have one issue here:

I think everything here should apply to *all* non-zero address spaces. I
think the thing we would want is for a tagged address space to opt *out* of
this conservative behavior, not the other way around.

It isn't just conservative behavior though, I'm disallowing `inttoptr`
and `ptrtoint` at the type system level. If we flip the default then
this change will then no longer be backwards compatible, and will
break out of tree targets.

I can make the type system changes last (and lead with the other
subtler fixes), but I do want to make them at some point. This one of
the things we disallow in our tree downstream, and we'd like to
upstream the logic.

So I don't think you need a tagged address space to implement everything
here, and I'd like to avoid tagging the address space until the last
possible second to make sure that this is implemented as generically as
possible. I'm actually hopeful that the tagging isn't necessary at all.

Won't that be stalled on every user of non-zero address spaces
agreeing on these semantics (e.g. on that they don't need `inttoptr`
for instance)? No one has so far stepped forth to discuss the
semantics they need -- though this could have to do with the subject
line though.

I can call the tag something more generic than "GC" if you wish (in
fact, that was the reason for keeping the first patch trivial, to hash
these things out :slight_smile: ), but it seems friendlier to start with
explicitly enumerating the address space(s?) that need the
no-integer-conversion properties, and flip the default once the
changes have circulated more widely and people have had a chance to
experiment.

-- Sanjoy

Hi Chandler,

Sorry, I missed this at first but I have one issue here:

I think everything here should apply to all non-zero address spaces. I
think the thing we would want is for a tagged address space to opt out of
this conservative behavior, not the other way around.

It isn’t just conservative behavior though, I’m disallowing inttoptr
and ptrtoint at the type system level. If we flip the default then
this change will then no longer be backwards compatible, and will
break out of tree targets.

I understand that we need some way of handling inttoptr stuff, but I can see good ways of doing this even with an “opt in” strategy using auto-upgrade.

We could auto-upgrade code to addrspace cast to address space 0 and then do ptrtoint (and vice versa).

But I’m all in favor of making this change, just arguing that the semantics should be the other way around.

So I don’t think you need a tagged address space to implement everything
here, and I’d like to avoid tagging the address space until the last
possible second to make sure that this is implemented as generically as
possible. I’m actually hopeful that the tagging isn’t necessary at all.

Won’t that be stalled on every user of non-zero address spaces
agreeing on these semantics (e.g. on that they don’t need inttoptr
for instance)? No one has so far stepped forth to discuss the
semantics they need – though this could have to do with the subject
line though.

I can call the tag something more generic than “GC” if you wish (in
fact, that was the reason for keeping the first patch trivial, to hash
these things out :slight_smile: ), but it seems friendlier to start with
explicitly enumerating the address space(s?) that need the
no-integer-conversion properties, and flip the default once the
changes have circulated more widely and people have had a chance to
experiment.

Given how few users of non-zero address space pointers there are I actually am not sure about this. Maybe you do need a separate thread to make it clear that we’re considering changing the defaults for address spaces, but I personally much prefer the constructively correct approach where once you are outside of address space zero, there is no integer mapping available unless people need one and opt into having that behavior.

There are relatively few such users though, so i feel like you could likely start a fresh thread and collect them pretty quickly if this in practice isn’t the right tradeoff.

Either way the defaults go, I’d also prefer that the description be about the semantics not about the use case. So I’d suggest a flag indicating an address space has an integer mapping (or that it doesn’t have such a mapping) rather than one that talks about GC.

-Chandler

Hi Chandler,

Chandler Carruth wrote:
>
> I understand that we need some way of handling inttoptr stuff, but I can
> see good ways of doing this even with an "opt in" strategy using
> auto-upgrade.
>
> We could auto-upgrade code to addrspace cast to address space 0 and then
> do ptrtoint (and vice versa).
>
> But I'm all in favor of making this change, just arguing that the
> semantics should be the other way around.

In the current patch it may be difficult to come up with a safe
fallback that works for everyone. For instance, addrspacecast to
addrspace(0) will not work for address spaces that have pointers
larger than addrspace(0) pointers.

*However*, one possible migration path will become possible once we
add the "@llvm.ptrtoint" intrinsic, if we also add an "@llvm.inttoptr"
intrinsic. Once we have those two, we should be able to auto-upgrade
`inttoptr` and `ptrtoint` instructions to calls to these intrinsics;
assuming we can figure out some way to deal with ConstantExprs. I
think expanding ConstantExprs that use (now illegal) `ptrtoint` or
`inttoptr` instructions into instructions in each use site will work,
but I have to think about it a bit more.

> > So I don't think you need a tagged address space to implement
> everything
> > here, and I'd like to avoid tagging the address space until the last
> > possible second to make sure that this is implemented as
> generically as
> > possible. I'm actually hopeful that the tagging isn't necessary
> at all.
>
> Won't that be stalled on every user of non-zero address spaces
> agreeing on these semantics (e.g. on that they don't need `inttoptr`
> for instance)? No one has so far stepped forth to discuss the
> semantics they need -- though this could have to do with the subject
> line though.
>
> I can call the tag something more generic than "GC" if you wish (in
> fact, that was the reason for keeping the first patch trivial, to hash
> these things out :slight_smile: ), but it seems friendlier to start with
> explicitly enumerating the address space(s?) that need the
> no-integer-conversion properties, and flip the default once the
> changes have circulated more widely and people have had a chance to
> experiment.
>
> Given how few users of non-zero address space pointers there are I
> actually am not sure about this. Maybe you do need a separate thread to
> make it clear that we're considering changing the defaults for address
> spaces, but I personally much prefer the constructively correct approach
> where once you are outside of address space zero, there *is* no integer
> mapping available unless people need one and opt into having that behavior.
>
> There are relatively few such users though, so i feel like you could
> likely start a fresh thread and collect them pretty quickly if this in
> practice isn't the right tradeoff.
>
> Either way the defaults go, I'd also prefer that the description be
> about the semantics not about the use case. So I'd suggest a flag
> indicating an address space has an integer mapping (or that it doesn't
> have such a mapping) rather than one that talks about GC.

I've s/GC reference/non-integral pointer/ in the review.

Flipping the default is may be the right thing to do eventually, but I
think doing that now as a *first step* makes the current change
unnecessarily invasive. I think the right time to talk about flipping
the default is once all the key pieces are in place, and we have
reason to believe that the new code is not completely, utterly broken
:).

How about this plan:

  - I'll start (yet!) another RFC on on llvm-dev specifically asking
    about the use cases people have for non-0 address spaces. I'll use
    a non-GC specific title and explicitly CC some people I know use
    non-0 address spaces.

  - I'll try to push for getting all of the "no integer <-> pointer
    conversions" changes checked in one by one, all still predicated on
    an "opt-in" non-integral address space.

  - Once we are reasonably confident that we have stands up, depending
    on what we discover from the RFC mentioned in the first step we
    will flip the default (or not). As discussed in the beginning of
    the email, a forward migration path should not be too difficult as
    long as we have both `@llvm.inttoptr` and `@llvm.ptrtoint`.

-- Sanjoy

Joining in very late, but the tangent here has been interesting (if rather OT for the original thread).

I agree with Danny that we might want to take a close look at how we model things like maythrow calls, no return, and other implicit control flow. I’m not convinced that moving to a pure explicit model is the right idea because we get a lot of gain in practice from being able to reason about what are essentially a limited form of extended basic blocks. I would welcome a design discussion about this, preferably in person, but also don’t want to block any current (or future honestly) work on this until we have a reasonable firm plan of action.

One idea would be to explicitly acknowledge that our “basic blocks” are actually “extended basic blocks” with internal exits due to exception propagation, noreturn, and (recently) guards. I could see a couple of implementation strategies here:

  1. Extend BasicBlock to explicitly track potential early exiting instructions. This would probably have to be conservative so that things like nothrow inference aren’t required to update all callers in one go.
  2. Conservative assume that BasicBlock has an unknown number of early exiting instructions and write an analysis pass which answers questions about where those early exits are. Any pass which does code motion would require this analysis. (This is essentially the principled version of our current hacks.)

Philip

Joining in very late, but the tangent here has been interesting (if rather OT for the original thread).

I agree with Danny that we might want to take a close look at how we model things like maythrow calls, no return, and other implicit control flow. I’m not convinced that moving to a pure explicit model is the right idea because we get a lot of gain in practice from being able to reason about what are essentially a limited form of extended basic blocks. I would welcome a design discussion about this, preferably in person, but also don’t want to block any current (or future honestly) work on this until we have a reasonable firm plan of action.

One idea would be to explicitly acknowledge that our “basic blocks” are actually “extended basic blocks” with internal exits due to exception propagation, noreturn, and (recently) guards. I could see a couple of implementation strategies here:

  1. Extend BasicBlock to explicitly track potential early exiting instructions. This would probably have to be conservative so that things like nothrow inference aren’t required to update all callers in one go.
  2. Conservative assume that BasicBlock has an unknown number of early exiting instructions and write an analysis pass which answers questions about where those early exits are. Any pass which does code motion would require this analysis. (This is essentially the principled version of our current hacks.)

This analysis can be lazy/incremental. Most passes only do “safe” speculation and code sinking without side effects. They don’t need to run the analysis.

Thanks Philip. I forgot to mention that LLVM passes historically have taken advantage of implicit extended basic blocks (ISel) and were never improved to handle general EBBs. It took years for loop opts to finally handle early exits well.
-Andy

Joining in very late, but the tangent here has been interesting (if rather
OT for the original thread).

I agree with Danny that we might want to take a close look at how we model
things like maythrow calls, no return, and other implicit control flow.
I'm not convinced that moving to a pure explicit model is the right idea
because we get a lot of gain in practice from being able to reason about
what are essentially a limited form of extended basic blocks. I would
welcome a design discussion about this, preferably in person, but also
don't want to block any current (or future honestly) work on this until we
have a reasonable firm plan of action.

One idea would be to explicitly acknowledge that our "basic blocks" are
actually "extended basic blocks" with internal exits due to exception
propagation, noreturn, and (recently) guards. I could see a couple of
implementation strategies here:
1) Extend BasicBlock to explicitly track potential early exiting
instructions. This would probably have to be conservative so that things
like nothrow inference aren't required to update all callers in one go.
2) Conservative assume that BasicBlock has an unknown number of early
exiting instructions and write an analysis pass which answers questions
about where those early exits are. Any pass which does code motion would
require this analysis. (This is essentially the principled version of our
current hacks.)

This analysis can be lazy/incremental. Most passes only do “safe”
speculation and code sinking without side effects.

While I agree it can be lazy, and should be an analysis, i'm, again, really
not sure which passes you are thinking about here that do code
sinking/speculation that won't need it.

Here's the list definitely needing it right now:
GVN
GVNHoist
LICM
LoadCombine
LoopReroll
LoopUnswitch
LoopVersioningLICM
MemCpyOptimizer
MergedLoadStoreMotion
Sink

The list is almost certainly larger than this, this was a pretty trivial
grep and examination.
(and doesn't take into account bugs, etc)

It would be nice to know which passes, specifically, you are thinking of
when you say "most" :slight_smile:

I don’t have a point to argue, just clarifying Philip’s language. But if you want to know what I was thinking it’s that there are a couple of passes doing deliberate code motion: GVN and LICM (I didn’t think of the others) and a bunch of passes doing incidental code motion like InstCombine, CodeGenPrepare, a handful of passes calling SCEVExpander, etc.

I should have said “most passes at most do safe speculation…” so you don’t ask me to grep through the code looking for incidental code motion :slight_smile:

-Andy