RFC: Strong GC References in LLVM

Hi Chandler,

Chandler Carruth wrote:
>
> Hi all,
>
> It looks like the key controversial point is the bit about the extra
> control dependence on loads and stores[0]. Generally the consensus is
> that (please chime in if you think otherwise) it is not reasonable to
> make the safety (or semantics) of a load instruction depend on the
> type it is loading. Introducing such a thing would involve changing
> the IR semantics in a fundamental way, and therefore has a high bar
> for acceptance.
>
> Here is a summary of the alternative solutions that were proposed here
> and on IRC (thanks Chandler, Andy, Eli!):
>
> Thanks for writing up the summary, sorry I didn't summarize #2 for you
> as I had promised. ;]

No problem; thanks for your help on IRC. :)

> 2. Introduce a flag on load and stores that either
> a. Denotes a "gc_safety" control dependence.
> b. Denotes a "blackbox_safety" control dependence. In this case
> we will probably have some optional metadata on loads and
> stores to indicate that the control dependence is actually on
> GC safety.
>
> Suggested name for 2b: "unsafe" or "predicated". Essentially, that the

SGTM.

> load is not generally safe to do even if the *memory* is generally safe
> to load from because of some external effect.

Yes, that's exactly my intent.

> As a starting point, LLVM will conservatively not speculate such
> loads and stores; and will leave open the potential to upstream
> logic that will have a more precise sense of when these loads and
> stores are safe to speculate.
>
> 3. Introduce a general way to denote control dependence on loads and
> stores. This can be helpful to LLVM in general, and will let us
> basically implement a more precise version of (2).
>
> I actually think there is a pretty clean incremental path from 2a to 3

Reading the rest of what you wrote, it looks like you meant "from 2b
to 3"?

> here. We can start with just the basic "there is something control
> dependent about this" and use metadata to experiment with communicating
> it to the optimizer. If we come up with something really general and
> precise, we can fold it into the blanket thing, but until then we don't
> scope creep or put something into the IR.

SGTM.

> One thing you didn't mention is that this requires a matching function
> attribute too so that we can distinguish between a call site that reads
> memory and a call site that reads memory in a way that is potentially
> unsafe.

Yes, thanks for bringing this up.

One interesting possibility is to not make this function attribute
specific to memory at all. Then we can use it to communicate "this
function may have undefined behavior", and fearlessly hoist "readnone
nounwind" functions that are not also "unsafe".
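As a toy illustration of this rule (a hypothetical sketch in Python; the helper name and the attribute-set representation are mine, not LLVM's):

```python
# Hypothetical sketch (not LLVM code) of the proposed hoisting rule:
# a call is fearlessly hoistable only if it cannot write memory
# ("readnone"), cannot unwind ("nounwind"), and is not marked with the
# proposed "unsafe" attribute.

def can_hoist_call(attrs):
    return ("readnone" in attrs
            and "nounwind" in attrs
            and "unsafe" not in attrs)

print(can_hoist_call({"readnone", "nounwind"}))            # True
print(can_hoist_call({"readnone", "nounwind", "unsafe"}))  # False
```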

-- Sanjoy

Hi Chandler,

Chandler Carruth wrote:

I actually think there is a pretty clean incremental path from 2a to 3

Reading the rest of what you wrote, it looks like you meant “from 2b
to 3”?

Doh, yes. Sorry.

One thing you didn’t mention is that this requires a matching function
attribute too so that we can distinguish between a call site that reads
memory and a call site that reads memory in a way that is potentially
unsafe.

Yes, thanks for bringing this up.

One interesting possibility is to not make this function attribute
specific to memory at all. Then we can use it to communicate “this
function may have undefined behavior”, and fearlessly hoist “readnone
nounwind” functions that are not also “unsafe”.

This is … an extremely interesting idea.

Can you start a separate RFC about this?

I think it is important to understand how these things interact with each other, but it sounds like they may form a very cohesive whole.

Hi all,

It looks like the key controversial point is the bit about the extra
control dependence on loads and stores[0]. Generally the consensus is
that (please chime in if you think otherwise) it is not reasonable to
make the safety (or semantics) of a load instruction depend on the
type it is loading. Introducing such a thing would involve changing
the IR semantics in a fundamental way, and therefore has a high bar
for acceptance.

Here is a summary of the alternative solutions that were proposed here
and on IRC (thanks Chandler, Andy, Eli!):

1. Model loads and stores of GC references as intrinsic calls: add
    llvm.gc_load, llvm.gc_store intrinsics, and optimize them as loads
    and stores whenever appropriate and legal.

2. Introduce a flag on load and stores that either
      a. Denotes a "gc_safety" control dependence.
      b. Denotes a "blackbox_safety" control dependence. In this case
         we will probably have some optional metadata on loads and
         stores to indicate that the control dependence is actually on
         GC safety.

    As a starting point, LLVM will conservatively not speculate such
    loads and stores; and will leave open the potential to upstream
    logic that will have a more precise sense of when these loads and
    stores are safe to speculate.

I think you need to define what you mean by control dependence here. If
you mean speculation, you should say speculation. :)
As you describe below, it is not enough to simply not speculate them. You
are also saying you don't want to change the conditions on which they
execute.
That is very different from speculation.

3. Introduce a general way to denote control dependence on loads and
    stores. This can be helpful to LLVM in general, and will let us
    basically implement a more precise version of (2).

[0]: I may have (incorrectly) mentioned otherwise on IRC, but we need
to model safety properties of stores as well, to avoid transforms
like:

  %x = malloc() ;; known thread local
  if (cond_0) {
    store GCREF %val to %x
  }
  if (cond_1) {
    store i64 %val to %x
  }

to

  %x = malloc() ;; known thread local
  if (cond_0 || cond_1) {
    store GCREF %val to %x ;; This "speculative" store is bad
    if (cond_1) {
      store i64 %val to %x
    }
  }

FWIW: This raises one of the same issues we have now with may-throw, which
is that, if all you have is a flag on the instruction, now you have to look
at every instruction in every block to know whether a *CFG* transform is
correct.

That means any pass that wants to just touch the CFG can't do so without
also looking at the instruction stream. It will also turn a bunch of
things that are currently O(N) into O(N^2) (see the sets of patches fixing
may-throw places, and extrapolate to more places).

This is theoretically fixable in each pass by taking a single pass over the
instruction stream and marking which blocks have these instructions, etc,
and then using that info.

But we shouldn't have to do that in each pass, especially if they just
want to make CFG manipulations.

The TL;DR: I would really like to see this also made a BB-level flag that
says whether the block contains instructions with unknown control
dependences (or really, one BB-level flag for each attribute you introduce).
I don't think this is terribly difficult, and at the very least, it will
keep us from having to look at every instruction in the block in the common
case that there is nothing in the block to worry about. ;)

Also note that these CFG transforms will also now need post-dominance info
in a bunch of cases to sanely determine if they are changing the control
dependence structure.
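Daniel's suggestion can be sketched like this (illustrative Python over a toy IR representation; all names here are mine, not LLVM's):

```python
# Toy sketch of the BB-level-flag idea: one O(N) pass precomputes a
# per-block flag so CFG-only transforms don't rescan every instruction.
# All names are illustrative, not LLVM API.

def compute_block_flags(blocks):
    """One pass: mark blocks containing 'unsafe' (control-dependent)
    instructions."""
    return {b: any(inst.get("unsafe") for inst in insts)
            for b, insts in blocks.items()}

def cfg_transform_ok(path_blocks, flags):
    """A CFG transform can consult the cached flags instead of walking
    every instruction in every block along the path."""
    return not any(flags[b] for b in path_blocks)

blocks = {
    "entry": [{"op": "load", "unsafe": False}],
    "bb1":   [{"op": "load", "unsafe": True}],   # GCREF load: don't speculate
    "bb2":   [{"op": "store", "unsafe": False}],
}
flags = compute_block_flags(blocks)
print(cfg_transform_ok(["entry", "bb2"], flags))  # True
print(cfg_transform_ok(["entry", "bb1"], flags))  # False
```

In the common case (no flagged blocks on the path) the check is linear in the number of blocks, not instructions.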

Let me ask another question:

Given

  %x = malloc() ;; known thread local
  if (cond_0) {
    store GCREF %val to %x
  }
  if (cond_1) {
    store i64 %val to %x
  }
Assume I can prove cond_0 and cond_1 are the same.

I change this to:
  %x = malloc() ;; known thread local
  if (cond_0) {
    store i64 %val to %x
    store GCREF %val to %x
  }

Is this okay to happen?
Note that I did not actually change the control dependence of it by the
normal definition of control dependence.

But you can end up "hoisting" or "sinking" instructions by moving every
other instruction. :)

Is that okay, or are they really barriers as well (like may-throw), in
which case they probably need some real representation in control flow if
you want to make most algorithms O(N)? (You can get away with the BB-level
flags if you are willing to make some algorithms N^2 in cases where these
things exist.)

Hi Daniel,

Daniel Berlin wrote:
> As a starting point, LLVM will conservatively not speculate such
> loads and stores; and will leave open the potential to upstream
> logic that will have a more precise sense of when these loads and
> stores are safe to speculate.
>
> I think you need to define what you mean by control dependence here. If
> you mean speculation, you should say speculation :)

Apologies for being non-specific -- this is really just "don't
speculate".

> As you describe below, it is not enough to simply not speculate them.

I'm not sure where I said that?

> You also are saying you don't want to change the conditions on which
> they execute.
> That is very different from speculation.

If I implied that somehow then I (or the example) was wrong. :)

We can't speculate these instructions (without special knowledge of
the GC and the Java type system), and that's it.

> FWIW: This raises one of the same issues we have now with may-throw,
> which is that, if all you have is a flag on the instruction, now you
> have to look at every instruction in every block to know whether a *CFG*
> transform is correct.
>
> That means any pass that wants to just touch the CFG can't do so without
> also looking at the instruction stream. It will also make a bunch of
> things currently O(N), O(N^2) (see the sets of patches fixing may-throw
> places, and extrapolate to more places).

As I said, I'm only proposing a "don't speculate" flag, so this does
not (?) apply.

However, I didn't quite understand your point about may-throw -- how
is may-throw different from a generic side-effect (volatile store,
syscall etc.)? All of those can't be hoisted or sunk -- we have to
make sure that they execute in semantically the same conditions that
they did in the original program.

> This is theoretically fixable in each pass by taking a single pass over
> the instruction stream and marking which blocks have these instructions,
> etc, and then using that info.
>
> But we shouldn't have to do that in each pass, especially if they just
> want to make CFG manipulations.
>
> The TL;DR I would really like to see this also made a BB level flag
> that says whether the block contains instructions with unknown control
> dependences (or really, 1 bb level flag for each attribute you introduce) .
> I don't think this is terribly difficult, and at the very least, will
> keep us from having to look at every instruction in the block in the
> common case that there is nothing in the block to worry about :wink:
>
> Also note that these CFG transforms will also now need post-dominance
> info in a bunch of cases to sanely determine if they are changing the
> control dependence structure.
>
> Let me ask another question:
>
> Given
>
> %x = malloc() ;; known thread local
> if (cond_0) {
> store GCREF %val to %x
> }
> if (cond_1) {
> store i64 %val to %x
> }
> Assume i can prove cond0 and cond1 the same.
>
> I change this to:
> %x = malloc() ;; known thread local
> if (cond_0) {
> store i64 %val to %x
> store GCREF %val to %x
> }
>
> Is this okay to happen?

Yes. The only restriction is that you can't issue a GCREF load or store
that wouldn't have been issued in the original program (even if the
location is thread-local, in the case of stores).

-- Sanjoy

Hi Daniel,

Daniel Berlin wrote:
> As a starting point, LLVM will conservatively not speculate such
> loads and stores; and will leave open the potential to upstream
> logic that will have a more precise sense of when these loads
and
> stores are safe to speculate.
>
>
> I think you need to define what you mean by control dependence here. If
> you mean speculation, you should say speculation :)

Apologies for being non-specific -- this is really just "don't
speculate".

> As you describe below, it is not enough to simply not speculate them.

I'm not sure where I said that?

> You also are saying you don't want to change the conditions on which
> they execute.
> That is very different from speculation.

If I implied that somehow then I (or the example) was wrong. :)

:)

We can't speculate these instructions (without special knowledge of
the GC and the Java type system), and that's it.

Okay.

> FWIW: This raises one of the same issues we have now with may-throw,
> which is that, if all you have is a flag on the instruction, now you
> have to look at every instruction in every block to know whether a *CFG*
> transform is correct.
>
> That means any pass that wants to just touch the CFG can't do so without
> also looking at the instruction stream. It will also make a bunch of
> things currently O(N), O(N^2) (see the sets of patches fixing may-throw
> places, and extrapolate to more places).

As I said, I'm only proposing a "don't speculate" flag, so this does
not (?) apply.

As long as it applies only to the instructions, and they do not act as
"barriers" to hoisting/sinking, then yes, it should not apply.

(In theory it still means things have to look at instructions, but they had
to look at them anyway at that point :P)

However, I didn't quite understand your point about may-throw -- how
is may-throw different from a generic side-effect (volatile store,
syscall etc.)? All of those can't be hoisted or sunk -- we have to
make sure that they execute in semantically the same conditions that
they did in the original program.

may-throw is, AFAIK, worse. They act as barriers to sinking *other
things*. You cannot sink a store past a may-throw, or hoist a load above
them. You can't optimize stores across them either:

See:
[PATCH] D21007: DSE: Don't remove stores made live by a call which unwinds.
for the latter

[llvm] r270828 - [MergedLoadStoreMotion] Don't transform across may-throw
calls
for the former.

"It is unsafe to hoist a load before a function call which may throw, the
throw might prevent a pointer dereference.

Likewise, it is unsafe to sink a store after a call which may throw.
The caller might be able to observe the difference."

This then leads to the problem I mentioned - because the may-throwness is
not expressed at the BB level (or in the CFG, by having the call end the
block, or at the least, a fake abnormal CFG edge), everything has to go
checking every instruction along the entire path it wants to hoist,
whereas hoisting is normally just a simple dataflow problem with BB-level
properties. :)


and to be clear, I'm just being colloquial about "expressed at the BB
level". An analysis that everything used, and kept up to date when it
inserted throwing calls or marked calls nounwind, would have the same
effect. I happen to be more used to these things being flags on basic
blocks, but whatever works.
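The barrier behaviour Daniel describes can be sketched as follows (an illustrative Python sketch, not LLVM code; the instruction representation and helper name are mine):

```python
# Illustrative sketch (not LLVM code) of why may-throw is a barrier:
# sinking a store past a call that may throw changes what an unwinding
# caller can observe, so the legality check must scan every intervening
# instruction unless a per-block summary is maintained.

def can_sink_store_past(instructions, store_idx, target_idx):
    """A store at store_idx may sink to target_idx only if no
    intervening instruction may throw."""
    for inst in instructions[store_idx + 1:target_idx]:
        if inst.get("may_throw"):
            return False
    return True

insts = [
    {"op": "store"},                      # 0
    {"op": "add"},                        # 1
    {"op": "call", "may_throw": True},    # 2: potential unwind
    {"op": "ret"},                        # 3
]
print(can_sink_store_past(insts, 0, 2))  # True: only 'add' in between
print(can_sink_store_past(insts, 0, 3))  # False: would cross the call
```

Doing this scan per query is exactly the O(N)-per-check cost that the BB-level flag (or a CFG representation of the early exit) would amortize away.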

Hi Daniel,

Daniel Berlin wrote:
> However, I didn't quite understand your point about may-throw -- how
> is may-throw different from a generic side-effect (volatile store,
> syscall etc.)? All of those can't be hoisted or sunk -- we have to
> make sure that they execute in semantically the same conditions that
> they did in the original program.
>
> may-throw is, AFAIK, worse. They act as barriers to sinking *other
> things*. You cannot sink a store past a may-throw, or hoist a load above
> them. You can't optimize stores across them either:

Don't we have the same problems for "exit(0)" and "while(true) {
*volatile_ptr = 42; }" too? Both of these are optimization barriers
while still being "nounwind" (i.e. could be legitimately contained in
a nounwind function); though not in exactly the same way as a
may-throw call (e.g. you can DSE across exit(0) and you can sink
non-atomic loads past "while(true) {...}").

-- Sanjoy


Hi Daniel,

Daniel Berlin wrote:
> However, I didn't quite understand your point about may-throw -- how
> is may-throw different from a generic side-effect (volatile store,
> syscall etc.)? All of those can't be hoisted or sunk -- we have to
> make sure that they execute in semantically the same conditions that
> they did in the original program.
>
> may-throw is, AFAIK, worse. They act as barriers to sinking *other
> things*. You cannot sink a store past a may-throw, or hoist a load above
> them. You can't optimize stores across them either:

Don't we have the same problems for "exit(0)"

This is a noreturn call, so yes, it has another hidden control-flow
side effect of a slightly different kind. GCC models it as an extra
fake edge from the BB containing a noreturn call to the exit block of the
function, so that nothing sinks below it by accident.

I do not believe we do anything special here, so yes, it also has the same
general issue as may-throw.

and "while(true) {
*volatile_ptr = 42; }" too?

I can move non-volatile stores past volatile stores :)

Or did you mean something else?

  Both of these are optimization barriers
while still being "nounwind" (i.e. could be legitimately contained in
a nounwind function); though not in exactly the same way as a
may-throw call (e.g. you can DSE across exit(0) and you can sink
non-atomic loads past "while(true) {...}").

I do not claim there are not other instances. Noreturn is, in fact, a good
example. But I would also bet they are just as buggy as may-throw was for
the same reason, and they would cause the same N^2-ness.

Essentially, anything that produces hidden control flow (instead of
just depending on hidden control flow) will have this issue.
These are also things that any flag/analysis should be able to flag.

Hi Daniel,

Daniel Berlin wrote:
> Don't we have the same problems for "exit(0)"
>
> This is a noreturn call, so yes, it has another hidden control
> flow-side-effect of a slightly different kind. GCC models it as an extra
> fake edge from the BB containing a noreturn call to the exit block of
> the function, so that nothing sinks below it by accident.

Just to be clear, it'd have to keep that sort of edge for all call
sites, unless it can prove that the call target does not call exit?

> I do not believe we do anything special here, so yes, it also has the
> same general issue as may-throw.
>
> and "while(true) {
> *volatile_ptr = 42; }" too?
>
> I can move non-volatile stores past volatile stores :)

I meant:

// ptr_a and ptr_b are NoAlias, ptr_a holds 0 to begin with.

ThreadA:
   while(true) { store volatile i32 42, i32* %ptr_b }
   store atomic i32 42, i32* %ptr_a

ThreadB:
   %val = load atomic i32, i32* %ptr_a
   assert(%val is not 42) // The store is "guarded" by an inf loop

We can't reorder the store to ptr_a to before the infinite loop. The
volatile store is there to make the infinite loop well defined.

> Or did you mean something else?
>
> Both of these are optimization barriers
> while still being "nounwind" (i.e. could be legitimately contained in
> a nounwind function); though not in exactly the same way as a
> may-throw call (e.g. you can DSE across exit(0) and you can sink
> non-atomic loads past "while(true) {...}").
>
> I do not claim there are not other instances. Noreturn is, in fact, a
> good example. But I would also bet they are just as buggy as may-throw
> was for the same reason, and they would cause the same N^2-ness.

Yes.

> Essentially, anything that produces hidden control flow (instead of
> just depending on hidden control flow) will have this issue.
> These are also things that any flag/analysis should be able to flag.

Yup.

-- Sanjoy

Hi Daniel,

Daniel Berlin wrote:
> Don't we have the same problems for "exit(0)"
>
>
> This is a noreturn call, so yes, it has another hidden control
> flow-side-effect of a slightly different kind. GCC models it as an extra
> fake edge from the BB containing a noreturn call to the exit block of
> the function, so that nothing sinks below it by accident.

Just to be clear, it'd have to keep that sort of edge for all call
sites, unless it can prove that the call target does not call exit?

Yes.

/* Add fake edges to the function exit for any non constant and non
    noreturn calls (or noreturn calls with EH/abnormal edges),
    volatile inline assembly in the bitmap of blocks specified by BLOCKS
    or to the whole CFG if BLOCKS is zero.
...

    The goal is to expose cases in which entering a basic block does
    not imply that all subsequent instructions must be executed. */

// ptr_a and ptr_b are NoAlias, ptr_a holds 0 to begin with.

ThreadA:
  while(true) { store volatile i32 42, i32* %ptr_b }
  store atomic i32 42, i32* %ptr_a

ThreadB:
  %val = load atomic i32, i32* %ptr_a
  assert(%val is not 42) // The store is "guarded" by an inf loop

We can't reorder the store to ptr_a to before the infinite loop. The
volatile store is there to make the infinite loop well defined.

These do not have hidden control flow. It is actually well defined, it just
literally involves other instructions. :)

Note that gcc will optionally connect the infinite loop itself to the exit
block with a fake edge if you want
(you can add/remove fake edges on a per-opt basis).
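A toy version of the fake-edge scheme Daniel quotes (my sketch, not GCC's actual implementation): once the block containing a call that may not return gets a fake edge to the exit block, a generic "single successor" sinking check refuses to sink past it with no special-casing.

```python
# Toy sketch (not GCC's implementation) of fake exit edges: give every
# block containing a call that might not return an extra successor edge
# to the exit block, so ordinary CFG checks stop sinking past it.

def add_fake_exit_edges(succs, call_blocks, exit_block):
    out = {b: list(s) for b, s in succs.items()}
    for b in call_blocks:
        if exit_block not in out[b]:
            out[b].append(exit_block)
    return out

def can_sink_into_sole_successor(succs, block):
    # A generic transform considers sinking into the successor only
    # when the block has exactly one successor.
    return len(succs[block]) == 1

succs = {"A": ["B"], "B": ["C"], "C": []}
print(can_sink_into_sole_successor(succs, "A"))  # True before fake edges

succs = add_fake_exit_edges(succs, call_blocks={"A"}, exit_block="C")
print(can_sink_into_sole_successor(succs, "A"))  # False: A now has two successors
```

The point of the design is exactly this: correctness falls out of the CFG shape, so passes need no per-instruction knowledge of implicit early exits.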

Hi Daniel,

Daniel Berlin wrote:
> Don't we have the same problems for "exit(0)"
>
>
> This is a noreturn call, so yes, it has another hidden control
> flow-side-effect of a slightly different kind. GCC models it as an extra
> fake edge from the BB containing a noreturn call to the exit block of
> the function, so that nothing sinks below it by accident.

Just to be clear, it'd have to keep that sort of edge for all call
sites, unless it can prove that the call target does not call exit?

Yes.

/* Add fake edges to the function exit for any non constant and non
    noreturn calls (or noreturn calls with EH/abnormal edges),
    volatile inline assembly in the bitmap of blocks specified by BLOCKS
    or to the whole CFG if BLOCKS is zero.
...

    The goal is to expose cases in which entering a basic block does
    not imply that all subsequent instructions must be executed. */

Note that this is also necessary to make post-dominance correct (we
already do it in most cases, but I think there are still open bugs about
correctness).

For dominance, the dominance relationship for exit blocks that noreturn
blocks reach also changes, though I don't honestly remember if it's
material or not, and I'm a bit lazy to think about it. But here's an
example:

IE given

A (may-throw)
|
v
B
|
v
C (exit)

Here, we have A dominates B dominates C

So the dominator tree is

A
|
v
B
|
v
C

Now, if you add an edge from A to C, you have:

A dominates B
Neither B nor A dominate C (C's idom is somewhere above, so it's in a
sibling tree somewhere).

IE

A   C
|
v
B

In GCC, there is a single exit block, and it is always empty (returns are
connected to the exit block).
Thus, the above will not prevent an optimization that should otherwise
happen, from happening.

Because LLVM's exit blocks contain real code, it may prevent optimizations
that should otherwise happen.

You can actually get even worse (i.e. really wrong) situations if the "exit"
blocks somehow have successors, but thankfully we don't have a case where
that happens that I know of. :)

Hi Daniel,

Daniel Berlin wrote:
> // ptr_a and ptr_b are NoAlias, ptr_a holds 0 to begin with.
>
> ThreadA:
> while(true) { store volatile i32 42, i32* %ptr_b }
> store atomic i32 42, i32* %ptr_a
>
> ThreadB:
> %val = load atomic i32, i32* %ptr_a
> assert(%val is not 42) // The store is "guarded" by an inf loop
>
> We can't reorder the store to ptr_a to before the infinite loop. The
> volatile store is there to make the infinite loop well defined.
>
> These do not have hidden control flow. It is actually well defined, it
> just literally involves other instructions :)

As written above, sure; but the `while (true)` could be inside a
nounwind function (that does not return). So it is not sufficient to
just look for calls to unknown functions in a function body to
conclude that it always returns; nor is it sufficient to look for
llvm::Loops, since we could have cycles not describable as loops.

I'm not claiming that ^ is new information, btw; I'm justifying why I
specifically brought up the "while(true)" example. :)

> Note that gcc will optionally connect the infinite loop itself to the
> exit block with a fake edge if you want
> (you can add/remove fake edges on a per-opt basis).

Does GCC's notion of a "loop" (unlike LLVM) include all potentially
infinite control flow?

-- Sanjoy

Hi Daniel,

Daniel Berlin wrote:
> /* Add fake edges to the function exit for any non constant and non
> noreturn calls (or noreturn calls with EH/abnormal edges),
> volatile inline assembly in the bitmap of blocks specified by
> BLOCKS
> or to the whole CFG if BLOCKS is zero.
> ...
>
> The goal is to expose cases in which entering a basic block does
> not imply that all subsequent instructions must be executed. */
>
> Note that this is also necessary to make post-dominance correct (but we
> already do it in most cases, but i think there are still bugs open about
> correctness)

Yeah, right now in LLVM we cannot rely on post-dominance to conclude
"guaranteed to execute", which isn't ideal. There's at least one place
I know of where a more precise post-dominance could be used. :)

> For dominance, the dominance relationship for exit blocks that noreturn
> blocks reach also changes, though i don't honestly remember if it's
> material or not, and i'm a bit lazy to think about it. But here's an
> example:
>
> IE given
>
> A (may-throw)
> |
> v
> B
> |
> v
> C (exit)
>
> Here, we have A dominates B dominates C
>
> So the dominator tree is
> A
> |
> v
> B
> |
> v
> C
>
> Now, if you add an edge from A to C, you have:
>
> A dominates B
> Neither B nor A dominate C (C's idom is somewhere above, so it's in a
> sibling tree somewhere).

Not sure I understand this example -- won't C's new idom be A? The new
graph is this, right:

A -> C
A -> B
B -> C

?
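For what it's worth, this reading checks out under a naive dominator computation (an illustrative Python sketch, not LLVM's DominatorTree):

```python
# Checking the idom of C in the example graph with the classic
# iterative set-based dominator computation (illustrative only).

def dominators(succs, entry):
    nodes = set(succs)
    preds = {n: [m for m in succs if n in succs[m]] for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def idom(dom, n, entry):
    if n == entry:
        return None
    # The immediate dominator is the closest strict dominator, i.e. the
    # one with the largest dominator set of its own.
    strict = dom[n] - {n}
    return max(strict, key=lambda d: len(dom[d]))

# A -> B, B -> C, plus the new fake edge A -> C.
succs = {"A": ["B", "C"], "B": ["C"], "C": []}
dom = dominators(succs, "A")
print(idom(dom, "C", "A"))  # A: C's immediate dominator is A, not B
```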

Hi Daniel,

Daniel Berlin wrote:

/* Add fake edges to the function exit for any non constant and non
noreturn calls (or noreturn calls with EH/abnormal edges),
volatile inline assembly in the bitmap of blocks specified by
BLOCKS
or to the whole CFG if BLOCKS is zero.

The goal is to expose cases in which entering a basic block does
not imply that all subsequent instructions must be executed. */

Note that this is also necessary to make post-dominance correct (but we
already do it in most cases, but i think there are still bugs open about
correctness)

Yeah, right now in LLVM we cannot rely on post-dominance to conclude
“guaranteed to execute”, which isn’t ideal. There’s at least one place
I know of where a more precise post-dominance could be used. :)

For dominance, the dominance relationship for exit blocks that noreturn
blocks reach also changes, though i don’t honestly remember if it’s
material or not, and i’m a bit lazy to think about it. But here’s an
example:

IE given

A (may-throw)
|
v
B
|
v
C (exit)

Here, we have A dominates B dominates C

So the dominator tree is

A
|
v
B
|
v
C

Now, if you add an edge from A to C, you have:

A dominates B
Neither B nor A dominate C (C’s idom is somewhere above, so it’s in a
sibling tree somewhere).

Not sure I understand this example – won’t C’s new idom be A? The new
graph is this, right:

A → C
A → B
B → C

?

Yes, I originally constructed the graph with two blocks with may-throw calls, then simplified it and screwed up. Such is Friday. :D
I noticed right after I sent it and left for the day, but not before you caught it.

I completely understand the philosophical criticism, but

  • LLVM clearly made the decision/tradeoff to allow implicit early exits, and I’m pretty certain that will never change.

  • LLVM made the decision/tradeoff not to maintain a postdom tree throughout most of the pass pipeline

  • Fundamentally, you don’t really lose anything. It’s an easy analysis to find the exit points and mark blocks. Doing a CFG walk instead of a PostDom walk is typically not such a big deal.

The fundamental problem relevant to Precise GCRef is that the dependence between program conditions and loads can’t be expressed.

I often overload the term “control dependence” here. When I say a load is control dependent on a branch, I don’t mean that the load’s block is classically control dependent on the branch, I mean that the load is illegal to speculate above the branch. Yes they are two different things, but I don’t have a better term to use for that dependence information.

-Andy

> Note that this is also necessary to make post-dominance correct (but we
> already do it in most cases, but i think there are still bugs open about
> correctness)

Yeah, right now in LLVM we cannot rely on post-dominance to conclude
"guaranteed to execute", which isn't ideal. There's at least one place
I know of where a more precise post-dominance could be used. :)

I completely understand the philosophical criticism, but

- LLVM clearly made the decision/tradeoff to allow implicit early exits,
and I’m pretty certain that will never change.

Nobody is looking to change that necessarily, or at least I'm not; only to
make the impact not "correctness requires N^2 algorithms or other things
that *already* are slowing down the compiler".

- LLVM made the decision/tradeoff not to maintain a postdom tree
throughout most of the pass pipeline

Well, actually, this one is likely to change. The only issue here is
keeping it updated, and I have a solution for that.

- Fundamentally, you don’t really lose anything.

This I'm going to disagree with. :)
This is basically an argument that postdom is not needed, and I think we've
actually proven precisely the opposite. We approximate it in so many
places, and then go looking for harder and harder ways to optimize those
cases using dominance, and make ever more complicated "if the cfg kinda
looks like that, well then, sure go for it" checks instead of using postdom.
We never get all the cases, and we're well past the point where it would be
easier to use postdom in most of these places.

It’s an easy analysis to find the exit points and mark blocks.

I'm going to simply point out that may-throw was fixed using N^2 methods in
at least 3 passes, with no intention to change it. I haven't looked at the
rest, but I also suspect there are more bugs hiding here that nobody has
noticed yet.

So you are right that it may be algorithmically easy; it's apparently not
so easy that people did it.
I will also point out that the design decision led to simple, trivial test
cases that exercised this design decision being broken for years in the
compiler, in on-by-default optimization passes.

Does that make it the wrong design?
No, of course not, but it is empirical evidence that maybe things were
missing from "making it easy to get it right", etc.

Doing a CFG walk instead of a PostDom walk is typically not such a big
deal.

Using post-dom to approximate classical control dependence, as we do in
several places, has led to a mess nobody quite understands in those places.

But, like i said, i have a plan to fix this by making post-dom updates
automatic. There are now incremental dominance update algorithms that are
very fast in the common case (i.e., 100x faster than computing from
scratch) and, even in very pathological cases, 2x faster than computing
from scratch. This comes out to "fast enough" to maintain postdom without
anyone having to think about anything other than saying "i added an edge"
or "i deleted an edge".
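
To make the API shape concrete, here is a minimal, made-up Python sketch
(this is not LLVM code, and the class and block names are invented): passes
only report edge insertions and deletions, and the post-dominance info keeps
itself consistent. For brevity it lazily recomputes post-dominator sets with
a simple dataflow fixpoint instead of applying the fast incremental
algorithms mentioned above.

```python
# Sketch: passes only say "i added an edge" / "i deleted an edge";
# a real implementation would apply a fast incremental update instead
# of lazily recomputing from scratch.

class PostDomInfo:
    def __init__(self, cfg, exit_block):
        self.cfg = cfg              # block name -> list of successor names
        self.exit = exit_block
        self.dirty = True
        self.pdom = {}

    def insert_edge(self, src, dst):
        self.cfg.setdefault(src, []).append(dst)
        self.dirty = True           # invalidate; recompute on next query

    def delete_edge(self, src, dst):
        self.cfg[src].remove(dst)
        self.dirty = True

    def postdominates(self, a, b):
        if self.dirty:
            self._recompute()
        return a in self.pdom[b]

    def _recompute(self):
        # Standard dataflow: pdom[b] = {b} union the intersection of
        # pdom over b's successors, iterated to a fixpoint.
        blocks = list(self.cfg)
        self.pdom = {b: set(blocks) for b in blocks}
        self.pdom[self.exit] = {self.exit}
        changed = True
        while changed:
            changed = False
            for b in blocks:
                if b == self.exit:
                    continue
                succs = self.cfg[b]
                if succs:
                    new = {b} | set.intersection(*(self.pdom[s] for s in succs))
                else:
                    new = {b}
                if new != self.pdom[b]:
                    self.pdom[b] = new
                    changed = True
        self.dirty = False

# entry -> a -> b -> exit: "b" post-dominates "entry".
pdi = PostDomInfo({"entry": ["a"], "a": ["b"], "b": ["exit"], "exit": []},
                  "exit")
assert pdi.postdominates("b", "entry")

# A maythrow call in "a" gets an edge a -> exit; the pass only reports
# the edge, and "b" correctly stops post-dominating "entry".
pdi.insert_edge("a", "exit")
assert not pdi.postdominates("b", "entry")
```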

Ok, I do love incremental domtree update. It’s just that for postdom you have to be aware of addition/removal of maythrow sort of calls.

I often overload the term “control dependence” here. When I say a load is control dependent on a branch, I don’t mean that the load’s block is classically control dependent on the branch, I mean that the load is illegal to speculate above the branch. Yes they are two different things, but I don’t have a better term to use for that dependence information.

I still don’t know how to talk about a load’s dependencies without calling it “control dependence”
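
To make that distinction concrete, here is a small Python sketch (the names
are invented for illustration; a dict lookup stands in for a memory access
that can fault): the load is perfectly fine to execute under its guard, but
hoisting it above the branch executes it on the p == 0 path and faults.

```python
# Illustrative sketch: "safe to execute under the branch" vs "legal to
# speculate above the branch". memory[p] stands in for a load that
# faults (raises) on a bad address.
memory = {0x1000: 42}

def guarded_load(p):
    # The load is control dependent on the check: it only runs if p != 0.
    if p != 0:
        return memory[p]
    return None

def speculated_load(p):
    # "Hoisted" form: the load now executes unconditionally, including
    # on the p == 0 path, where it faults. The block holding the load
    # needn't be classically control dependent on anything interesting;
    # it is the load itself that is illegal to speculate.
    v = memory[p]
    return v if p != 0 else None

assert guarded_load(0x1000) == 42
assert guarded_load(0) is None       # the guard protects the load

faulted = False
try:
    speculated_load(0)               # KeyError stands in for a segfault
except KeyError:
    faulted = True
assert faulted
```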

-Andy

Though, just as an aside, i will admit i find the design decision strange.

GCC's design decision is one where nothing has to explicitly care about
implicit early exits to get correct answers about sinking/hoisting: passes
won't hoist because the predecessor will have multiple successors, and
won't sink because the current block will have multiple successors. Even
PRE and PDE will get this right by default.

LLVM's design decision is one where everything has to explicitly care about
implicit early exits to get correct answers (and, not to harp on it too
much, but "not everything does", years later). If they don't, they will get
wrong answers.

Most of LLVM's design tradeoffs are in the other direction.
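
As a sketch of why the GCC-style representation gets this for free, assume
a maythrow call terminates its block with an extra edge to an unwind/exit
block (the CFG and legality rules below are made up for illustration):
purely generic CFG-shape checks then refuse the unsafe motion with no
special-casing of implicit exits.

```python
# Sketch: when a maythrow call ends its block and has an explicit edge
# to the exceptional exit, ordinary CFG-shape legality checks block
# hoisting/sinking across it automatically.
cfg = {
    "entry":       ["call"],
    "call":        ["after", "unwind_exit"],   # maythrow call ends this block
    "after":       ["exit"],
    "unwind_exit": [],
    "exit":        [],
}

def preds(block):
    return [b for b, succs in cfg.items() if block in succs]

def can_hoist_into_pred(block):
    # Naive rule: hoist only into a unique predecessor with a single
    # successor (otherwise the hoisted code runs on paths that may have
    # exited early).
    ps = preds(block)
    return len(ps) == 1 and len(cfg[ps[0]]) == 1

def can_sink_into_succ(block):
    # Symmetric rule for sinking.
    ss = cfg[block]
    return len(ss) == 1 and len(preds(ss[0])) == 1

# Hoisting from "after" past the maythrow call is rejected: "call" has
# two successors, so the generic check fires with no special-casing.
assert not can_hoist_into_pred("after")
# Sinking out of "call" is likewise rejected: it has two successors.
assert not can_sink_into_succ("call")
# Motion that doesn't cross the implicit exit is still allowed.
assert can_hoist_into_pred("call")
```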

> Ok, I do love incremental domtree update. It’s just that for postdom you
> have to be aware of addition/removal of maythrow sort of calls.

These create block edges in the post-dom tree even now (or at least, they
have to for correctness), and they still will then. So it'll just be an
edge add/removal in the actual tree :wink:

From an API standpoint, the only thing an API would actually care about for
this is the block they were added/removed from, and the count of such calls
before the add/removal (if there are any, you need an edge; if there are
none, you don't).

So as long as things changing these types of calls (or erasing/duplicating
them) notify postdom, it'll still work just fine.

That seems doable.
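
The bookkeeping described above can be sketched in a few lines of Python
(the names are invented for illustration): each block tracks its count of
maythrow calls, and only the 0 -> 1 and 1 -> 0 transitions report an edge
change to postdom.

```python
# Sketch: maintain the block -> exit edge from a per-block count of
# maythrow calls. Only the first addition and the last removal actually
# change the post-dom edge set.
class EdgeRecorder:
    """Stand-in for the postdom updater: just records edges."""
    def __init__(self):
        self.edges = set()
    def insert_edge(self, src, dst):
        self.edges.add((src, dst))
    def delete_edge(self, src, dst):
        self.edges.discard((src, dst))

maythrow_count = {}

def maythrow_added(block, postdom):
    maythrow_count[block] = maythrow_count.get(block, 0) + 1
    if maythrow_count[block] == 1:        # first one: the edge appears
        postdom.insert_edge(block, "exit")

def maythrow_removed(block, postdom):
    maythrow_count[block] -= 1
    if maythrow_count[block] == 0:        # last one: the edge disappears
        postdom.delete_edge(block, "exit")

pdt = EdgeRecorder()
maythrow_added("bb1", pdt)                # 0 -> 1: edge added
maythrow_added("bb1", pdt)                # 1 -> 2: no change
assert pdt.edges == {("bb1", "exit")}
maythrow_removed("bb1", pdt)              # 2 -> 1: edge stays
assert pdt.edges == {("bb1", "exit")}
maythrow_removed("bb1", pdt)              # 1 -> 0: edge removed
assert pdt.edges == set()
```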

> I often overload the term “control dependence” here. When I say a load is
> control dependent on a branch, I don’t mean that the load’s block is
> classically control dependent on the branch, I mean that the load is
> illegal to speculate above the branch. Yes they are two different things,
> but I don’t have a better term to use for that dependence information.
>
> I still don’t know how to talk about a load’s dependencies without calling
> it “control dependence”

Yeah, i'm working on it :slight_smile:

From: "Andrew Trick" <atrick@apple.com>
To: "Sanjoy Das" <sanjoy@playingwithpointers.com>
Cc: "Daniel Berlin" <dberlin@dberlin.org>, "llvm-dev"
<llvm-dev@lists.llvm.org>, "Joseph Tremoulet"
<jotrem@microsoft.com>, "Oscar Blumberg"
<oscar.blumberg@normalesup.org>, "Chandler Carruth"
<chandlerc@gmail.com>, "Nick Lewycky" <nlewycky@google.com>, "Hal
Finkel" <hfinkel@anl.gov>, "Philip Reames"
<listmail@philipreames.com>, "Manuel Jacob" <me@manueljacob.de>,
"Eli Friedman" <eli.friedman@gmail.com>, "David Majnemer"
<david.majnemer@gmail.com>
Sent: Friday, July 15, 2016 6:00:12 PM
Subject: Re: RFC: Strong GC References in LLVM

> > Note that this is also necessary to make post-dominance correct (but we
> > already do it in most cases, but i think there are still bugs open about
> > correctness)
>
> Yeah, right now in LLVM we cannot rely on post-dominance to conclude
> "guaranteed to execute", which isn't ideal. There's at least one place
> I know where a more precise post-dominance could be used. :slight_smile:

> I completely understand the philosophical criticism, but
>
> - LLVM clearly made the decision/tradeoff to allow implicit early
> exits, and I’m pretty certain that will never change.

Why? A decision was made to give pointers types, and we've decided to
change that. It is not clear to me that the decision to allow implicit
early exits was, in retrospect, optimal. I think it is completely healthy
for the project to reevaluate these kinds of decisions. We now have many
years of experience and bug reports, and we should have a good ability to
evaluate the compile-time impact of a potential change.

-Hal