CGSCC passes (in the new PM)

Hey,

Recently I ran into multiple problems with the new PM CGSCC Attributor pass.
Instead of going into details now, I want to first ask what we
want/expect to work when it comes to CGSCC passes (in the new PM).

What I was hoping to (eventually) do are the following things:

- Internalize functions, that is, replace all uses of
`define weak @foo() {}`
with
`define private @foo.internal() {}`
while keeping @foo around.

- Delete functions, both in the current SCC but also children SCCs, e.g.,
```
static void bar() { ... }
static void baz() { bar(); }
void quack() { if (/* something that is basically */ false) baz(); }
```
We visit the singleton SCCs with bar, baz, then quack only to realize
when we optimize quack that baz and bar are dead and can be removed.

- Delete multiple functions from the current SCC, this is already
breaking with our current (inliner-centric) update methods (AFAIKT).

- Add/Remove/Modify globals variables. (Not related to the call graph,
but I wanted to mention it anyway.)

- Outline parts of a function in the current SCC into a new function
that will then either become part of the current SCC (even if that is
an overapproximation) or a child SCC.

- Delete calls to any function.

There might be more but this is all I remember right now. If it turns
out too many things cannot be supported, it's unclear if the module
Attributor pass is not going to be preferable, especially since we can
parallelize the pass itself more easily (I think) than multiple
concurrent CGSCC passes.

I'd appreciate any thoughts on this :slight_smile:

Thanks in advance,
Johannes

I’m missing a lot of the context for the initial implementation of the NPM CGSCC infra so I may be wrong in some aspects, but my initial thoughts are below

Hey,

Recently I ran into multiple problems with the new PM CGSCC Attributor pass.
Instead of going into details now, I want to first ask what we
want/expect to work when it comes to CGSCC passes (in the new PM).

What I was hoping to (eventually) do are the following things:

  • Internalize functions, that is, replace all uses of
    define weak @foo() {}
    with
    define private @foo.internal() {}
    while keeping @foo around.

I can see issues where foo.internal() is created (say when visiting foo()), but the CGSCC pass manager doesn’t visit it, and then later on when a caller of foo() now calls foo.internal(), the CGSCC contract is broken since foo.internal() hasn’t been visited yet.
Pretty sure that modifying all callers of foo() when splitting foo() is bad since you’d be modifying a SCC further up.

  • Delete functions, both in the current SCC but also children SCCs, e.g.,
static void bar() { ... }
static void baz() { bar(); }
void quack() { if (/* something that is basically */ false) baz(); }

We visit the singleton SCCs with bar, baz, then quack only to realize
when we optimize quack that baz and bar are dead and can be removed.

The inliner already deletes non-recursive dead functions, so I think this should already work. (Although looking through the code, the inliner wouldn’t delete bar in your case which seems like an oversight)

  • Delete multiple functions from the current SCC, this is already
    breaking with our current (inliner-centric) update methods (AFAIKT).

To clarify, deleting potentially mutually recursive functions in the current SCC? That seems reasonable, maybe by making the existing LazyCallGraph::removeDeadFunction() more generic.

  • Add/Remove/Modify globals variables. (Not related to the call graph,
    but I wanted to mention it anyway.)

Not sure that modifying globals makes sense in a CGSCC pass. Maybe if the global were only used by functions in the current SCC. Any specific examples?

  • Outline parts of a function in the current SCC into a new function
    that will then either become part of the current SCC (even if that is
    an overapproximation) or a child SCC.

I’ve actually been trying to work on this to make coroutines work under the NPM. For an example see https://reviews.llvm.org/D88714. I’ve been meaning to ask for more specifics on exactly what coroutines do to the call graph to better understand what a proper solution would look like.

  • Delete calls to any function.

There might be more but this is all I remember right now. If it turns
out too many things cannot be supported, it’s unclear if the module
Attributor pass is not going to be preferable, especially since we can
parallelize the pass itself more easily (I think) than multiple
concurrent CGSCC passes.

I have a feeling that concurrency at the pass manager level is a long way off.

I’d appreciate any thoughts on this :slight_smile:

At a high level, does the attributor work on SCCs? It seems that it works on the call graph, but not in a bottom-up way, making it different from typical CGSCC passes. That makes me think it shouldn’t be working with the CGSCC infra. Did you want to run it alongside the inliner/other CGSCC passes?

I'm missing a lot of the context for the initial implementation of

the NPM
> CGSCC infra so I may be wrong in some aspects, but my initial thoughts are
> below

Yeah, I don't know either so I figured I ask :slight_smile:

>
>> Hey,
>>
>> Recently I ran into multiple problems with the new PM CGSCC Attributor
>> pass.
>> Instead of going into details now, I want to first ask what we
>> want/expect to work when it comes to CGSCC passes (in the new PM).
>>
>> What I was hoping to (eventually) do are the following things:
>>
>> - Internalize functions, that is, replace all uses of
>> `define weak @foo() {}`
>> with
>> `define private @foo.internal() {}`
>> while keeping @foo around.
>>
> I can see issues where foo.internal() is created (say when visiting foo()),
> but the CGSCC pass manager doesn't visit it, and then later on when a
> caller of foo() now calls foo.internal(), the CGSCC contract is broken
> since foo.internal() hasn't been visited yet.
> Pretty sure that modifying all callers of foo() when splitting foo() is bad
> since you'd be modifying a SCC further up.

Fair enough, I can do that in the Module pass only I guess.

>> - Delete functions, both in the current SCC but also children SCCs,
>> e.g.,
>> ```
>> static void bar() { ... }
>> static void baz() { bar(); }
>> void quack() { if (/* something that is basically */ false) baz(); }
>> ```
>> We visit the singleton SCCs with bar, baz, then quack only to realize
>> when we optimize quack that baz and bar are dead and can be removed.
>>
> The inliner already deletes non-recursive dead functions, so I think this
> should already work. (Although looking through the code, the inliner
> wouldn't delete bar in your case which seems like an oversight)

This "oversight" is consequential though, see the next comment.

>
>>
>> - Delete multiple functions from the current SCC, this is already
>> breaking with our current (inliner-centric) update methods (AFAIKT).
>>
> To clarify, deleting potentially mutually recursive functions in the
> current SCC? That seems reasonable, maybe by making the existing
> LazyCallGraph::removeDeadFunction() more generic.

Yes, but:

[This is from my memory so it is a bit fuzzy but I think we can
reconstruct the situation if we want to.]

The methods
updateCGAndAnalysisManagerForCGSCCPass
and
updateCGAndAnalysisManagerForFunctionPass
can, as I understand it, split a SCC if you delete a function from it.
They use UpdatedRC in CGSCCUpdateResult to later repair the CG on the
"way up".

Now if you have a single SCC with 3 functions [bar,baz,quack] of which 2
are dead:

     static void bar() { quack(); }
     static void baz() { bar(); }
     void quack() { if (/* something that is basically */ false) baz(); }

You can delete baz and get two SCCs each with one function [bar] and
[quack]. Now if you continue deleting, from what I have seen, we break
some invariants when we delete bar out of [bar]. I think the problem is
that we only have one UpdatedRC in the CGSCCUpdateResult which is later
used to repair things.

IIRC, UpdatedRC for the oroginal [bar,baz,quack] SCC can point to the
[bar] SCC which is deleted though. That is not supposed to happen, and
outside the Attributor I don't think can happen.

>>
>> - Add/Remove/Modify globals variables. (Not related to the call graph,
>> but I wanted to mention it anyway.)
>>
> Not sure that modifying globals makes sense in a CGSCC pass. Maybe if the
> global were only used by functions in the current SCC. Any specific
> examples?

I mean, if I look at all the uses I might look at other SCCs so I'm
unsure if this is legal for an CGSCC pass at all to look at globals if
we would restrict it to the ones only with uses in the SCC.

>>
>> - Outline parts of a function in the current SCC into a new function
>> that will then either become part of the current SCC (even if that is
>> an overapproximation) or a child SCC.
>>
> I've actually been trying to work on this to make coroutines work under the
> NPM. For an example see https://reviews.llvm.org/D88714. I've been meaning
> to ask for more specifics on exactly what coroutines do to the call graph
> to better understand what a proper solution would look like.
>
>>
>> - Delete calls to any function.
>>
>> There might be more but this is all I remember right now. If it turns
>> out too many things cannot be supported, it's unclear if the module
>> Attributor pass is not going to be preferable, especially since we can
>> parallelize the pass itself more easily (I think) than multiple
>> concurrent CGSCC passes.
>>
> I have a feeling that concurrency at the pass manager level is a long way
> off.

Yeah, and I feel passes move away from CGSCC towards module passes. If
that is the case, I'll concentrate on the Attributor module pass and
look at parallelization of the fixpoint iteration inside the pass
instead.

>>
>> I'd appreciate any thoughts on this :slight_smile:
>>
> At a high level, does the attributor work on SCCs? It seems that it works
> on the call graph, but not in a bottom-up way, making it different from
> typical CGSCC passes. That makes me think it shouldn't be working with the
> CGSCC infra. Did you want to run it alongside the inliner/other CGSCC
> passes?

Running in the inliner loop was the main motivation for the CGSCC
version. Along the ability for other CGSCC passes to use it, e.g., we
run it in the OpenMPOpt pass. We can run it as either CGSCC or module
pass and depending on this thread we might need to add way more
restrictions to the CGSCC pass or completely move to a module version
after all.