Deleting function IR after codegen

Hi all

After codegen for a given function, the IR should no longer be needed. In the AsmPrinter we convert from MI->MCInstr, and then we never go back and look at the IR during the MC layer.

I’ve prototyped a simple pass which can be (optionally) scheduled to do just this. It is added at the end of addPassesToEmitFile. It is optional so that clang can continue to leak the IR with --disable-free, but i would propose that all other tools, and especially LTO, would enable it. The savings are 20% of peak memory in LTO of clang itself.

I could attach a patch, but first i’d really like to know if anyone is fundamentally opposed to this.

I should note, a couple of issues have come up in the prototype.
- llvm::getDISubprogram was walking the function body to find the subprogram. This is trivial to fix as functions now have !dbg on them.
- The AsmPrinter is calling canBeOmittedFromSymbolTable on GlobalValue’s which then walks all their uses. I think this should be done earlier in codegen as an analysis whose results are available to the AsmPrinter.
- BB’s whose addresses are taken, i.e. jump tables, can’t be deleted. Those functions will just keep their IR around so no changes there.

With the above issues fixed, I can run make check and pass all the tests.

Comments very welcome.

Cheers,
Pete

From: "Pete Cooper" <peter_cooper@apple.com>
To: "Chandler Carruth" <chandlerc@gmail.com>, "Duncan P. N. Exon Smith" <dexonsmith@apple.com>, "Hal Finkel"
<hfinkel@anl.gov>, "Philip Reames" <listmail@philipreames.com>, "Mehdi Amini" <mehdi.amini@apple.com>, "Rafael
Espíndola" <rafael.espindola@gmail.com>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Monday, March 7, 2016 6:44:06 PM
Subject: Deleting function IR after codegen

Hi all

After codegen for a given function, the IR should no longer be
needed. In the AsmPrinter we convert from MI->MCInstr, and then we
never go back and look at the IR during the MC layer.

I’ve prototyped a simple pass which can be (optionally) scheduled to
do just this. It is added at the end of addPassesToEmitFile. It is
optional so that clang can continue to leak the IR with
--disable-free, but i would propose that all other tools, and
especially LTO, would enable it. The savings are 20% of peak memory
in LTO of clang itself.

I could attach a patch, but first i’d really like to know if anyone
is fundamentally opposed to this.

I should note, a couple of issues have come up in the prototype.
- llvm::getDISubprogram was walking the function body to find the
subprogram. This is trivial to fix as functions now have !dbg on
them.
- The AsmPrinter is calling canBeOmittedFromSymbolTable on
GlobalValue’s which then walks all their uses. I think this should
be done earlier in codegen as an analysis whose results are
available to the AsmPrinter.

I imagine that these are good cleanups regardless, it sounds like the fixes are also likely compile-time improvements.

- BB’s whose addresses are taken, i.e. jump tables, can’t be deleted.
Those functions will just keep their IR around so no changes there.

With the above issues fixed, I can run make check and pass all the
tests.

Comments very welcome.

As an optional feature, it certainly makes sense to me.

-Hal

I think such a pass is worth it.
I see a conceptual issue with it though, it would have to be FunctionPass, and such pass are not allowed to delete globals and Functions (see http://llvm.org/docs/WritingAnLLVMPass.html#the-functionpass-class ).
That said the CodeGen could be made a CallGraphSCCPass (which is something we thought about investigating to help register allocation at call site when you have already CodeGen the callee and can infer what is preserved). I'm not sure your feature alone justify such a change though.

Out of curiosity, how do you expose it to the client by the way:

Target->addPassesToEmitFile(PM, ....)
if (deleteFunctionAfterCodeGen)
  PM.addPass(createFunctionDeletionPass())
or

Target->addPassesToEmitFile(PM, ...., deleteFunctionAfterCodeGen);

Pete Cooper via llvm-dev <llvm-dev@lists.llvm.org> writes:

Hi all

After codegen for a given function, the IR should no longer be needed.
In the AsmPrinter we convert from MI->MCInstr, and then we never go
back and look at the IR during the MC layer.

I’ve prototyped a simple pass which can be (optionally) scheduled to
do just this. It is added at the end of addPassesToEmitFile. It is
optional so that clang can continue to leak the IR with
--disable-free, but i would propose that all other tools, and
especially LTO, would enable it. The savings are 20% of peak memory
in LTO of clang itself.

Did you happen to measure whether it's even worth keeping for
--disable-free? I suppose the cost of deleting it might be enough to
worry about for very small compilations, in which case we'd probably
want to keep the current behaviour.

I could attach a patch, but first i’d really like to know if anyone is
fundamentally opposed to this.

I'm definitely interested in seeing the patch. ISTM that anywhere the MC
layer needs to access IR would be a bit of a layering violation already,
so we'd want to clean that up anyway.

From: “Pete Cooper” <peter_cooper@apple.com>
To: “Chandler Carruth” <chandlerc@gmail.com>, “Duncan P. N. Exon Smith” <dexonsmith@apple.com>, “Hal Finkel”
<hfinkel@anl.gov>, “Philip Reames” <listmail@philipreames.com>, “Mehdi Amini” <mehdi.amini@apple.com>, “Rafael
Espíndola” <rafael.espindola@gmail.com>
Cc: “llvm-dev” <llvm-dev@lists.llvm.org>
Sent: Monday, March 7, 2016 6:44:06 PM
Subject: Deleting function IR after codegen

Hi all

After codegen for a given function, the IR should no longer be
needed. In the AsmPrinter we convert from MI->MCInstr, and then we
never go back and look at the IR during the MC layer.

I’ve prototyped a simple pass which can be (optionally) scheduled to
do just this. It is added at the end of addPassesToEmitFile. It is
optional so that clang can continue to leak the IR with
–disable-free, but i would propose that all other tools, and
especially LTO, would enable it. The savings are 20% of peak memory
in LTO of clang itself.

I could attach a patch, but first i’d really like to know if anyone
is fundamentally opposed to this.

I should note, a couple of issues have come up in the prototype.

  • llvm::getDISubprogram was walking the function body to find the
    subprogram. This is trivial to fix as functions now have !dbg on
    them.
  • The AsmPrinter is calling canBeOmittedFromSymbolTable on
    GlobalValue’s which then walks all their uses. I think this should
    be done earlier in codegen as an analysis whose results are
    available to the AsmPrinter.

I imagine that these are good cleanups regardless, it sounds like the fixes are also likely compile-time improvements.

Cool. I’ll get some patches together for these then. The first should be out soon.

And yes, the !dbg one in particular should be a small compile time improvement.

  • BB’s whose addresses are taken, i.e. jump tables, can’t be deleted.
    Those functions will just keep their IR around so no changes there.

With the above issues fixed, I can run make check and pass all the
tests.

Comments very welcome.

As an optional feature, it certainly makes sense to me.

Thanks! :slight_smile:

Pete

Hi all

After codegen for a given function, the IR should no longer be needed. In the AsmPrinter we convert from MI->MCInstr, and then we never go back and look at the IR during the MC layer.

I’ve prototyped a simple pass which can be (optionally) scheduled to do just this. It is added at the end of addPassesToEmitFile. It is optional so that clang can continue to leak the IR with --disable-free, but i would propose that all other tools, and especially LTO, would enable it. The savings are 20% of peak memory in LTO of clang itself.

I think such a pass is worth it.
I see a conceptual issue with it though, it would have to be FunctionPass, and such pass are not allowed to delete globals and Functions (see http://llvm.org/docs/WritingAnLLVMPass.html#the-functionpass-class ).

Ah, yeah this is tricky. So my pass isn’t deleting the function value, but it is deleting all the BBs/Instrs inside. When you read the IR it actually prints it as a declaration because the AsmWriter thinks that no body implies a declaration.

One issue this this is that FPPassManager::runOnFunction starts by checking for declarations and skipping them. So that means that if a function pass was to run after my pass, then it would get a function with no body which is not what it expects.

One option I considered was to replace the function body with a single BB and an unreachable/return, but I didn’t want to create useless IR.

That said the CodeGen could be made a CallGraphSCCPass (which is something we thought about investigating to help register allocation at call site when you have already CodeGen the callee and can infer what is preserved). I’m not sure your feature alone justify such a change though.

I would certainly like to see CodeGen work this way. If my work requires it then I have no problem with it, but i’d love to land my changes without this if its possible, just in case its troublesome to make this change.

Out of curiosity, how do you expose it to the client by the way:

Target->addPassesToEmitFile(PM, …)
if (deleteFunctionAfterCodeGen)
PM.addPass(createFunctionDeletionPass())
or

Target->addPassesToEmitFile(PM, …, deleteFunctionAfterCodeGen);

The above. The specific lines I changed on the function prototype are this, which means that I’m opting everyone in to my change, but then as I said we’d probably change clang to propagate its DisableFree flag in here:

virtual bool addPassesToEmitFile(
PassManagerBase &, raw_pwrite_stream &, CodeGenFileType,

  • bool /DisableVerify/ = true, AnalysisID /StartBefore/ = nullptr,
  • bool /DisableVerify/ = true, bool /FreeFunctionIR/ = true,
  • AnalysisID /StartBefore/ = nullptr,
    AnalysisID /StartAfter/ = nullptr, AnalysisID /StopAfter/ = nullptr,
    MachineFunctionInitializer * /MFInitializer/ = nullptr)

Cheers,
Pete

Pete Cooper via llvm-dev <llvm-dev@lists.llvm.org> writes:

Hi all

After codegen for a given function, the IR should no longer be needed.
In the AsmPrinter we convert from MI->MCInstr, and then we never go
back and look at the IR during the MC layer.

I’ve prototyped a simple pass which can be (optionally) scheduled to
do just this. It is added at the end of addPassesToEmitFile. It is
optional so that clang can continue to leak the IR with
–disable-free, but i would propose that all other tools, and
especially LTO, would enable it. The savings are 20% of peak memory
in LTO of clang itself.

Did you happen to measure whether it’s even worth keeping for
–disable-free? I suppose the cost of deleting it might be enough to
worry about for very small compilations, in which case we’d probably
want to keep the current behavior.

I didn’t, no. We could though.

I’ve CCed Vedant as I know he looked at the cost of freeing the IR fairly recently. I think he said it was about 3-4% compile time, but he can correct me if i have that wrong.

I could attach a patch, but first i’d really like to know if anyone is
fundamentally opposed to this.

I’m definitely interested in seeing the patch. ISTM that anywhere the MC
layer needs to access IR would be a bit of a layering violation already,
so we’d want to clean that up anyway.

Personally I think its a layering violation. At least I think it is one for the AsmPrinter FunctionPass to be walking the use lists of Globals. I don’t think FunctionPasses at any level should be able to do that.

Cheers,
Pete

Hi all,

Did you happen to measure whether it's even worth keeping for
--disable-free? I suppose the cost of deleting it might be enough to
worry about for very small compilations, in which case we'd probably
want to keep the current behavior.

I didn’t, no. We could though.

I’ve CCed Vedant as I know he looked at the cost of freeing the IR fairly recently. I think he said it was about 3-4% compile time, but he can correct me if i have that wrong.

I measured how much time -disable-free saves for clang. It's 3% of the compile time of clang, opt, and a host of other llvm tools when built cleanly in release mode.

My thoughts on Pete's pass are the same as Hal's, + 1.

vedant

I like the idea of deleting IR once we are at MC. Hopefully one day
even just after codegen.

- The AsmPrinter is calling canBeOmittedFromSymbolTable on GlobalValue’s which then walks all their uses. I think this should be done earlier in codegen as an analysis whose results are available to the AsmPrinter.

Starting with an analysis is probably the way to go. At some point it
might make sense to cache that information in the IR somehow for use
in LTO, but that will require some benchmarking to see if it is really
worth it.

Cheers,
Rafael

I like the idea of deleting IR once we are at MC. Hopefully one day
even just after codegen.

Yeah, that would be ideal. Better is some time inside codegen, but the issue inside codegen is AA, but thats not solvable any time soon I think.

- The AsmPrinter is calling canBeOmittedFromSymbolTable on GlobalValue’s which then walks all their uses. I think this should be done earlier in codegen as an analysis whose results are available to the AsmPrinter.

Starting with an analysis is probably the way to go. At some point it
might make sense to cache that information in the IR somehow for use
in LTO, but that will require some benchmarking to see if it is really
worth it.

I agree. I think it was escape analysis, and I can similarly imagine ‘escapes’ being a useful function argument attribute, so we could add it to both globals and arguments.

Cheers,
Pete

I could attach a patch, but first i’d really like to know if anyone is fundamentally opposed to this.

Not necessarily. I think that any information that isn’t being serialized in MI right now for a function could be as well. Definitely something for GlobalISel to keep in mind.

I should note, a couple of issues have come up in the prototype.

  • llvm::getDISubprogram was walking the function body to find the subprogram. This is trivial to fix as functions now have !dbg on them.

This is definitely worth it, please go ahead and do this.

  • The AsmPrinter is calling canBeOmittedFromSymbolTable on GlobalValue’s which then walks all their uses. I think this should be done earlier in codegen as an analysis whose results are available to the AsmPrinter.

I think this makes sense, but I worry about late added GlobalValues during code gen? How would we cope with that? Example: Let’s say we start lowering a target specific construct as an MI pass etc and it constructs a global value, when do we run the analysis to make sure that all such things happen? Late as possible I’d assume. Maybe this isn’t an issue, but thought I’d bring it up. At any rate, could you provide a bit more detail here?

  • BB’s whose addresses are taken, i.e. jump tables, can’t be deleted. Those functions will just keep their IR around so no changes there.

Oh well. Conveniently there aren’t a lot of these.

-eric

I could attach a patch, but first i’d really like to know if anyone is fundamentally opposed to this.

Not necessarily. I think that any information that isn’t being serialized in MI right now for a function could be as well. Definitely something for GlobalISel to keep in mind.

+1.
That’s basically where I would like to go with MachineModule/MachineModulePass.
http://lists.llvm.org/pipermail/llvm-dev/2016-January/094426.html

Cheers,
-Quentin

I could attach a patch, but first i’d really like to know if anyone is fundamentally opposed to this.

Not necessarily. I think that any information that isn’t being serialized in MI right now for a function could be as well. Definitely something for GlobalISel to keep in mind.

Probably even depends on the kind of AA. TBAA is likely mostly useable without instructions around (you could reference the TBAA from the MIs themselves), but I guess BasicAA is still introspecting the instructions enough to make it harder to migrate.

I should note, a couple of issues have come up in the prototype.

  • llvm::getDISubprogram was walking the function body to find the subprogram. This is trivial to fix as functions now have !dbg on them.

This is definitely worth it, please go ahead and do this.

Will do. Thanks.

  • The AsmPrinter is calling canBeOmittedFromSymbolTable on GlobalValue’s which then walks all their uses. I think this should be done earlier in codegen as an analysis whose results are available to the AsmPrinter.

I think this makes sense, but I worry about late added GlobalValues during code gen?

The code which walks them is looking for an ICmpInst with the global on one side. Basically wants to know if the address is interesting. Globals added from selection DAG onwards likely won’t have an IR instruction added to compare them, although if for some reason someone added a global and materialized the compare directly in MIs then I guess the current code would be incorrect.

How would we cope with that? Example: Let’s say we start lowering a target specific construct as an MI pass etc and it constructs a global value, when do we run the analysis to make sure that all such things happen? Late as possible I’d assume. Maybe this isn’t an issue, but thought I’d bring it up. At any rate, could you provide a bit more detail here?

It does sound like this is going to be tricky. I was hoping to run this as a Module pass prior to starting the codegen function passes, but there’s nothing to stop CodeGenPrepare from adding a compare to a function, after when I was hoping to run the analysis. I’ll need to mull this over.

  • BB’s whose addresses are taken, i.e. jump tables, can’t be deleted. Those functions will just keep their IR around so no changes there.

Oh well. Conveniently there aren’t a lot of these.

Yeah, not ideal, but hopefully not too bad either.

Pete

FWIW, +1 from me as well.

But I don’t think you need to make this a module pass or anything else. I think you should leave the husks of the functions around and just nuke the IR out from under them. That way the module surface remains essentially identical. You can also probably nuke all the instructions from BBs with their addresses taken for jump tables, etc.

-Chandler

From: "Chandler Carruth" <chandlerc@gmail.com>
To: "Quentin Colombet" <qcolombet@apple.com>, "Eric Christopher"
<echristo@gmail.com>
Cc: "Pete Cooper" <peter_cooper@apple.com>, "Duncan P. N. Exon Smith"
<dexonsmith@apple.com>, "Hal Finkel" <hfinkel@anl.gov>, "Philip
Reames" <listmail@philipreames.com>, "Mehdi Amini"
<mehdi.amini@apple.com>, "Rafael Espíndola"
<rafael.espindola@gmail.com>, "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Thursday, May 12, 2016 6:11:34 PM
Subject: Re: [llvm-dev] Deleting function IR after codegen

FWIW, +1 from me as well.

But I don't think you need to make this a module pass or anything
else. I think you should leave the husks of the functions around and
just nuke the IR out from under them. That way the module surface
remains essentially identical. You can also probably nuke all the
instructions from BBs with their addresses taken for jump tables,
etc.

I agree. We need to be careful about invalidating inter-procedural IR-level analysis results that we make use of using CodeGen (like AA). Consider, for example, the following situation:

1. We CodeGen a function foo(), and then remove its IR. During this process, we never used an AA query that reached CFL-AA
2. We CodeGen a function bar(), and bar() calls foo().
3. During (2), we make an AA query on instructions in bar(), which have MMOs with IR Values in bar()
4. CFL-AA is reached and does not have a cached graph for bar(), so it builds one
5. While building the graph for bar(), it reaches the call to foo(), and calls tryInterproceduralAnalysis
6. CFL-AA does not have a cached graph for foo(), so tryInterproceduralAnalysis triggers one to be constructed
7. But foo() is now empty, and so has trivial aliasing properties
8. We return an incorrect AA result when compiling bar()

-Hal

FWIW, +1 from me as well.

Cool. Thanks Chandler.

But I don’t think you need to make this a module pass or anything else. I think you should leave the husks of the functions around and just nuke the IR out from under them.

Yeah, no need for a pass unless we really want it. I could, for example, just delete the function bodies in the AsmPrinter once we’re done with them. Will see how that looks and get started on a patch and actual review thread.

That way the module surface remains essentially identical. You can also probably nuke all the instructions from BBs with their addresses taken for jump tables, etc.

Sounds good. I never considered removing the contents of the BBs too, but thats a good idea.

Cheers,
Pete


From: “Chandler Carruth” <chandlerc@gmail.com>
To: “Quentin Colombet” <qcolombet@apple.com>, “Eric Christopher” <echristo@gmail.com>
Cc: “Pete Cooper” <peter_cooper@apple.com>, “Duncan P. N. Exon Smith” <dexonsmith@apple.com>, “Hal Finkel” <hfinkel@anl.gov>, “Philip Reames” <listmail@philipreames.com>, “Mehdi Amini” <mehdi.amini@apple.com>, “Rafael Espíndola” <rafael.espindola@gmail.com>, “llvm-dev” <llvm-dev@lists.llvm.org>
Sent: Thursday, May 12, 2016 6:11:34 PM
Subject: Re: [llvm-dev] Deleting function IR after codegen

FWIW, +1 from me as well.

But I don’t think you need to make this a module pass or anything else. I think you should leave the husks of the functions around and just nuke the IR out from under them. That way the module surface remains essentially identical. You can also probably nuke all the instructions from BBs with their addresses taken for jump tables, etc.

I agree. We need to be careful about invalidating inter-procedural IR-level analysis results that we make use of using CodeGen (like AA). Consider, for example, the following situation:

  1. We CodeGen a function foo(), and then remove its IR. During this process, we never used an AA query that reached CFL-AA
  2. We CodeGen a function bar(), and bar() calls foo().
  3. During (2), we make an AA query on instructions in bar(), which have MMOs with IR Values in bar()
  4. CFL-AA is reached and does not have a cached graph for bar(), so it builds one
  5. While building the graph for bar(), it reaches the call to foo(), and calls tryInterproceduralAnalysis
  6. CFL-AA does not have a cached graph for foo(), so tryInterproceduralAnalysis triggers one to be constructed
  7. But foo() is now empty, and so has trivial aliasing properties
  8. We return an incorrect AA result when compiling bar()

Hmm. This is an interesting use case. Is this even legal in the current pass manager? Codegen is a FunctionPass which I thought could only look at global variables and its own function.

Or is the rule that a FunctionPass has read/write access to its own function, but read-only to other functions which is effectively what you’d need for the CFL-AA case above?

Cheers,
Pete

From: "Pete Cooper" <peter_cooper@apple.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Chandler Carruth" <chandlerc@gmail.com>, "Duncan P. N. Exon
Smith" <dexonsmith@apple.com>, "Philip Reames"
<listmail@philipreames.com>, "Mehdi Amini" <mehdi.amini@apple.com>,
"Rafael Espíndola" <rafael.espindola@gmail.com>, "llvm-dev"
<llvm-dev@lists.llvm.org>, "Quentin Colombet" <qcolombet@apple.com>,
"Eric Christopher" <echristo@gmail.com>
Sent: Thursday, May 12, 2016 6:39:25 PM
Subject: Re: [llvm-dev] Deleting function IR after codegen

> > From: "Chandler Carruth" < chandlerc@gmail.com >
>

> > To: "Quentin Colombet" < qcolombet@apple.com >, "Eric
> > Christopher"
> > <
> > echristo@gmail.com >
>

> > Cc: "Pete Cooper" < peter_cooper@apple.com >, "Duncan P. N. Exon
> > Smith" < dexonsmith@apple.com >, "Hal Finkel" < hfinkel@anl.gov
> > >,
> > "Philip Reames" < listmail@philipreames.com >, "Mehdi Amini" <
> > mehdi.amini@apple.com >, "Rafael Espíndola" <
> > rafael.espindola@gmail.com >, "llvm-dev" <
> > llvm-dev@lists.llvm.org
> > >
>

> > Sent: Thursday, May 12, 2016 6:11:34 PM
>

> > Subject: Re: [llvm-dev] Deleting function IR after codegen
>

> > FWIW, +1 from me as well.
>

> > But I don't think you need to make this a module pass or anything
> > else. I think you should leave the husks of the functions around
> > and
> > just nuke the IR out from under them. That way the module surface
> > remains essentially identical. You can also probably nuke all the
> > instructions from BBs with their addresses taken for jump tables,
> > etc.
>

> I agree. We need to be careful about invalidating inter-procedural
> IR-level analysis results that we make use of using CodeGen (like
> AA). Consider, for example, the following situation:

> 1. We CodeGen a function foo(), and then remove its IR. During this
> process, we never used an AA query that reached CFL-AA

> 2. We CodeGen a function bar(), and bar() calls foo().

> 3. During (2), we make an AA query on instructions in bar(), which
> have MMOs with IR Values in bar()

> 4. CFL-AA is reached and does not have a cached graph for bar(), so
> it builds one

> 5. While building the graph for bar(), it reaches the call to
> foo(),
> and calls tryInterproceduralAnalysis

> 6. CFL-AA does not have a cached graph for foo(), so
> tryInterproceduralAnalysis triggers one to be constructed

> 7. But foo() is now empty, and so has trivial aliasing properties

> 8. We return an incorrect AA result when compiling bar()

Hmm. This is an interesting use case. Is this even legal in the
current pass manager? Codegen is a FunctionPass which I thought
could only look at global variables and its own function.

Or is the rule that a FunctionPass has read/write access to its own
function, but read-only to other functions which is effectively what
you’d need for the CFL-AA case above?

I believe that, technically speaking, this is not legal. CFL-AA should really be a module pass if it wants to do IPA. I think that, at the time it was originally developed, it was made a function pass so that it could work without modifying every other function pass in the pipeline to mark it as preserved. Of course, we later did this anyway for GlobalsAA (a module-level analysis), so we should probably go back and do the same for CFL-AA.

I'm not sure that's the important distinction, however. GlobalsAA pre-computes all necessary data for the module up front (when runOnModule is called), so the scenario I described has no analogy there. If GlobalsAA computed information on Globals lazily (like CFL-AA builds function graphs lazily), it would have the same problem.

I'm not claiming that this is a good thing, I just wanted to point out that it can be a problem given code we have in-tree now.

-Hal