[ThinLTO] Reducing imported debug metadata

A couple weeks ago I sat down with David Blaikie to figure out what we could trim when importing debug metadata during function importing. I have a prototype that reduces the resulting .o sizes significantly with ThinLTO -g. However, I ran into some issues when cleaning up part of this in preparation for a patch. Since Mehdi indicated that he and Adrian were going to be looking at this as well shortly, I wanted to send out a summary of what David Blaikie and I discussed, what I have right now, and the issues I am hitting along with possible solutions.

There was a related older patch where I did some of the same things (D16440: [ThinLTO] Link in only necessary DICompileUnit operands), but that was made obsolete by a number of changes, such as the reversal of GlobalVariable/DIGlobalVariable edges, and a number of debug metadata changes done by Duncan (and Adrian I think?). So I started from scratch this time.

Here’s what I came up with after discussing what to import with David. Specifically, how to handle fields when importing a DICompileUnit:

a) List of enum types

  • Not needed in the importing module: change to nullptr

b) List of macros

  • Not needed in the importing module: change to nullptr

c) List of global variables

  • Only needed if we have imported the corresponding global variable definition. Since we currently don’t import any global variable definitions (we should assert that there aren’t any in the import list so this doesn’t get stale), we can change this to nullptr.

d) List of imported entities

  • For now need to import those that have a DILocalScope (and drop others from the imported entities list on the imported CU). David had some ideas on restructuring the way these are referenced, but for now we conservatively need to keep those that may be from functions, any that are from functions not actually imported will not be emitted into the object file anyway.

e) List of retained types

  • Leave as-is but import DICompositeTypes as type declarations. This one David thought would need to be under an option because of lldb’s assumption that the DWARF match the Clang AST (I think I am stating that right, correct me if not!).

Implementation:

a)-d) (enums/macros/global variables/imported entities):

For a-d I have a simple solution in the IRLinker. At the very start of IRLinking in a module, if it is being linked in for function importing, I invoke a function that does the handling (i.e. from the IRLinker constructor).

This routine handles a-c by pre-populating the ValueMap’s MDMap entry for those Metadata* (e.g. the RawEnumTypes Metadata*) with nullptr, so metadata mapping automatically makes them nullptr on the imported DICompileUnit.

For d, this routine walks the imported entities on the source CU and builds a SmallVector of those with local scopes. It then invokes replaceImportedEntities on the source CU to replace it with the new list (essentially dropping those with non-local scopes), and again the subsequent metadata mapping just works.

e) (retained types):

For e, changing the imported DICompisiteType to type declarations, I am having some issues. I prototyped this by doing it in the BitcodeReader. I passed in a new flag indicating that we are parsing a module for function importing. Then when parsing a METADATA_COMPOSITE_TYPE, if we are importing a module I change the calls to buildODRType and GET_OR_DISTINCT to instead create a type declaration:

  • pass in flags Flags|DINode::FlagFwdDecl
  • pass in nullptr for BaseType, Elements, VtableHolder, TemplateParams
  • pass in 0 for OffsetInBits

Note that buildODRType will not mutate any existing DICompisiteType for that identifier found in the DITypeMap to a type declaration (FlagFwdDecl) though. This is good since we are sharing the DITypeMap with the original (destination) module, and we don’t want to change any type definitions on the existing destination (importing) module to be type declarations when the same type is also in the source module we are about to import.

When cleaning this up, I initially tried mutating the types on the source module at the start of IRLinking of imported modules. I did this by adding a new DICompositeType function to force convert an existing type definition to a type declaration (since the bitcode reader already would have created a type definition when parsing the imported source module, and as I noted above it wasn’t allowed to be mutated to a declaration after that point). However, this is wrong since it ended up changing type definitions that were also used by the original destination/importing module to type declarations if they were also used in the source module (and therefore shared a DITypeMap entry). To get the forced mutation to a type decl to work here, I would somehow need to detect when a composite type on the source module was not also used by the original destination module. The only way I came up with off the top of my head was to also do a DebugInfoFinder::processModule on the dest module, and subtract the resulting types() from those I found when finding types on the source module I’m about to import, and only force-convert the remaining ones to type decls.

Interested in feedback on any of the above, and in particular on the cleanest way to create type declarations for imported types.

Thanks!
Teresa

A couple weeks ago I sat down with David Blaikie to figure out what we could trim when importing debug metadata during function importing. I have a prototype that reduces the resulting .o sizes significantly with ThinLTO -g. However, I ran into some issues when cleaning up part of this in preparation for a patch. Since Mehdi indicated that he and Adrian were going to be looking at this as well shortly, I wanted to send out a summary of what David Blaikie and I discussed, what I have right now, and the issues I am hitting along with possible solutions.

There was a related older patch where I did some of the same things (D16440: [ThinLTO] Link in only necessary DICompileUnit operands), but that was made obsolete by a number of changes, such as the reversal of GlobalVariable/DIGlobalVariable edges, and a number of debug metadata changes done by Duncan (and Adrian I think?). So I started from scratch this time.

Here's what I came up with after discussing what to import with David. Specifically, how to handle fields when importing a DICompileUnit:

Do you have a profile of which (if any) of these is causing the debug info bloat?

I worry/suspect that much of the bloat could be in the DILocation chains (line-table), which are generally non-mergeable.

a) List of enum types
- Not needed in the importing module: change to nullptr

Seems reasonable for -flto=thin (incorrect for -flto=full, though).

b) List of macros
- Not needed in the importing module: change to nullptr

We don't have macros by default, right? You need some -gmacros flag I hope? I remember objecting to the massive debug info bloat when they were introduced, but was convinced by an argument along the lines of "it's-never-going-to-be-on-by-default".

c) List of global variables
- Only needed if we have imported the corresponding global variable definition. Since we currently don't import any global variable definitions (we should assert that there aren't any in the import list so this doesn't get stale), we can change this to nullptr.

Why does this still exist? Can't we delete this list in all modes, since GlobalValues reference their own DIGlobalVariables directly?

d) List of imported entities
- For now need to import those that have a DILocalScope (and drop others from the imported entities list on the imported CU). David had some ideas on restructuring the way these are referenced, but for now we conservatively need to keep those that may be from functions, any that are from functions not actually imported will not be emitted into the object file anyway.

I'd rather just restructure these if we can.

e) List of retained types
- Leave as-is but import DICompositeTypes as type declarations. This one David thought would need to be under an option because of lldb's assumption that the DWARF match the Clang AST (I think I am stating that right, correct me if not!).

How many of these exist? I expect this list to be almost always empty. If it's not, why not? (Did I fail to push the commit to Clang that stopped adding all C++ types here??) Can we somehow prune the list?

I'm concerned that this is not a particularly fast thing to do.

I'm also worried that this would cause major debug info quality degradation on Darwin (defer to Adrian on that...). I'd rather leave it unoptimized unless we're sure we need to do something here.

Also, why do you need this for -flto=thin? Why not just nullptr?

Implementation:

Generally, I'd rather see the direction of explicitly building new DICompileUnits and linking in the correct/necessary arguments, like the IRLinker explicitly builds new llvm::Functions. Ideally, we'd only have to explicitly deal with DICompileUnit (one reason that I'm concerned about (e) above), which we can find through !llvm.dbg.cu.

A couple weeks ago I sat down with David Blaikie to figure out what we could trim when importing debug metadata during function importing. I have a prototype that reduces the resulting .o sizes significantly with ThinLTO -g. However, I ran into some issues when cleaning up part of this in preparation for a patch. Since Mehdi indicated that he and Adrian were going to be looking at this as well shortly, I wanted to send out a summary of what David Blaikie and I discussed, what I have right now, and the issues I am hitting along with possible solutions.

There was a related older patch where I did some of the same things (D16440: [ThinLTO] Link in only necessary DICompileUnit operands), but that was made obsolete by a number of changes, such as the reversal of GlobalVariable/DIGlobalVariable edges, and a number of debug metadata changes done by Duncan (and Adrian I think?). So I started from scratch this time.

Here’s what I came up with after discussing what to import with David. Specifically, how to handle fields when importing a DICompileUnit:

Do you have a profile of which (if any) of these is causing the debug info bloat?

I worry/suspect that much of the bloat could be in the DILocation chains (line-table), which are generally non-mergeable.

a) List of enum types

  • Not needed in the importing module: change to nullptr

Seems reasonable for -flto=thin (incorrect for -flto=full, though).

b) List of macros

  • Not needed in the importing module: change to nullptr

We don’t have macros by default, right? You need some -gmacros flag I hope? I remember objecting to the massive debug info bloat when they were introduced, but was convinced by an argument along the lines of “it’s-never-going-to-be-on-by-default”.

Yes, it’s off by default, but still good to think about whether we could do something similar for it than other things here.

c) List of global variables

  • Only needed if we have imported the corresponding global variable definition. Since we currently don’t import any global variable definitions (we should assert that there aren’t any in the import list so this doesn’t get stale), we can change this to nullptr.

Why does this still exist? Can’t we delete this list in all modes, since GlobalValues reference their own DIGlobalVariables directly?

Yeah, I forget why - I think there’s a reason, though.

d) List of imported entities

  • For now need to import those that have a DILocalScope (and drop others from the imported entities list on the imported CU). David had some ideas on restructuring the way these are referenced, but for now we conservatively need to keep those that may be from functions, any that are from functions not actually imported will not be emitted into the object file anyway.

I’d rather just restructure these if we can.

In the simplest form, it probably means adding a new list to DISubprograms (of all the imported entities in that function) - that way at least the function local imported entities would be dropped naturally when the subprogram was dropped (or not imported).

e) List of retained types

  • Leave as-is but import DICompositeTypes as type declarations. This one David thought would need to be under an option because of lldb’s assumption that the DWARF match the Clang AST (I think I am stating that right, correct me if not!).

How many of these exist? I expect this list to be almost always empty. If it’s not, why not? (Did I fail to push the commit to Clang that stopped adding all C++ types here??) Can we somehow prune the list?

Ah, you’re right - I was probably operating under the old model that everything was in retained types. It’s good to know/see that’s not the case. I wonder what, if anything, we put in retained types these days. (there certainly used to be ways - like cases where a type is used without any variable: “((foo*)void_ptr)->x = 3;” - we could probably help in those cases by pushing a retained types list down onto subprograms too, so the uses were attached to subprograms - but that means potentially a lot of duplication (if a type is ‘retained’ in many functions))

I’m concerned that this is not a particularly fast thing to do.

I’m also worried that this would cause major debug info quality degradation on Darwin (defer to Adrian on that…). I’d rather leave it unoptimized unless we’re sure we need to do something here.

Turning definitions into declarations like that is incompatible with lldb - it’d need to be turned off (Teresa mentioned putting it under a flag for this reason) for any lldb platform.

Also, why do you need this for -flto=thin? Why not just nullptr?

Legit point - same as enums, etc, they’ll be emitted somewhere, no need to retain them everywhere, even as declarations.

The proper definition->declaration demotion would have to happen by walking all the debug info looking for types… :confused:

+pcc to answer question about global variable list on CU

Thanks for the comments, responses below.

Teresa

image.png

David and I crossed wires! Some more comments below. Thanks, Teresa

+pcc to answer question about global variable list on CU

Thanks for the comments, responses below.

Teresa

I believe Duncan’s right here, and I was mistaken - we may be able to just null out the retained types list.

I’d be curious what’s in there that your change makes such a big difference - now that Duncan’s corrected my misunderstanding/outdated info about what’s in the retained types list.

image.png

+pcc to answer question about global variable list on CU

Thanks for the comments, responses below.

Teresa

>
> A couple weeks ago I sat down with David Blaikie to figure out what we
could trim when importing debug metadata during function importing. I have
a prototype that reduces the resulting .o sizes significantly with ThinLTO
-g. However, I ran into some issues when cleaning up part of this in
preparation for a patch. Since Mehdi indicated that he and Adrian were
going to be looking at this as well shortly, I wanted to send out a summary
of what David Blaikie and I discussed, what I have right now, and the
issues I am hitting along with possible solutions.
>
> There was a related older patch where I did some of the same things
(D16440: [ThinLTO] Link in only necessary DICompileUnit operands), but that
was made obsolete by a number of changes, such as the reversal of
GlobalVariable/DIGlobalVariable edges, and a number of debug metadata
changes done by Duncan (and Adrian I think?). So I started from scratch
this time.
>
> Here's what I came up with after discussing what to import with David.
Specifically, how to handle fields when importing a DICompileUnit:

Do you have a profile of which (if any) of these is causing the debug
info bloat?

I worry/suspect that much of the bloat could be in the DILocation chains
(line-table), which are generally non-mergeable.

Here is a breakdown of the debug section sizes aggregated across all .o
files for a build of Chromium with -g2. These are both with ThinLTO, but
the first column is with all function importing disabled:

  No importing Normal importing (aka "Head" in the below graph)
.debug_abbrev 32013702 60570915
.debug_info 1714741154 8724603593
.debug_line 108033535 440090737
.debug_loc 178426363 264907421
.debug_macinfo 17820 17820
.debug_pubnames 251081673 661630495
.debug_pubtypes 284086653 1867377375
.debug_ranges 65470016 106560864
.debug_str 3304122338 19013084756

I'm not sure if this answers your question above about DILocation chains
though - would that be directly translated into .debug_loc size?

> a) List of enum types
> - Not needed in the importing module: change to nullptr

Seems reasonable for -flto=thin (incorrect for -flto=full, though).

Right, all this is for ThinLTO - I haven't thought about Full LTO as I am
concerned about ThinLTO importing-specific bloat.

> b) List of macros
> - Not needed in the importing module: change to nullptr

We don't have macros by default, right? You need some -gmacros flag I
hope? I remember objecting to the massive debug info bloat when they were
introduced, but was convinced by an argument along the lines of
"it's-never-going-to-be-on-by-default".

No I haven't seen a case where this is non-null in my ThinLTO testing, but
it seemed safe to do for completeness.

> c) List of global variables
> - Only needed if we have imported the corresponding global variable
definition. Since we currently don't import any global variable definitions
(we should assert that there aren't any in the import list so this doesn't
get stale), we can change this to nullptr.

Why does this still exist? Can't we delete this list in all modes, since
GlobalValues reference their own DIGlobalVariables directly?

Peter, can you answer this question? I looked back at the old thread and
ISTR that there were some cases where the reference from the CU was needed
but don't remember the details.

I think this was to handle the case where globaldce drops the
DIGlobalVariable but we still want to keep debug info for it.

What I think we could do instead is to replace the globals list in
DICompileUnit with an "optimized globals" list which would start empty, and
then teach globaldce to move the DIGlobalVariable into the optimized
globals list if possible.

Peter

image.png

David and I crossed wires! Some more comments below. Thanks, Teresa

Ah, OK - yep, that makes more sense. Thanks for clarifying/mentioning!

  • Dave

A couple weeks ago I sat down with David Blaikie to figure out what we could trim when importing debug metadata during function importing. I have a prototype that reduces the resulting .o sizes significantly with ThinLTO -g. However, I ran into some issues when cleaning up part of this in preparation for a patch. Since Mehdi indicated that he and Adrian were going to be looking at this as well shortly, I wanted to send out a summary of what David Blaikie and I discussed, what I have right now, and the issues I am hitting along with possible solutions.

There was a related older patch where I did some of the same things (D16440: [ThinLTO] Link in only necessary DICompileUnit operands), but that was made obsolete by a number of changes, such as the reversal of GlobalVariable/DIGlobalVariable edges, and a number of debug metadata changes done by Duncan (and Adrian I think?). So I started from scratch this time.

Here's what I came up with after discussing what to import with David. Specifically, how to handle fields when importing a DICompileUnit:

Do you have a profile of which (if any) of these is causing the debug info bloat?

I worry/suspect that much of the bloat could be in the DILocation chains (line-table), which are generally non-mergeable.

a) List of enum types
- Not needed in the importing module: change to nullptr

Seems reasonable for -flto=thin (incorrect for -flto=full, though).

b) List of macros
- Not needed in the importing module: change to nullptr

We don't have macros by default, right? You need some -gmacros flag I hope? I remember objecting to the massive debug info bloat when they were introduced, but was convinced by an argument along the lines of "it's-never-going-to-be-on-by-default".

c) List of global variables
- Only needed if we have imported the corresponding global variable definition. Since we currently don't import any global variable definitions (we should assert that there aren't any in the import list so this doesn't get stale), we can change this to nullptr.

Why does this still exist? Can't we delete this list in all modes, since GlobalValues reference their own DIGlobalVariables directly?

Global constants may not necessarily have a GlobalValue backing them up (could have been optimized away) and thus are need to be referenced by the CU's list of global variables (or via some other mechanism).

-- adrian

+pcc to answer question about global variable list on CU

Thanks for the comments, responses below.

Teresa

The final binary size is one proxy to identify the “cost of debug info”. DILocation may not be the main contributor in the final binary, I’m not sure. However another proxy is the memory consumption. Back march before we worked on changing the edges in the call graph, my measurements showed that DILocation was by far the largest memory consumer.

My main focus is a bit different from your here: while I care about the size of the debug info emitted, I even care more about the impact on the link-time and the memory consumption. For this reason, any fix in the IRLinker is not totally satisfactory: we already read the bitcode and loaded the IR.
To get to the end of it we need to clearly identify the useless stuff and avoid to load it out of the bitcode in the first place. If it is not loaded from bitcode in the first place, the IR Linker doesn’t even risk to link it!

Now if you have an IRLinker patch that would 1) land before the 4.0 branch, and 2) not be too large / too intrusive, then I’d say let’s do this as a temporary stop gap while we work on a better solution for LLVM 5.0.

+pcc to answer question about global variable list on CU

Thanks for the comments, responses below.

Teresa

>
> A couple weeks ago I sat down with David Blaikie to figure out what we
could trim when importing debug metadata during function importing. I have
a prototype that reduces the resulting .o sizes significantly with ThinLTO
-g. However, I ran into some issues when cleaning up part of this in
preparation for a patch. Since Mehdi indicated that he and Adrian were
going to be looking at this as well shortly, I wanted to send out a summary
of what David Blaikie and I discussed, what I have right now, and the
issues I am hitting along with possible solutions.
>
> There was a related older patch where I did some of the same things
(D16440: [ThinLTO] Link in only necessary DICompileUnit operands), but that
was made obsolete by a number of changes, such as the reversal of
GlobalVariable/DIGlobalVariable edges, and a number of debug metadata
changes done by Duncan (and Adrian I think?). So I started from scratch
this time.
>
> Here's what I came up with after discussing what to import with David.
Specifically, how to handle fields when importing a DICompileUnit:

Do you have a profile of which (if any) of these is causing the debug
info bloat?

I worry/suspect that much of the bloat could be in the DILocation chains
(line-table), which are generally non-mergeable.

Here is a breakdown of the debug section sizes aggregated across all .o
files for a build of Chromium with -g2. These are both with ThinLTO, but
the first column is with all function importing disabled:

  No importing Normal importing (aka "Head" in the below graph)
.debug_abbrev 32013702 60570915
.debug_info 1714741154 8724603593 <(872)%20460-3593>
.debug_line 108033535 440090737
.debug_loc 178426363 264907421
.debug_macinfo 17820 17820
.debug_pubnames 251081673 661630495
.debug_pubtypes 284086653 1867377375
.debug_ranges 65470016 106560864
.debug_str 3304122338 <(330)%20412-2338> 19013084756 <(901)%20308-4756>

<image.png>

I'm not sure if this answers your question above about DILocation chains
though - would that be directly translated into .debug_loc size?

The final binary size is one proxy to identify the “cost of debug info”.
DILocation may not be the main contributor in the final binary, I’m not
sure. However another proxy is the memory consumption. Back march before we
worked on changing the edges in the call graph, my measurements showed that
DILocation was by far the largest memory consumer.

I haven't measured the memory reduction in the compiler from my changes,
but the impact on the final gold link is huge.

> a) List of enum types
> - Not needed in the importing module: change to nullptr

Seems reasonable for -flto=thin (incorrect for -flto=full, though).

Right, all this is for ThinLTO - I haven't thought about Full LTO as I am
concerned about ThinLTO importing-specific bloat.

> b) List of macros
> - Not needed in the importing module: change to nullptr

We don't have macros by default, right? You need some -gmacros flag I
hope? I remember objecting to the massive debug info bloat when they were
introduced, but was convinced by an argument along the lines of
"it's-never-going-to-be-on-by-default".

No I haven't seen a case where this is non-null in my ThinLTO testing, but
it seemed safe to do for completeness.

> c) List of global variables
> - Only needed if we have imported the corresponding global variable
definition. Since we currently don't import any global variable definitions
(we should assert that there aren't any in the import list so this doesn't
get stale), we can change this to nullptr.

Why does this still exist? Can't we delete this list in all modes, since
GlobalValues reference their own DIGlobalVariables directly?

Peter, can you answer this question? I looked back at the old thread and
ISTR that there were some cases where the reference from the CU was needed
but don't remember the details.

> d) List of imported entities
> - For now need to import those that have a DILocalScope (and drop
others from the imported entities list on the imported CU). David had some
ideas on restructuring the way these are referenced, but for now we
conservatively need to keep those that may be from functions, any that are
from functions not actually imported will not be emitted into the object
file anyway.

I'd rather just restructure these if we can.

That sounds like a good long-term solution. For for the short term what I
did was incredibly simple. If it has a significant impact (haven't measured
this one independently), I think the short-term fix would be good in the
meantime. David can probably provide more details on what needs to be done
to fix this longer term.

My main focus is a bit different from your here: while I care about the
size of the debug info emitted, I even care more about the impact on the
link-time and the memory consumption. For this reason, any fix in the
IRLinker is not totally satisfactory: we already read the bitcode and
loaded the IR.

This makes a big impact in memory consumption and time for gold, which
reads the debug info. Unlike on OS X where the linker ignores it because
the debug info is read by dsymutil.

To get to the end of it we need to clearly identify the useless stuff and

avoid to load it out of the bitcode in the first place. If it is not loaded
from bitcode in the first place, the IR Linker doesn’t even risk to link it!

Definitely that is even better, but this is a near-term blocker for us on
the native link side.

Now if you have an IRLinker patch that would 1) land before the 4.0
branch, and 2) not be too large / too intrusive, then I’d say let’s do this
as a temporary stop gap while we work on a better solution for LLVM 5.0.

Ok great, I can post the patch that handles most of the changes I mentioned
below today (it's ready and very simple, just need some tests).

The biggest improvement though comes from dropping the imported
DICompositeTypes to decls. For that the two solutions I have come up with
right now are:
1) Detect these in the BitcodeReader and only create a decl instead of a
def - implemented and quite simple
2) Add the ability to force convert to a decl on the source module before
invoking any value mapping and metadata linking (e.g. in the IRLinker
constructor) - implemented but is too aggressive as I need to figure out
which are shared with the dest module
3) Somehow intervene when the metadata is mapped during IR linking to
detect these and create a decl when mapping - this would require adding
some kind of callback to the value mapper code I think.

Approach 1 is definitely the simplest (and has the side benefit of never
creating the def in the parsed source bitcode module in the first place). I
can post a separate patch for that to consider.

Teresa

+pcc to answer question about global variable list on CU

Thanks for the comments, responses below.

Teresa

>
> A couple weeks ago I sat down with David Blaikie to figure out what we
could trim when importing debug metadata during function importing. I have
a prototype that reduces the resulting .o sizes significantly with ThinLTO
-g. However, I ran into some issues when cleaning up part of this in
preparation for a patch. Since Mehdi indicated that he and Adrian were
going to be looking at this as well shortly, I wanted to send out a summary
of what David Blaikie and I discussed, what I have right now, and the
issues I am hitting along with possible solutions.
>
> There was a related older patch where I did some of the same things
(D16440: [ThinLTO] Link in only necessary DICompileUnit operands), but that
was made obsolete by a number of changes, such as the reversal of
GlobalVariable/DIGlobalVariable edges, and a number of debug metadata
changes done by Duncan (and Adrian I think?). So I started from scratch
this time.
>
> Here's what I came up with after discussing what to import with David.
Specifically, how to handle fields when importing a DICompileUnit:

Do you have a profile of which (if any) of these is causing the debug
info bloat?

I worry/suspect that much of the bloat could be in the DILocation chains
(line-table), which are generally non-mergeable.

Here is a breakdown of the debug section sizes aggregated across all .o
files for a build of Chromium with -g2. These are both with ThinLTO, but
the first column is with all function importing disabled:

  No importing Normal importing (aka "Head" in the below graph)
.debug_abbrev 32013702 60570915
.debug_info 1714741154 8724603593 <(872)%20460-3593>
.debug_line 108033535 440090737
.debug_loc 178426363 264907421
.debug_macinfo 17820 17820
.debug_pubnames 251081673 661630495
.debug_pubtypes 284086653 1867377375
.debug_ranges 65470016 106560864
.debug_str 3304122338 <(330)%20412-2338> 19013084756 <(901)%20308-4756>

<image.png>

I'm not sure if this answers your question above about DILocation chains
though - would that be directly translated into .debug_loc size?

The final binary size is one proxy to identify the “cost of debug info”.
DILocation may not be the main contributor in the final binary, I’m not
sure. However another proxy is the memory consumption. Back march before we
worked on changing the edges in the call graph, my measurements showed that
DILocation was by far the largest memory consumer.

I haven't measured the memory reduction in the compiler from my changes,
but the impact on the final gold link is huge.

> a) List of enum types
> - Not needed in the importing module: change to nullptr

Seems reasonable for -flto=thin (incorrect for -flto=full, though).

Right, all this is for ThinLTO - I haven't thought about Full LTO as I am
concerned about ThinLTO importing-specific bloat.

> b) List of macros
> - Not needed in the importing module: change to nullptr

We don't have macros by default, right? You need some -gmacros flag I
hope? I remember objecting to the massive debug info bloat when they were
introduced, but was convinced by an argument along the lines of
"it's-never-going-to-be-on-by-default".

No I haven't seen a case where this is non-null in my ThinLTO testing,
but it seemed safe to do for completeness.

> c) List of global variables
> - Only needed if we have imported the corresponding global variable
definition. Since we currently don't import any global variable definitions
(we should assert that there aren't any in the import list so this doesn't
get stale), we can change this to nullptr.

Why does this still exist? Can't we delete this list in all modes,
since GlobalValues reference their own DIGlobalVariables directly?

Peter, can you answer this question? I looked back at the old thread and
ISTR that there were some cases where the reference from the CU was needed
but don't remember the details.

> d) List of imported entities
> - For now need to import those that have a DILocalScope (and drop
others from the imported entities list on the imported CU). David had some
ideas on restructuring the way these are referenced, but for now we
conservatively need to keep those that may be from functions, any that are
from functions not actually imported will not be emitted into the object
file anyway.

I'd rather just restructure these if we can.

That sounds like a good long-term solution. For for the short term what I
did was incredibly simple. If it has a significant impact (haven't measured
this one independently), I think the short-term fix would be good in the
meantime. David can probably provide more details on what needs to be done
to fix this longer term.

My main focus is a bit different from your here: while I care about the
size of the debug info emitted, I even care more about the impact on the
link-time and the memory consumption. For this reason, any fix in the
IRLinker is not totally satisfactory: we already read the bitcode and
loaded the IR.

This makes a big impact in memory consumption and time for gold, which
reads the debug info. Unlike on OS X where the linker ignores it because
the debug info is read by dsymutil.

To get to the end of it we need to clearly identify the useless stuff and

avoid to load it out of the bitcode in the first place. If it is not loaded
from bitcode in the first place, the IR Linker doesn’t even risk to link it!

Definitely that is even better, but this is a near-term blocker for us on
the native link side.

Now if you have an IRLinker patch that would 1) land before the 4.0
branch, and 2) not be too large / too intrusive, then I’d say let’s do this
as a temporary stop gap while we work on a better solution for LLVM 5.0.

Ok great, I can post the patch that handles most of the changes I
mentioned below today (it's ready and very simple, just need some tests).

D27635
Thanks,
Teresa