distinct DISubprograms hindering sharing inlined subprogram descriptions

Branching off from a discussion of improvements to DIGlobalVariable representations that Adrian’s working on - got me thinking about related changes that have already been made to DISubprogram.

To reduce duplicate debug info when things like linkonce_odr functions were deduplicated in LTO linking, the relationship between a CU and DISubprogram was inverted (instead of a CU maintaining a list of subprograms, subprograms specify which CU they come from - and the llvm::Function references the DISubprogram, so if the llvm::Function goes away, so does the associated DISubprogram)

I’m not sure if this caused a regression, but at least seems to miss a possible improvement:

During IR linking (for LTO, ThinLTO, etc) these distinct DISubprogram definitions (& their CU link, even if they weren’t marked ‘distinct’, the CU link would cause them to effectively be so) remain separate - this means that inlined versions in one CU don’t refer to an existing subprogram definition in another CU.

To demonstrate:
inl.h:

void f1();
inline attribute((always_inline)) void f2() {
f1();
}
inl1.cpp:

#include “inl.h”
void c1() {
f2();
}
inl2.cpp:

#include “inl.h”
void c2() {
f2();
}

Compile to IR, llvm-link the result. The DWARF you get is basically the same as the DWARF you’d get without linking:

DW_TAG_compile_unit
DW_AT_name “inl1.cpp”
DW_TAG_subprogram #0
DW_AT_name “f2”

DW_TAG_subprogram
DW_AT_name “c1”
DW_TAG_inlined_subroutine
DW_TAG_abstract_origin #0 “f2”

DW_TAG_compile_unit
DW_AT_name “inl2.cpp”
DW_TAG_subprogram #1
DW_AT_name “f2”

DW_TAG_subprogram
DW_AT_name “c2”
DW_TAG_inlined_subroutine
DW_TAG_abstract_origin #1 “f2”

Instead of something more like this:

DW_TAG_compile_unit
DW_AT_name “inl1.cpp”
DW_TAG_subprogram #0
DW_AT_name “f2”

DW_TAG_subprogram
DW_AT_name “c1”
DW_TAG_inlined_subroutine
DW_TAG_abstract_origin #0 “f2”

DW_TAG_compile_unit
DW_AT_name “inl2.cpp”

DW_TAG_subprogram
DW_AT_name “c2”
DW_TAG_inlined_subroutine
DW_TAG_abstract_origin #0 “f2”

(note that only one abstract definition of f2 is produced here)

Any thoughts? I imagine this is probably worth a reasonable amount of savings in an optimized build. Not huge, but not nothing. (probably not the top of anyone’s list though, I realize)

Should we remove the CU link from a non-internal linkage subprogram (& this may have an effect on the GV representation issue originally being discussed) and just emit it into whichever CU happens to need it first?

This might be slightly sub-optimal, due to, say, the namespace being foreign to that CU. But it’s how we do types currently, I think? So at least it’d be consistent and probably cheap enough/better.

Trying to wrap my brain around this, so a few questions below. =)

Branching off from a discussion of improvements to DIGlobalVariable representations that Adrian's working on - got me thinking about related changes that have already been made to DISubprogram.

To reduce duplicate debug info when things like linkonce_odr functions were deduplicated in LTO linking, the relationship between a CU and DISubprogram was inverted (instead of a CU maintaining a list of subprograms, subprograms specify which CU they come from - and the llvm::Function references the DISubprogram, so if the llvm::Function goes away, so does the associated DISubprogram)

I'm not sure if this caused a regression, but at least seems to miss a possible improvement:

During IR linking (for LTO, ThinLTO, etc) these distinct DISubprogram definitions (& their CU link, even if they weren't marked 'distinct', the CU link would cause them to effectively be so) remain separate - this means that inlined versions in one CU don't refer to an existing subprogram definition in another CU.

To demonstrate:
inl.h:
void f1();
inline __attribute__((always_inline)) void f2() {
  f1();
}
inl1.cpp:
#include "inl.h"
void c1() {
  f2();
}
inl2.cpp:
#include "inl.h"
void c2() {
  f2();
}

Compile to IR, llvm-link the result. The DWARF you get is basically the same as the DWARF you'd get without linking:

DW_TAG_compile_unit
  DW_AT_name "inl1.cpp"
  DW_TAG_subprogram #0
    DW_AT_name "f2"
  DW_TAG_subprogram
    DW_AT_name "c1"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #0 "f2"
DW_TAG_compile_unit
  DW_AT_name "inl2.cpp"
  DW_TAG_subprogram #1
    DW_AT_name "f2"
  DW_TAG_subprogram
    DW_AT_name "c2"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #1 "f2"

Instead of something more like this:

DW_TAG_compile_unit
  DW_AT_name "inl1.cpp"
  DW_TAG_subprogram #0
    DW_AT_name "f2"
  DW_TAG_subprogram
    DW_AT_name "c1"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #0 "f2"
DW_TAG_compile_unit
  DW_AT_name "inl2.cpp"
  DW_TAG_subprogram
    DW_AT_name "c2"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #0 "f2"

(note that only one abstract definition of f2 is produced here)

Any thoughts? I imagine this is probably worth a reasonable amount of savings in an optimized build. Not huge, but not nothing. (probably not the top of anyone's list though, I realize)

I should see the IR metadata in both cases to really know how it worked before, but it seems that to be able to merge the two subprogram definitions when linking, we’d need to be able to have a list of CU per subprogram instead of a single one, right?

Should we remove the CU link from a non-internal linkage subprogram (& this may have an effect on the GV representation issue originally being discussed) and just emit it into whichever CU happens to need it first?

Looks like this should work as well, but I don’t enough about the way we emit Dwarf...

Trying to wrap my brain around this, so a few questions below. =)

Sure thing - sorry, did assume a bit too much arcane context here.

Not quite - having the SP->CU link (well,h onestly, marking the SP as “distinct” does this, but even if we didn’t do that, the SP->CU link would still do it) causes SPs /not/ to be deduplicated on IR linking.

Each SP is distinct/not considered duplicate with any other. (if we didn’t mark it ‘distinct’, the fact that each SP refers to its corresponding CU would produce the same effect - they wouldn’t be deduplicated because they aren’t identical - they refer to different CUs)

For non-inlined cases, this is fine.

Before we inverted the SP<>CU link, what would happen is that all copies of the llvm::Function would be dropped, but their SPs would be left around. So two CUs that both used the same linkonce_odr function (let’s say no inlining actually occurred though) would both have a SP description in the DWARF - but one would actual have a proper definition (with a high/low PC, etc) the other would be missing those features, as though the function had been optimized away (which it sort of had)

So by reversing the link, we got rid of those extra SP descriptions in the DWARF (and the extra SP descriptions in the metadata - I think they were duplicate back then because they still had a scope chain leading back to their CU (maybe we had gotten rid of that chain - if we had, then adding it back in may’ve actually caused more metadata, but less DWARF))

Complete visibility isn’t required to benefit here - and unfortunately there’s nothing fancier (that I know of) that we can do to avoid emitting one definition of each used inline function in each thinlto object file we produce (we can’t say “oh, the name of the function, its mangled name, the names and types of its parameters are over in that other object file/somewhere else” - but we can avoid emitting those descriptions in each /CU/ that uses the inlined function within a single ThinLTO object)

I can provide some more thorough examples if that’d be helpful :slight_smile:

Branching off from a discussion of improvements to DIGlobalVariable representations that Adrian’s working on - got me thinking about related changes that have already been made to DISubprogram.

To reduce duplicate debug info when things like linkonce_odr functions were deduplicated in LTO linking, the relationship between a CU and DISubprogram was inverted (instead of a CU maintaining a list of subprograms, subprograms specify which CU they come from - and the llvm::Function references the DISubprogram, so if the llvm::Function goes away, so does the associated DISubprogram)

I’m not sure if this caused a regression, but at least seems to miss a possible improvement:

During IR linking (for LTO, ThinLTO, etc) these distinct DISubprogram definitions (& their CU link, even if they weren’t marked ‘distinct’, the CU link would cause them to effectively be so) remain separate - this means that inlined versions in one CU don’t refer to an existing subprogram definition in another CU.

To demonstrate:
inl.h:
void f1();
inline attribute((always_inline)) void f2() {
f1();
}
inl1.cpp:
#include “inl.h”
void c1() {
f2();
}
inl2.cpp:
#include “inl.h”
void c2() {
f2();
}

Compile to IR, llvm-link the result. The DWARF you get is basically the same as the DWARF you’d get without linking:

DW_TAG_compile_unit
DW_AT_name “inl1.cpp”
DW_TAG_subprogram #0
DW_AT_name “f2”
DW_TAG_subprogram
DW_AT_name “c1”
DW_TAG_inlined_subroutine
DW_TAG_abstract_origin #0 “f2”
DW_TAG_compile_unit
DW_AT_name “inl2.cpp”
DW_TAG_subprogram #1
DW_AT_name “f2”
DW_TAG_subprogram
DW_AT_name “c2”
DW_TAG_inlined_subroutine
DW_TAG_abstract_origin #1 “f2”

Instead of something more like this:

DW_TAG_compile_unit
DW_AT_name “inl1.cpp”
DW_TAG_subprogram #0
DW_AT_name “f2”
DW_TAG_subprogram
DW_AT_name “c1”
DW_TAG_inlined_subroutine
DW_TAG_abstract_origin #0 “f2”
DW_TAG_compile_unit
DW_AT_name “inl2.cpp”
DW_TAG_subprogram
DW_AT_name “c2”
DW_TAG_inlined_subroutine
DW_TAG_abstract_origin #0 “f2”

(note that only one abstract definition of f2 is produced here)

Any thoughts? I imagine this is probably worth a reasonable amount of savings in an optimized build. Not huge, but not nothing. (probably not the top of anyone’s list though, I realize)

I should see the IR metadata in both cases to really know how it worked before, but it seems that to be able to merge the two subprogram definitions when linking, we’d need to be able to have a list of CU per subprogram instead of a single one, right?

Should we remove the CU link from a non-internal linkage subprogram (& this may have an effect on the GV representation issue originally being discussed) and just emit it into whichever CU happens to need it first?

Looks like this should work as well, but I don’t enough about the way we emit Dwarf…

Trying a simple prototype, I think the only thing that went really screwy, was once we don’t have that link, how do we pick which CU to put it in? Does it matter?

So my really rough prototype (that just used the first CU in the CUMap for all subprograms) of course ended up with this:

DW_TAG_compile_unit
DW_TAG_subprogram #0
DW_AT_name “f2”
DW_TAG_subprogram
DW_AT_name “c1”
DW_TAG_inlined_subroutine
DW_AT_abstract_origin #0
DW_TAG_subprogram
DW_AT_name “c2”
DW_TAG_inlined_subroutine
DW_AT_abstract_origin #0
DW_TAG_compile_unit

So I suppose we could use the unit when present - and make it present (& use distinct) when the function definition has external linkage. There’s nothing we intend to deduplicate there & that means external linkage definitions will go in the right CU, and inline definitions just go wherever their first use is.

Hmm - I guess that doesn’t quite work for ThinLTO? Oh, maybe it does - it’d mean we’d have to drop the CU reference and un-distinct when we import.

Hmm - that might simplify importing greatly - then we wouldn’t actually need to import CUs at all?

Don’t know if that’s a good thing.

  • Dave

Trying to wrap my brain around this, so a few questions below. =)

Sure thing - sorry, did assume a bit too much arcane context here.

Branching off from a discussion of improvements to DIGlobalVariable
representations that Adrian's working on - got me thinking about related
changes that have already been made to DISubprogram.

To reduce duplicate debug info when things like linkonce_odr functions
were deduplicated in LTO linking, the relationship between a CU and
DISubprogram was inverted (instead of a CU maintaining a list of
subprograms, subprograms specify which CU they come from - and the
llvm::Function references the DISubprogram, so if the llvm::Function goes
away, so does the associated DISubprogram)

I'm not sure if this caused a regression, but at least seems to miss a
possible improvement:

During IR linking (for LTO, ThinLTO, etc) these distinct DISubprogram
definitions (& their CU link, even if they weren't marked 'distinct', the
CU link would cause them to effectively be so) remain separate - this means
that inlined versions in one CU don't refer to an existing subprogram
definition in another CU.

To demonstrate:
inl.h:
void f1();
inline __attribute__((always_inline)) void f2() {
  f1();
}
inl1.cpp:
#include "inl.h"
void c1() {
  f2();
}
inl2.cpp:
#include "inl.h"
void c2() {
  f2();
}

Compile to IR, llvm-link the result. The DWARF you get is basically the
same as the DWARF you'd get without linking:

DW_TAG_compile_unit
  DW_AT_name "inl1.cpp"
  DW_TAG_subprogram #0
    DW_AT_name "f2"
  DW_TAG_subprogram
    DW_AT_name "c1"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #0 "f2"
DW_TAG_compile_unit
  DW_AT_name "inl2.cpp"
  DW_TAG_subprogram #1
    DW_AT_name "f2"
  DW_TAG_subprogram
    DW_AT_name "c2"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #1 "f2"

Instead of something more like this:

DW_TAG_compile_unit
  DW_AT_name "inl1.cpp"
  DW_TAG_subprogram #0
    DW_AT_name "f2"
  DW_TAG_subprogram
    DW_AT_name "c1"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #0 "f2"
DW_TAG_compile_unit
  DW_AT_name "inl2.cpp"
  DW_TAG_subprogram
    DW_AT_name "c2"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #0 "f2"

(note that only one abstract definition of f2 is produced here)

I think I understand what you are saying. Essentially, having the SP->CU
link allows the SP to be deduplicated when multiple *outline* copies of the
corresponding function are deduplicated. But not when the multiple copies
are inlined, as it looks like we need all the copies, right?

Not quite - having the SP->CU link (well,h onestly, marking the SP as
"distinct" does this, but even if we didn't do that, the SP->CU link would
still do it) causes SPs /not/ to be deduplicated on IR linking.

Each SP is distinct/not considered duplicate with any other. (if we didn't
mark it 'distinct', the fact that each SP refers to its corresponding CU
would produce the same effect - they wouldn't be deduplicated because they
aren't identical - they refer to different CUs)

For non-inlined cases, this is fine.

Before we inverted the SP<>CU link, what would happen is that all copies
of the llvm::Function would be dropped, but their SPs would be left around.
So two CUs that both used the same linkonce_odr function (let's say no
inlining actually occurred though) would both have a SP description in the
DWARF - but one would actual have a proper definition (with a high/low PC,
etc) the other would be missing those features, as though the function had
been optimized away (which it sort of had)

So by reversing the link, we got rid of those extra SP descriptions in the
DWARF (and the extra SP descriptions in the metadata - I think they were
duplicate back then because they still had a scope chain leading back to
their CU (maybe we had gotten rid of that chain - if we had, then adding it
back in may've actually caused more metadata, but less DWARF))

I almost followed all of this, until I got to this last bit. I understood
from above that with the SP->CU link (and distinct SPs), prevented
deduplication. But this last bit sounds like we are in fact removing the
duplicates in the DWARF and possibly also in the metadata.

Any thoughts? I imagine this is probably worth a reasonable amount of
savings in an optimized build. Not huge, but not nothing. (probably not the
top of anyone's list though, I realize)

Should we remove the CU link from a non-internal linkage subprogram (&
this may have an effect on the GV representation issue originally being
discussed) and just emit it into whichever CU happens to need it first?

I can see how this would be done in LTO where the compiler has full
visibility. For ThinLTO presumably we would need to do some index-based
marking? Can we at least do something when we import an inlined SP and drop
it since we know it is defined elsewhere?

Complete visibility isn't required to benefit here - and unfortunately
there's nothing fancier (that I know of) that we can do to avoid emitting
one definition of each used inline function in each thinlto object file we
produce (we can't say "oh, the name of the function, its mangled name, the
names and types of its parameters are over in that other object
file/somewhere else" - but we can avoid emitting those descriptions in each
/CU/ that uses the inlined function within a single ThinLTO object)

I can provide some more thorough examples if that'd be helpful :slight_smile:

Ok, I think I understand. This is only emitting once per object file, which
with ThinLTO can contain multiple CUs due to importing. But then with full
LTO it sounds like we would be in even better shape, since it has a single
module with all the CUs?

Teresa

Beefed up the prototype to use a unit if it’s there, and manually modified some IR to drop the unit and distinct on the linkonce_odr functions in the example - linked them (did the right thin, deduplicated the subprograms) & codegen’d the result, got the right answer (like the original improved output I proposed)

Just an FYI/idea/thing.

dump.txt (3.87 KB)

optional_cu_distinct_subprograms.diff (4.59 KB)

inl1.ll (2.41 KB)

inl-link.ll (3.21 KB)

inl2.ll (2.41 KB)

Ah, right - I can see how it reads that way, sorry.

Old old way:

first.ll:
CU1 → {fn1_SP → @fn1, inl_SP → @inl, … }
@fn1
@inl
Resulting DWARF:
compile_unit CU1
subprogram fn1
high/low pc, etc
subprogram inl
high/low pc, etc

second.ll:
CU2 → {inl_SP2 → @inl, SP2 → @fn2, … }
@inl
@fn2
Resulting DWARF:
compile_unit CU2
subprogram inl
high/low pc, etc
subprogram fn2
high/low pc, etc

link first.ll + second.ll:
CU1 → {fn1_SP → @fn1, inl_SP → @inl, … }
CU2 → {inl_SP2 → null, SP2 → @fn2, … }@fn1
@inl
@fn2
Resulting DWARF:
compile_unit CU1
subprogram fn1
high/low pc, etc
subprogram inl
high/low pc, etc
compile_unit CU2
subprogram inl
name, but no high/low pc - this is unnecessary
subprogram fn2
high/low pc, etc

New way:CU1
@fn1 → fn1_SP → CU1
@inl → inl_SP → CU1
Resulting DWARF:
compile_unit CU1
subprogram fn1
high/low pc, etc
subprogram inl
high/low pc, etc

second.ll:
CU2
@inl → inl_SP → CU2
@fn2 → fn2_SP → CU2
Resulting DWARF:
compile_unit CU2
subprogram inl
high/low pc, etc
subprogram fn2
high/low pc, etc

link first.ll + second.ll (we pick @inl from first.ll in this example):
CU1
CU2@fn1 → fn1_SP → CU1
@inl → inl_SP → CU1
@fn2 → fn2_SP → CU2
Resulting DWARF:
compile_unit CU1
subprogram fn1
high/low pc, etc
subprogram inl
high/low pc, etc
compile_unit CU2
subprogram fn2
high/low pc, etc

So inverting the links causes us to completely drop the redundant description of ‘inl’ that appeared in C2 when the function was not inlined.

But if the function /is/ inlined, then the inlined location descriptions that remain in @fn2 (assuming there was a call to @inl in @fn1 and @fn2) still point to that original (CU2) version of @inl - causing it to to be emitted into CU2.

Whereas for type descriptions we don’t do this - the type has no CU link, so they all get deduplicated and even if @fn2 has a parameter of the same type as @fn1 - we emit the type into CU1 when we first encounter it (when emitting @fn1) and then reference it whenever we need it, even when emitting @fn2 in the other CU.

Right - currently we emit once per CU, but with a change in format we could emit once per object file - which hurts ThinLTO over non-LTO (because ThinLTO produces more CUs (due to imports) per object file) and is neutral for full LTO (since it produces the same number of CUs, just in one object file).

Improving this representation to produce once per object would help get ThinLTO back what it’s currently paying - and improve full LTO further than its current position in this regard.

But the gains might not be major/significant - I’ve done nothing to assess that, just observing that it is suboptimal.

Thanks for asking/helping me explain it further, hopefully this is more descriptive.

  • Dave

Trying to wrap my brain around this, so a few questions below. =)

Sure thing - sorry, did assume a bit too much arcane context here.

Branching off from a discussion of improvements to DIGlobalVariable
representations that Adrian's working on - got me thinking about related
changes that have already been made to DISubprogram.

To reduce duplicate debug info when things like linkonce_odr functions
were deduplicated in LTO linking, the relationship between a CU and
DISubprogram was inverted (instead of a CU maintaining a list of
subprograms, subprograms specify which CU they come from - and the
llvm::Function references the DISubprogram, so if the llvm::Function goes
away, so does the associated DISubprogram)

I'm not sure if this caused a regression, but at least seems to miss a
possible improvement:

During IR linking (for LTO, ThinLTO, etc) these distinct DISubprogram
definitions (& their CU link, even if they weren't marked 'distinct', the
CU link would cause them to effectively be so) remain separate - this means
that inlined versions in one CU don't refer to an existing subprogram
definition in another CU.

To demonstrate:
inl.h:
void f1();
inline __attribute__((always_inline)) void f2() {
  f1();
}
inl1.cpp:
#include "inl.h"
void c1() {
  f2();
}
inl2.cpp:
#include "inl.h"
void c2() {
  f2();
}

Compile to IR, llvm-link the result. The DWARF you get is basically the
same as the DWARF you'd get without linking:

DW_TAG_compile_unit
  DW_AT_name "inl1.cpp"
  DW_TAG_subprogram #0
    DW_AT_name "f2"
  DW_TAG_subprogram
    DW_AT_name "c1"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #0 "f2"
DW_TAG_compile_unit
  DW_AT_name "inl2.cpp"
  DW_TAG_subprogram #1
    DW_AT_name "f2"
  DW_TAG_subprogram
    DW_AT_name "c2"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #1 "f2"

Instead of something more like this:

DW_TAG_compile_unit
  DW_AT_name "inl1.cpp"
  DW_TAG_subprogram #0
    DW_AT_name "f2"
  DW_TAG_subprogram
    DW_AT_name "c1"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #0 "f2"
DW_TAG_compile_unit
  DW_AT_name "inl2.cpp"
  DW_TAG_subprogram
    DW_AT_name "c2"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #0 "f2"

(note that only one abstract definition of f2 is produced here)

I think I understand what you are saying. Essentially, having the SP->CU
link allows the SP to be deduplicated when multiple *outline* copies of the
corresponding function are deduplicated. But not when the multiple copies
are inlined, as it looks like we need all the copies, right?

Not quite - having the SP->CU link (well,h onestly, marking the SP as
"distinct" does this, but even if we didn't do that, the SP->CU link would
still do it) causes SPs /not/ to be deduplicated on IR linking.

Each SP is distinct/not considered duplicate with any other. (if we didn't
mark it 'distinct', the fact that each SP refers to its corresponding CU
would produce the same effect - they wouldn't be deduplicated because they
aren't identical - they refer to different CUs)

For non-inlined cases, this is fine.

Before we inverted the SP<>CU link, what would happen is that all copies
of the llvm::Function would be dropped, but their SPs would be left around.
So two CUs that both used the same linkonce_odr function (let's say no
inlining actually occurred though) would both have a SP description in the
DWARF - but one would actual have a proper definition (with a high/low PC,
etc) the other would be missing those features, as though the function had
been optimized away (which it sort of had)

So by reversing the link, we got rid of those extra SP descriptions in the
DWARF (and the extra SP descriptions in the metadata - I think they were
duplicate back then because they still had a scope chain leading back to
their CU (maybe we had gotten rid of that chain - if we had, then adding it
back in may've actually caused more metadata, but less DWARF))

I almost followed all of this, until I got to this last bit. I understood
from above that with the SP->CU link (and distinct SPs), prevented
deduplication. But this last bit sounds like we are in fact removing the
duplicates in the DWARF and possibly also in the metadata.

Ah, right - I can see how it reads that way, sorry.

Old old way:

first.ll:
  CU1 -> {fn1_SP -> @fn1, inl_SP -> @inl, ... }
  @fn1 ...
  @inl ...
  Resulting DWARF:
    compile_unit CU1
      subprogram fn1
        high/low pc, etc
      subprogram inl
        high/low pc, etc

second.ll:
  CU2 -> {inl_SP2 -> @inl, SP2 -> @fn2, ... }
  @inl ...
  @fn2 ...
  Resulting DWARF:
    compile_unit CU2
      subprogram inl
        high/low pc, etc
      subprogram fn2
        high/low pc, etc

link first.ll + second.ll:
  CU1 -> {fn1_SP -> @fn1, inl_SP -> @inl, ... }
  CU2 -> {inl_SP2 -> null, SP2 -> @fn2, ... }
  @fn1 ...
  @inl ...
  @fn2 ...
  Resulting DWARF:
    compile_unit CU1
      subprogram fn1
        high/low pc, etc
      subprogram inl
        high/low pc, etc
    compile_unit CU2
      subprogram inl
        name, but no high/low pc - this is unnecessary
      subprogram fn2
        high/low pc, etc

New way:
  CU1
  @fn1 -> fn1_SP -> CU1
  @inl -> inl_SP -> CU1
  Resulting DWARF:
    compile_unit CU1
      subprogram fn1
        high/low pc, etc
      subprogram inl
        high/low pc, etc

second.ll:
  CU2
  @inl -> inl_SP -> CU2
  @fn2 -> fn2_SP -> CU2
  Resulting DWARF:
    compile_unit CU2
      subprogram inl
        high/low pc, etc
      subprogram fn2
        high/low pc, etc

link first.ll + second.ll (we pick @inl from first.ll in this example):
  CU1
  CU2
  @fn1 -> fn1_SP -> CU1
  @inl -> inl_SP -> CU1
  @fn2 -> fn2_SP -> CU2
  Resulting DWARF:
    compile_unit CU1
      subprogram fn1
        high/low pc, etc
      subprogram inl
        high/low pc, etc
    compile_unit CU2
      subprogram fn2
        high/low pc, etc

So inverting the links causes us to completely drop the redundant
description of 'inl' that appeared in C2 when the function was not inlined.

But if the function /is/ inlined, then the inlined location descriptions
that remain in @fn2 (assuming there was a call to @inl in @fn1 and @fn2)
still point to that original (CU2) version of @inl - causing it to to be
emitted into CU2.

Whereas for type descriptions we don't do this - the type has no CU link,
so they all get deduplicated and even if @fn2 has a parameter of the same
type as @fn1 - we emit the type into CU1 when we first encounter it (when
emitting @fn1) and then reference it whenever we need it, even when
emitting @fn2 in the other CU.

Ok, great, thanks for the detailed explanation! I was missing the
distinction between the location descriptions and the type descriptions.

Any thoughts? I imagine this is probably worth a reasonable amount of
savings in an optimized build. Not huge, but not nothing. (probably not the
top of anyone's list though, I realize)

Should we remove the CU link from a non-internal linkage subprogram (&
this may have an effect on the GV representation issue originally being
discussed) and just emit it into whichever CU happens to need it first?

I can see how this would be done in LTO where the compiler has full
visibility. For ThinLTO presumably we would need to do some index-based
marking? Can we at least do something when we import an inlined SP and drop
it since we know it is defined elsewhere?

Complete visibility isn't required to benefit here - and unfortunately
there's nothing fancier (that I know of) that we can do to avoid emitting
one definition of each used inline function in each thinlto object file we
produce (we can't say "oh, the name of the function, its mangled name, the
names and types of its parameters are over in that other object
file/somewhere else" - but we can avoid emitting those descriptions in each
/CU/ that uses the inlined function within a single ThinLTO object)

I can provide some more thorough examples if that'd be helpful :slight_smile:

Ok, I think I understand. This is only emitting once per object file,
which with ThinLTO can contain multiple CUs due to importing. But then with
full LTO it sounds like we would be in even better shape, since it has a
single module with all the CUs?

Right - currently we emit once per CU, but with a change in format we
could emit once per object file - which hurts ThinLTO over non-LTO (because
ThinLTO produces more CUs (due to imports) per object file) and is neutral
for full LTO (since it produces the same number of CUs, just in one object
file).

Improving this representation to produce once per object would help get
ThinLTO back what it's currently paying - and improve full LTO further than
its current position in this regard.

But the gains might not be major/significant - I've done nothing to assess
that, just observing that it is suboptimal.

Thanks for asking/helping me explain it further, hopefully this is more
descriptive.

Great, thanks for pointing out this opportunity! When I get a chance
probably tomorrow I will give your prototype patch a try and see how much
difference it makes for Chromium.

Teresa

Ah - the patch fixes the backend/LLVM to cope with this representational change (at least for the one tiny example) but doesn’t fix Clang to produce the new representation (I just hand hacked the IR in a simple example to see if it worked). I could work with you/work up a change that might do that too, if that’s handy/worth testing.

  • Dave

Trying to wrap my brain around this, so a few questions below. =)

Sure thing - sorry, did assume a bit too much arcane context here.

Branching off from a discussion of improvements to DIGlobalVariable
representations that Adrian's working on - got me thinking about related
changes that have already been made to DISubprogram.

To reduce duplicate debug info when things like linkonce_odr functions
were deduplicated in LTO linking, the relationship between a CU and
DISubprogram was inverted (instead of a CU maintaining a list of
subprograms, subprograms specify which CU they come from - and the
llvm::Function references the DISubprogram, so if the llvm::Function goes
away, so does the associated DISubprogram)

I'm not sure if this caused a regression, but at least seems to miss a
possible improvement:

During IR linking (for LTO, ThinLTO, etc) these distinct DISubprogram
definitions (& their CU link, even if they weren't marked 'distinct', the
CU link would cause them to effectively be so) remain separate - this means
that inlined versions in one CU don't refer to an existing subprogram
definition in another CU.

To demonstrate:
inl.h:
void f1();
inline __attribute__((always_inline)) void f2() {
  f1();
}
inl1.cpp:
#include "inl.h"
void c1() {
  f2();
}
inl2.cpp:
#include "inl.h"
void c2() {
  f2();
}

Compile to IR, llvm-link the result. The DWARF you get is basically the
same as the DWARF you'd get without linking:

DW_TAG_compile_unit
  DW_AT_name "inl1.cpp"
  DW_TAG_subprogram #0
    DW_AT_name "f2"
  DW_TAG_subprogram
    DW_AT_name "c1"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #0 "f2"
DW_TAG_compile_unit
  DW_AT_name "inl2.cpp"
  DW_TAG_subprogram #1
    DW_AT_name "f2"
  DW_TAG_subprogram
    DW_AT_name "c2"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #1 "f2"

Instead of something more like this:

DW_TAG_compile_unit
  DW_AT_name "inl1.cpp"
  DW_TAG_subprogram #0
    DW_AT_name "f2"
  DW_TAG_subprogram
    DW_AT_name "c1"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #0 "f2"
DW_TAG_compile_unit
  DW_AT_name "inl2.cpp"
  DW_TAG_subprogram
    DW_AT_name "c2"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #0 "f2"

(note that only one abstract definition of f2 is produced here)

I think I understand what you are saying. Essentially, having the SP->CU
link allows the SP to be deduplicated when multiple *outline* copies of the
corresponding function are deduplicated. But not when the multiple copies
are inlined, as it looks like we need all the copies, right?

Not quite - having the SP->CU link (well,h onestly, marking the SP as
"distinct" does this, but even if we didn't do that, the SP->CU link would
still do it) causes SPs /not/ to be deduplicated on IR linking.

Each SP is distinct/not considered duplicate with any other. (if we
didn't mark it 'distinct', the fact that each SP refers to its
corresponding CU would produce the same effect - they wouldn't be
deduplicated because they aren't identical - they refer to different CUs)

For non-inlined cases, this is fine.

Before we inverted the SP<>CU link, what would happen is that all copies
of the llvm::Function would be dropped, but their SPs would be left around.
So two CUs that both used the same linkonce_odr function (let's say no
inlining actually occurred though) would both have a SP description in the
DWARF - but one would actual have a proper definition (with a high/low PC,
etc) the other would be missing those features, as though the function had
been optimized away (which it sort of had)

So by reversing the link, we got rid of those extra SP descriptions in
the DWARF (and the extra SP descriptions in the metadata - I think they
were duplicate back then because they still had a scope chain leading back
to their CU (maybe we had gotten rid of that chain - if we had, then adding
it back in may've actually caused more metadata, but less DWARF))

I almost followed all of this, until I got to this last bit. I understood
from above that with the SP->CU link (and distinct SPs), prevented
deduplication. But this last bit sounds like we are in fact removing the
duplicates in the DWARF and possibly also in the metadata.

Ah, right - I can see how it reads that way, sorry.

Old old way:

first.ll:
  CU1 -> {fn1_SP -> @fn1, inl_SP -> @inl, ... }
  @fn1 ...
  @inl ...
  Resulting DWARF:
    compile_unit CU1
      subprogram fn1
        high/low pc, etc
      subprogram inl
        high/low pc, etc

second.ll:
  CU2 -> {inl_SP2 -> @inl, SP2 -> @fn2, ... }
  @inl ...
  @fn2 ...
  Resulting DWARF:
    compile_unit CU2
      subprogram inl
        high/low pc, etc
      subprogram fn2
        high/low pc, etc

link first.ll + second.ll:
  CU1 -> {fn1_SP -> @fn1, inl_SP -> @inl, ... }
  CU2 -> {inl_SP2 -> null, SP2 -> @fn2, ... }
  @fn1 ...
  @inl ...
  @fn2 ...
  Resulting DWARF:
    compile_unit CU1
      subprogram fn1
        high/low pc, etc
      subprogram inl
        high/low pc, etc
    compile_unit CU2
      subprogram inl
        name, but no high/low pc - this is unnecessary
      subprogram fn2
        high/low pc, etc

New way:
  CU1
  @fn1 -> fn1_SP -> CU1
  @inl -> inl_SP -> CU1
  Resulting DWARF:
    compile_unit CU1
      subprogram fn1
        high/low pc, etc
      subprogram inl
        high/low pc, etc

second.ll:
  CU2
  @inl -> inl_SP -> CU2
  @fn2 -> fn2_SP -> CU2
  Resulting DWARF:
    compile_unit CU2
      subprogram inl
        high/low pc, etc
      subprogram fn2
        high/low pc, etc

link first.ll + second.ll (we pick @inl from first.ll in this example):
  CU1
  CU2
  @fn1 -> fn1_SP -> CU1
  @inl -> inl_SP -> CU1
  @fn2 -> fn2_SP -> CU2
  Resulting DWARF:
    compile_unit CU1
      subprogram fn1
        high/low pc, etc
      subprogram inl
        high/low pc, etc
    compile_unit CU2
      subprogram fn2
        high/low pc, etc

So inverting the links causes us to completely drop the redundant
description of 'inl' that appeared in C2 when the function was not inlined.

But if the function /is/ inlined, then the inlined location descriptions
that remain in @fn2 (assuming there was a call to @inl in @fn1 and @fn2)
still point to that original (CU2) version of @inl - causing it to to be
emitted into CU2.

Whereas for type descriptions we don't do this - the type has no CU link,
so they all get deduplicated and even if @fn2 has a parameter of the same
type as @fn1 - we emit the type into CU1 when we first encounter it (when
emitting @fn1) and then reference it whenever we need it, even when
emitting @fn2 in the other CU.

Ok, great, thanks for the detailed explanation! I was missing the
distinction between the location descriptions and the type descriptions.

Any thoughts? I imagine this is probably worth a reasonable amount of
savings in an optimized build. Not huge, but not nothing. (probably not the
top of anyone's list though, I realize)

Should we remove the CU link from a non-internal linkage subprogram (&
this may have an effect on the GV representation issue originally being
discussed) and just emit it into whichever CU happens to need it first?

I can see how this would be done in LTO where the compiler has full
visibility. For ThinLTO presumably we would need to do some index-based
marking? Can we at least do something when we import an inlined SP and drop
it since we know it is defined elsewhere?

Complete visibility isn't required to benefit here - and unfortunately
there's nothing fancier (that I know of) that we can do to avoid emitting
one definition of each used inline function in each thinlto object file we
produce (we can't say "oh, the name of the function, its mangled name, the
names and types of its parameters are over in that other object
file/somewhere else" - but we can avoid emitting those descriptions in each
/CU/ that uses the inlined function within a single ThinLTO object)

I can provide some more thorough examples if that'd be helpful :slight_smile:

Ok, I think I understand. This is only emitting once per object file,
which with ThinLTO can contain multiple CUs due to importing. But then with
full LTO it sounds like we would be in even better shape, since it has a
single module with all the CUs?

Right - currently we emit once per CU, but with a change in format we
could emit once per object file - which hurts ThinLTO over non-LTO (because
ThinLTO produces more CUs (due to imports) per object file) and is neutral
for full LTO (since it produces the same number of CUs, just in one object
file).

Improving this representation to produce once per object would help get
ThinLTO back what it's currently paying - and improve full LTO further than
its current position in this regard.

But the gains might not be major/significant - I've done nothing to
assess that, just observing that it is suboptimal.

Thanks for asking/helping me explain it further, hopefully this is more
descriptive.

Great, thanks for pointing out this opportunity! When I get a chance
probably tomorrow I will give your prototype patch a try and see how much
difference it makes for Chromium.

Ah - the patch fixes the backend/LLVM to cope with this representational
change (at least for the one tiny example) but doesn't fix Clang to produce
the new representation (I just hand hacked the IR in a simple example to
see if it worked). I could work with you/work up a change that might do
that too, if that's handy/worth testing.

Oh, right. Maybe I can get an idea of how much this would save. Is there a
decent heuristic to measure the redundancy?
Teresa

The Clang changes might not take much longer (O(minutes)) than the LLVM changes. But I wouldn’t be totally confident in any of it scaling to a big codebase - might be a bunch of small issues/cases I wouldn’t’ve handled. Might be worth spending a little (~half an hour, an hour) on all up, I guess.

I can’t think of a great heuristic for this, really. Measuring debug info size impact isn’t too easy, really, short of prototyping/experimenting. (at least at this level of “how many of this kind of thing do we produce/how big are they generally”)

A few disjoint thoughts; sorry they’re so delayed (I skimmed the responses below, and I think these are still relevant/not covered elsewhere).

Firstly, why should DISubprogram definitions be distinct? There were two reasons this was valuable (this was from before there was a cu: link).

  • It helped to fix long-standing bugs in the IRLinker, where uniqued-DISubprograms in different compile units would coalesce. Now that the Function <-> DISubprogram connection has been reversed, I’m not entirely sure it’s still a necessary precaution.

  • It broke a several uniquing cycles involving DISubprogram. The existence of the uniquing cycles meant that affected DISubprograms weren’t really uniqued, anyway (since we don’t bother to solve graph isomorphism); and breaking the cycles sped up the IRLinker and reduced memory waste. Making DISubprogram uniqued again would either have no real effect (because distinct nodes they reference aren’t uniqued anymore either) or bring back uniquing cycles (which would have effects… but not the desired one).

But there ought to be a way to get coalescing in the right places. We just need to ensure that the uniqued/coalesced parts of the graph:

  • are acyclic, and
  • reference only coalesced nodes (note that leaves can happily include the strangely-coalesced distinct DICompositeType beast I created).

I’ve been thinking idly whether we can/should break DISubprogram apart into uniqued (coalesce-able) and distinct parts. Still being someone ignorant of DWARF itself (and CodeView, for that matter), I’m not entirely sure. But…

Here’s my understanding of the current uses of DISubprogram (where I’m using DISubprogramDecl for the usually uniqued declarations, and DISubprogramDefn for the always distinct definitions):

// !dbg
Function → DISubprogram

// Declaration (optional, only for member functions).
DISubprogramDefn → DISubprogramDecl

// Variables:
DISubprogramDefn → MDNode → DILocalVariable

// Terminator for scope chains:
DILocalVariable [-> DILocalScope [-> …]] → DISubprogramDefn
DILocation [-> DILocalScope [-> …]] → DISubprogramDefn
DICompositeType → DISubprogramDefn // function-local type

// Elements (member functions):
DICompositeType → MDNode → DISubprogramDecl

DISubprogramDecl → DICompositeType

DISubprogramDefn → DICompositeType

And here are the changes I would consider:

  1. Split DISubprogramDecl and DISubprogramDefn into different types.

  2. Remove definition-specific fields from DISubprogramDecl.

  • variables, a “dangerous” source of uniquing cycles.
  • declaration, which it will never have.
  • scopeLine, which it will never have.
  • unit, which the Defn will have when it’s relevant.
  1. Make a DISubprogramDecl mandatory for DISubprogramDefn.

  2. Remove redundant fields from DISubprogramDefn.

  • name and linkageName are on the declaration.
  • type is redundant.
  • flags is likely redundant?

Nothing so far has made any real changes, I think. DISubprogramDecl should coalesce nicely between modules though.

  1. Migrate links to point at DISubprogramDecl instead of DISubprogramDefn.
  • DILocalVariable scope chain.
  • DILocation scope chain.
  • DICompositeType scope.

I don’t think this will create any uniquing cycles. DISubprogramDecl won’t point at any of these. Also, DILocalVariable should start coalescing with each other.

Since DISubprogramDefn is only referenced from the Function !dbg attachment, only one should survive function-coalescing.

One open question, since I don’t know the backend code, is: can you emit the “correct” DWARF after step (5)? I’m hoping you only need the Defn of the Function that has instructions referencing the DILocalVariable/DILocation. In which case, we suddenly get a lot of coalescing.

A few disjoint thoughts; sorry they’re so delayed (I skimmed the responses below, and I think these are still relevant/not covered elsewhere).

Firstly, why should DISubprogram definitions be distinct? There were two reasons this was valuable (this was from before there was a cu: link).

  • It helped to fix long-standing bugs in the IRLinker, where uniqued-DISubprograms in different compile units would coalesce.

Any idea why that ^ was a problem/bugs?

Now that the Function <-> DISubprogram connection has been reversed, I’m not entirely sure it’s still a necessary precaution.

  • It broke a several uniquing cycles involving DISubprogram. The existence of the uniquing cycles meant that affected DISubprograms weren’t really uniqued, anyway (since we don’t bother to solve graph isomorphism); and breaking the cycles sped up the IRLinker and reduced memory waste.

Ah, fair point - my trivial example didn’t have any local variables, so didn’t hit this problem. They do indeed form cycles and I haven’t tested, but seems totally reasonable/likely that that would break my simple example because cycles break uniquing and all that.

Making DISubprogram uniqued again would either have no real effect (because distinct nodes they reference aren’t uniqued anymore either) or bring back uniquing cycles (which would have effects… but not the desired one).

But there ought to be a way to get coalescing in the right places. We just need to ensure that the uniqued/coalesced parts of the graph:

  • are acyclic, and
  • reference only coalesced nodes (note that leaves can happily include the strangely-coalesced distinct DICompositeType beast I created).

Perhaps before I dive into your suggestion (which, it seems, is to separate out the definition but let the declaration/locals form cycles - that would at least be dropped because the definition would only be kept due to it being referenced from the llvm::Function), what about something simpler:

What if scope chains didn’t go all the way to the top (the subprogram) but stopped short of that?

Now, granted - this was part of the great assertion I added several years back that caught all kinds of silliness - mostly around inlined call sites not having debugloc and leading to scope chains that lead to distinct roots. But the alternative to having and maintaining an invariant - is just to make things true by construction. All debug info scope chains within a function with an associated DISubprogram have an implicit last element of that DISubprogram?

That would mean it’d be harder to catch the bugs I found - but I think most of them came from that specific inlining situation (I can go back and do some archaeology and see if there’s other exposure where we could sure up some defenses as well) which can be caught more directly (I forget if the inliner now catches it itself, or if we enhanced the debug info verifier to catch it after the inliner runs - if it’s the verifier, then that wouldn’t suffice if we made the change I’m suggesting - and we’d have to hook it up in the inliner itself)

Thoughts?

A few disjoint thoughts; sorry they’re so delayed (I skimmed the responses below, and I think these are still relevant/not covered elsewhere).

Firstly, why should DISubprogram definitions be distinct? There were two reasons this was valuable (this was from before there was a cu: link).

  • It helped to fix long-standing bugs in the IRLinker, where uniqued-DISubprograms in different compile units would coalesce.

Any idea why that ^ was a problem/bugs?

Now that the Function <-> DISubprogram connection has been reversed, I’m not entirely sure it’s still a necessary precaution.

  • It broke a several uniquing cycles involving DISubprogram. The existence of the uniquing cycles meant that affected DISubprograms weren’t really uniqued, anyway (since we don’t bother to solve graph isomorphism); and breaking the cycles sped up the IRLinker and reduced memory waste.

Ah, fair point - my trivial example didn’t have any local variables, so didn’t hit this problem. They do indeed form cycles and I haven’t tested, but seems totally reasonable/likely that that would break my simple example because cycles break uniquing and all that.

Making DISubprogram uniqued again would either have no real effect (because distinct nodes they reference aren’t uniqued anymore either) or bring back uniquing cycles (which would have effects… but not the desired one).

But there ought to be a way to get coalescing in the right places. We just need to ensure that the uniqued/coalesced parts of the graph:

  • are acyclic, and
  • reference only coalesced nodes (note that leaves can happily include the strangely-coalesced distinct DICompositeType beast I created).

Perhaps before I dive into your suggestion (which, it seems, is to separate out the definition but let the declaration/locals form cycles

No (new) cycles; in fact this breaks some.

  • the Function itself references the defn and the locals;
  • the locals and the defn reference (transitively) the decl; and
  • the decl doesn’t reference any of those.

The only cycle is the decl <-> composite type one that already exists (and that odr magic (mostly) fixes). Maybe that one could be removed somehow too, but it isn’t directly relevant to the problem you’re looking at.

(Note that this is somewhat inspired by Reid’s point in his recent debug info talk, that CodeView avoids type cycles by breaking types into two records.)

  • that would at least be dropped because the definition would only be kept due to it being referenced from the llvm::Function), what about something simpler:

What if scope chains didn’t go all the way to the top (the subprogram) but stopped short of that?

Now, granted - this was part of the great assertion I added several years back that caught all kinds of silliness - mostly around inlined call sites not having debugloc and leading to scope chains that lead to distinct roots. But the alternative to having and maintaining an invariant - is just to make things true by construction. All debug info scope chains within a function with an associated DISubprogram have an implicit last element of that DISubprogram?

That seems reasonable, I think, now that the function points directly at its subprogram.

I’m not sure it’s simpler to get right though. Dropping the scope runs the risk of coalescing things incorrectly. I imagine the backend relies on pointer distinctness of two local variables called “a” that are in (or are parameters of) unrelated subprograms. I think in some situations they could be coalesced incorrectly if they had no scope.

That would mean it’d be harder to catch the bugs I found

A shame if so, but if we can find another way to catch it or design it away, as you mention, it doesn’t much matter.

A few disjoint thoughts; sorry they’re so delayed (I skimmed the responses below, and I think these are still relevant/not covered elsewhere).

Firstly, why should DISubprogram definitions be distinct? There were two reasons this was valuable (this was from before there was a cu: link).

  • It helped to fix long-standing bugs in the IRLinker, where uniqued-DISubprograms in different compile units would coalesce.

Any idea why that ^ was a problem/bugs?

Now that the Function <-> DISubprogram connection has been reversed, I’m not entirely sure it’s still a necessary precaution.

  • It broke a several uniquing cycles involving DISubprogram. The existence of the uniquing cycles meant that affected DISubprograms weren’t really uniqued, anyway (since we don’t bother to solve graph isomorphism); and breaking the cycles sped up the IRLinker and reduced memory waste.

Ah, fair point - my trivial example didn’t have any local variables, so didn’t hit this problem. They do indeed form cycles and I haven’t tested, but seems totally reasonable/likely that that would break my simple example because cycles break uniquing and all that.

Making DISubprogram uniqued again would either have no real effect (because distinct nodes they reference aren’t uniqued anymore either) or bring back uniquing cycles (which would have effects… but not the desired one).

But there ought to be a way to get coalescing in the right places. We just need to ensure that the uniqued/coalesced parts of the graph:

  • are acyclic, and
  • reference only coalesced nodes (note that leaves can happily include the strangely-coalesced distinct DICompositeType beast I created).

Perhaps before I dive into your suggestion (which, it seems, is to separate out the definition but let the declaration/locals form cycles

No (new) cycles; in fact this breaks some.

  • the Function itself references the defn and the locals;
  • the locals and the defn reference (transitively) the decl; and
  • the decl doesn’t reference any of those.

The only cycle is the decl <-> composite type one that already exists (and that odr magic (mostly) fixes). Maybe that one could be removed somehow too, but it isn’t directly relevant to the problem you’re looking at.

Oh, yeah - I’ve an aside there too, maybe for another thread. (punchy summary: anonymous enums in headers used for constants are technically ODR violations (well, if they’re ever used in an ODR entity - like an inline function) - they’re not public types, so we don’t put the mangled name on them & don’t deduplicate them in DWARF type units - and probably similarly don’t deduplicate them in the IR… - not sure what the right answer is there)

(Note that this is somewhat inspired by Reid’s point in his recent debug info talk, that CodeView avoids type cycles by breaking types into two records.)

Ah, thanks, that helps with me picture it!

  • that would at least be dropped because the definition would only be kept due to it being referenced from the llvm::Function), what about something simpler:

What if scope chains didn’t go all the way to the top (the subprogram) but stopped short of that?

Now, granted - this was part of the great assertion I added several years back that caught all kinds of silliness - mostly around inlined call sites not having debugloc and leading to scope chains that lead to distinct roots. But the alternative to having and maintaining an invariant - is just to make things true by construction. All debug info scope chains within a function with an associated DISubprogram have an implicit last element of that DISubprogram?

That seems reasonable, I think, now that the function points directly at its subprogram.

I’m not sure it’s simpler to get right though. Dropping the scope runs the risk of coalescing things incorrectly. I imagine the backend relies on pointer distinctness of two local variables called “a” that are in (or are parameters of) unrelated subprograms. I think in some situations they could be coalesced incorrectly if they had no scope.

Just no top level scope - so it’d only happen if you had two variables with the same name in the same scope. It might break in debug-having inline functions inlined into non-debug-having functions (which then might in turn be inlined into a debug-having function) - would have to check.

Also, not sure if inlined situations might introduce circularity. Yep. The scope for inlinedAt DILocations point to the DISubprogram - so that’d create a cycle. Ah well.

(came across this while looking for other things)

This is still an issue:

$ cat a.cpp

#include “a.h”

void f3() { f2(); }

$ cat b.cpp

#include “a.h”

void f3();

void f1() { }

int main() {

f3();

f2();

}

$ cat a.h

void f1();

struct t1 { int x; };

inline attribute((always_inline)) void f2(t1 = t1()) { f1(); }
$ $ clang+±tot -fuse-ld=lld -flto a.cpp b.cpp -g && llvm-dwarfdump-tot a.out -debug-info | grep “DW_AT_name|DW_TAG_|DW_AT_type” | sed -e “s/^…//”

DW_TAG_compile_unit

DW_AT_name (“a.cpp”)

DW_TAG_subprogram

DW_AT_name (“f2”)

DW_TAG_formal_parameter

DW_AT_type (0x0000003e “t1”)

DW_TAG_structure_type

DW_AT_name (“t1”)

DW_TAG_member

DW_AT_name (“x”)

DW_AT_type (0x00000054 “int”)

DW_TAG_compile_unit

DW_AT_name (“b.cpp”)

DW_TAG_subprogram

DW_AT_name (“f2”)

DW_TAG_formal_parameter

DW_AT_type (0x000000000000003e “t1”)

The type (t1) gets deduplicated/shared across CUs (the DW_AT_type in b.cpp uses FORM_ref_addr) but the f2 subprogram is duplicated - though b.cpp’s use of it could use ref_addr, and would if the function hadn’t been inlined away.

eg: if we move ‘f2’ to be defined in a.cpp but attributed always_inline, so it does get cross-unit inlined due to LTO:

DW_TAG_compile_unit

DW_AT_name (“a.cpp”)

DW_TAG_subprogram

DW_AT_name (“f2”)

DW_TAG_formal_parameter

DW_AT_type (0x0000003e “t1”)

DW_TAG_structure_type

DW_AT_name (“t1”)

DW_TAG_member

DW_AT_name (“x”)

DW_AT_type (0x00000054 “int”)

DW_TAG_base_type

DW_AT_name (“int”)

DW_TAG_subprogram

DW_AT_name (“f3”)

DW_TAG_inlined_subroutine

DW_AT_abstract_origin (0x0000002a “_Z2f22t1”)

DW_TAG_formal_parameter

DW_AT_abstract_origin (0x00000036)

DW_TAG_compile_unit

DW_AT_name (“b.cpp”)

DW_TAG_subprogram

DW_AT_name (“f1”)

DW_TAG_subprogram

DW_AT_name (“main”)

DW_AT_type (0x0000000000000054 “int”)

DW_TAG_inlined_subroutine

DW_AT_abstract_origin (0x000000000000002a “_Z2f22t1”)

So now because the cross-CU inlining happened during LTO after module merging - the merging happens when the function’s not inlined, module merge picks one llvm::Function, dropping the other one and dropping the other DISubprogram - and then the inlining kicks in, referencing just that single DIsubprogram.

Not sure if it’s of interest to anyone/a priority.

(side note: Anyone interested in making ThinLTO home type definitions? That’d be great to reduce type duplication - would mean ThinLTO could turn off type units and/or that .o/.dwo/etc files would just generally be smaller anyway)

Sorry for derailing into your aside, but…

My recollection is that type units can end up being used for types that actually don’t take up a lot of space, and so the overhead of the type units ends up costing extra. I put a task on our internal tracker to look into this properly, but we wouldn’t mind if someone else had a run at it (if only to prove that my recollection is incorrect).

Thanks,

–paulr

who is on vacation and really shouldn’t be looking at these emails

Since this aside got a life of its own - changing the subject to reflect that.

My recollection is that type units can end up being used for types that actually don’t take up a lot of space, and so the overhead of the type units ends up costing extra. I put a task on our internal tracker to look into this properly, but we wouldn’t mind if someone else had a run at it (if only to prove that my recollection is incorrect).

Thanks,

–paulr

who is on vacation and really shouldn’t be looking at these emails

From: llvm-dev <llvm-dev-bounces@lists.llvm.org> On Behalf Of Reid Kleckner via llvm-dev
Sent: Friday, October 15, 2021 3:26 PM
To: David Blaikie <dblaikie@gmail.com>
Cc: llvm-dev <llvm-dev@lists.llvm.org>
Subject: Re: [llvm-dev] distinct DISubprograms hindering sharing inlined subprogram descriptions

Sorry for derailing into your aside, but…

(side note: Anyone interested in making ThinLTO home type definitions? That’d be great to reduce type duplication - would mean ThinLTO could turn off type units and/or that .o/.dwo/etc files would just generally be smaller anyway)

Hey, I think that is a great idea! ThinLTO tends to be used in release build configurations, which tend to have debug info enabled. The size of symbols in release build config matters a lot because it is typically archived for a long time.

Yeah, I had some vague idea (I forget whether I’ve implemented parts of it) to have the frontend make smart decisions about whether to put a linkage name on a type (used for deduplicating during IR linking, and also used as the key for DWARF type units) if it knew the type was being emitted only once (eg: if it had a strong vtable - no need to use a type unit).

That’d have some wins straight away for situations like strong vtables (or I guess we could do something similar with ctor homing - if the type only has one ctor and it has strong linkage, and explicit template specializations… maybe you can technically have more than one of those? Ah well) - and then if we taught ThinLTO to remove the linkage name from a type and “promote” it to a singular definition, stripping it out from other modules - it could benefit there too.

I mean, we’d probably just end up turning off type units entirely under ThinLTO because it’d be “perfect” (hmm, only if it’s whole program/you’re not deduplicating types with objects outside the ThinLTO scope… ) - so the “opting out of type units on a per-type basis” part of this probably isn’t necessary but a nice way to express things anyway, maybe.

COFF & MachO platforms have other ways to deduplicate types (PDBs & dsymutil), but less duplicate stuff always makes things faster.

Yep yep.

For ELF users already using type units, this optimization will only allow us to recover the overhead of type units, which I recall is significant. In our evaluation of the feature for Chrome, we found it increased the final binary size significantly: crbug.com/1031936#c4

Yep, the overhead is significant (hmm - you’d only see an increase in the final binary size if there were very few duplicates - perhaps after ctor homing that was the case that there was so little duplication that the advantages of deduplicating were outweighed by the overhead of type units)