[DWARF][dsymutil] Deduplication of types with incomplete typedefs

I was recently experimenting with reducing DWARF binary size on a project I work on. We rely heavily on the standard library, and I noticed that, particularly for std::vector, dsymutil almost never deduplicates DW_TAG_class_type DIEs for identical instantiations across compilation units.

When playing around with the dsymutil code, I noticed that this is because a few functions in std::vector (let’s say instantiated as std::vector<int>) reference typedefs defined inside of it. For instance, rend returns a reverse_iterator, which is a typedef. The compiler then generates the following DWARF info:

  • The DW_TAG_subprogram for rend, referencing the reverse_iterator typedef as the DW_AT_type attribute
  • The reverse_iterator DW_TAG_typedef, referencing the concrete reverse_iterator type. In the case of std::vector<int> this turns out to be std::__1::reverse_iterator<std::__1::__wrap_iter<int *> >
  • The DW_TAG_class_type for the concrete type above, except that it only produces a declaration (with DW_AT_declaration set to true and no child DIEs)

The logic in dsymutil will consider all such instantiations of std::vector<int> “incomplete” and never consider any of them for ODR uniquing, unless some compile unit also happens to emit DWARF for the instantiations of these typedef’d classes, in which case the type is finally considered complete and can be uniqued.

I did a few experiments by relaxing dsymutil to allow marking a class as canonical for ODR even if its typedefs were incomplete. This resulted in a significant improvement (a 25% size reduction), given how prevalent such types are. My change was obviously not a real fix: if a subsequent copy of the class did reference a complete DIE tree for the typedef’d classes, we would still collapse all copies into the first incomplete version, and they would never get a chance to correctly reference the complete definition.

What’s the best solution here? I noticed that dsymutil considers DIEs canonical for ODR in the order of the compile units it processes, so if a complete version of a type appears late in the input, it will never be used to replace the incomplete versions in the earlier compile units. Does adding a pass to gather all such types first make sense?

I’m not familiar with how dsymutil works, but IIUC: you want to de-duplicate the incomplete descriptions, but still allow a complete description to supersede the (deduplicated) incomplete description. I’m not surprised that you’re seeing a significant size win, this is probably common with STL containers.

I suspect (based only on hearing it talked about, not ever studying it) that the unique-ing mechanism doesn’t tolerate this kind of superseding. I do expect that you’d end up with one uniqued incomplete description and one uniqued complete description; is that bad? It’s not 100% optimal, but you’ll have done most of the size reduction; the question is whether there’s any functional issue in that state.

Adding a post-processing pass to unique the incomplete descriptions also sounds like it would work, without the consequence of two different descriptions, although intuitively adding another pass over all of .debug_info might be unpleasantly expensive.

A third option would be to fiddle with the unique-ing mechanism to allow it to recognize this case, but intuitively that would be the most complicated solution.

That’s the view from “outside,” hopefully someone more familiar with the “insides” will come along with a more informed opinion.

Thanks for the reply. The solutions you describe match my intuition as well when it comes to solving the general problem of optimally unique-ing types. But I guess I’m also curious about how to deal with the specific case of STL containers which is pretty impactful given how common they are.

I do expect that you’d end up with one uniqued incomplete description and one uniqued complete description; is that bad?

The problem is that we don’t unique incomplete descriptions, so you mostly end up with one uniqued complete description and multiple incomplete ones. And it is rarely the case that a std::vector description will be complete, because that requires a single compile unit to instantiate all of the referenced typedefs at once. Generally at least the concrete type pointed to by the iterator typedef will be instantiated, but that is not true for reverse_iterator and const_iterator, so multiple std::vector instantiations of the same type never get uniqued.

dsymutil emits compile units in a single pass:

for (file : object files) {
  for (unit : compile units) {
    analyze(unit);
    clone(unit);   <<< references to the types are processed here
    emit(unit);
  }
  clean cloned file data;
}

To be able to use the complete version of a type, it would be necessary to postpone cloning and emitting until all units are processed and the complete types are discovered:

for (file : object files) {
  for (unit : compile units) {
    analyze(unit);   <<< discover incomplete types
  }
}

for (file : object files) {
  for (unit : compile units) {
    clone(unit);   <<< assign references to the types
  }
}

for (file : object files) {
  for (unit : compile units) {
    emit(unit);
  }
}

This would prevent parallel execution and lead to a performance decrease (currently dsymutil can run the analyze and clone stages for different compile units in parallel).
It would also be necessary to keep the cloned data for all files (currently it is freed after each file), which would increase memory usage.

There is an alternative proposal for type deduplication: remove types from the compile units and put them into an artificial type unit. This would still allow a single pass over the compile units, processing units in parallel, keeping a single copy of each type, and releasing resources as soon as each compile unit is processed:

  • a slightly outdated prototype implementation: ⚙ D96035 [dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.
  • the current patch under review: ⚙ D147952 [DWARFLinkerParallel] Add interface files, create a skeleton implementation.

This proposal handles compile units separately, in parallel.
It removes all type info from the compile units and puts it into a separate artificial type unit.
So the following scheme is possible:

parallel_for_each(file : object files) {
  parallel_for_each(unit : compile units) {
    analyze(unit);
    clone(unit);   <<< types are removed at this stage
    emit(unit);
  }
}

generate artificial type unit;  <<< the artificial type unit contains a single copy of each type

for (file : object files) {
  for (unit : compile units) {
    patch type references;
  }
}

This approach makes it possible to improve performance and produce smaller debug info, because all incomplete types are removed.

Yeah, that makes sense. I’m thinking the type deduplication logic would still need work to solve the case I’m describing for STL types, right? Even with the final pass, the current logic errs on the side of caution by not trying to deduplicate incomplete types, and these STL containers might never have a “complete” DIE tree anywhere in the debug information.

That made me wonder if we should have “declaration” vs “incomplete” (vs “complete”, which is when both flags are off) states for types in the deduplication logic, and allow the canonical ODR reference to be “upgraded” from “incomplete” to “complete”, so that we ODR to the most complete version found so far. I’ll see if I can get around to producing a patch to make the idea clearer.
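A minimal sketch of that “upgrade” idea (all names here are hypothetical; the real dsymutil bookkeeping lives in its DeclContext/ODR machinery and is considerably more involved):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Completeness ordering: a later, more complete copy may replace the
// canonical entry; an equally or less complete copy is redirected to
// the existing canonical DIE instead.
enum class Completeness { Declaration = 0, Incomplete = 1, Complete = 2 };

struct CanonicalType {
  uint64_t die_offset;  // offset of the canonical DIE in the output
  Completeness state;
};

// Returns the canonical offset to use for this copy of the type,
// upgrading the canonical entry when a more complete copy appears.
uint64_t resolve(std::map<std::string, CanonicalType> &odr_table,
                 const std::string &qualified_name, uint64_t die_offset,
                 Completeness state) {
  auto [it, inserted] =
      odr_table.try_emplace(qualified_name, CanonicalType{die_offset, state});
  if (!inserted && state > it->second.state)
    it->second = {die_offset, state};  // upgrade incomplete -> complete
  return it->second.die_offset;
}
```

With this, two incomplete copies of std::vector<int> collapse onto the first one, and a later complete copy becomes the new canonical DIE; references already emitted against the earlier incomplete copy are the unsolved part, which is why this would still need some form of later patch-up.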

I didn’t see this post until just now. Alexey did a great job of explaining why what you’re proposing isn’t possible within the limitations of the current approach.

I’m not sure I understand this part. You wouldn’t be able to “upgrade” the incomplete type until you’ve seen and processed the final unit, at which point you’ve already emitted all previous ones. Unless this was meant for the alternative approach that merges all the type info?

I was specifically trying to address unique-ing of incomplete types. Right now, this is never done, and if the type unique-ing logic is kept intact it still wouldn’t be done in the alternative approach. There is at least one case I could find of types that are incomplete almost everywhere, namely STL containers, because they contain typedefs referencing types that are never instantiated, even in reasonable circumstances, e.g. when a program never instantiates the class referred to by a vector’s reverse_iterator typedef.

What I was proposing was to still allow the unique-ing of incomplete types, but then allow the unique reference to be “upgraded” to the complete type when it appears in the input. I’m not 100% sure this works, since incomplete types could differ in why they are incomplete, and we could ultimately fail to keep certain references. I’ll think about it some more.

What I was proposing was to still allow the unique-ing of incomplete types, but then allow the unique reference to be “upgraded” to the complete type when it appears in the input. I’m not 100% sure this works, since incomplete types could differ in why they are incomplete, and we could ultimately fail to keep certain references. I’ll think about it some more.

Right. The incomplete types could differ. This probably makes unique-ing of incomplete types not very effective if done in one pass. My understanding of your suggestion is the following:

CU1 {
   IncompleteType1;
   
   DW_TAG_typedef
     DW_AT_type CU1::IncompleteType1
}

CU2 {
   IncompleteType1;
   
   DW_TAG_typedef
     DW_AT_type CU2::IncompleteType1   
}

When CU2 is cloned, we see that CU2::IncompleteType1 fully matches CU1::IncompleteType1, and it can then be replaced with a reference to CU1::IncompleteType1.

It is usual for incomplete types to differ, and then there is little chance of substituting one with a reference to another:

CU1 {
  DW_TAG_class_type {
      DW_AT_name "vector"
      DW_AT_declaration (true)

      DW_TAG_subprogram {
          DW_AT_name "push_back"
      }
  }

  DW_TAG_typedef
    DW_AT_name "vector"
    DW_AT_type CU1::vector
}

CU2 {
  DW_TAG_class_type {
      DW_AT_name "vector"
      DW_AT_declaration (true)

      DW_TAG_subprogram {
          DW_AT_name "empty"
      }
  }

  DW_TAG_typedef
    DW_AT_name "vector"
    DW_AT_type CU2::vector
}

In this example the two incomplete types differ and therefore cannot be uniqued.

D96035 resolves this problem by removing the original type descriptors and building a united one:

TYPE_CU {
  DW_TAG_class_type {
      DW_AT_name "vector"
      DW_AT_declaration (true)

      DW_TAG_subprogram {
          DW_AT_name "push_back"
      }
      DW_TAG_subprogram {
          DW_AT_name "empty"
      }
  }
}

CU1 {
  DW_TAG_typedef
    DW_AT_name "vector"
    DW_AT_type TYPE_CU::vector
}

CU2 {
  DW_TAG_typedef
    DW_AT_name "vector"
    DW_AT_type TYPE_CU::vector
}

Ah, that’s great! This would definitely address my main issue currently, nice to hear there is a path to solving this already. Is there a way for me to track the status of the proposal? Let me also know if there is something I can do to help move it forward.

Ah, that’s great! This would definitely address my main issue currently, nice to hear there is a path to solving this already. Is there a way for me to track the status of the proposal?

Please use D96035 to track the progress. I put references to D96035 into the related reviews,
so the list of currently open/closed reviews is in D96035.

At the current stage, D147952 is under review.
The next step is implementing complete handling of the --no-odr case for DWARFLinkerParallel.
After that, the -odr case (type deduplication) would be implemented.

Let me also know if there is something I can do to help move it forward.

Thank you. Taking part in code review may help to move things faster :slight_smile:

FTR, this is the approach that the Sony debugger uses when loading debug info across many CUs. As new instances appear, any new information is added to the internal representation of the type. Seems like a good idea for dsymutil to do the same.

I believe dsymutil is single-pass in this sense. It can’t go back and add things to a previously emitted type; it can only emit/start a new one. So when it sees a new instance of a type with some variations (newly defined members, etc.), it has to emit a new copy of the type, I guess?

dsymutil could do something more like what clang does when it’s emitting members for a type whose definition it isn’t emitting: emit a declaration with the extra members, without needing to duplicate the rest of the type. (It’s unfortunate that there’s no way to emit a reference from a declaration back to the existing definition, but if you’re going to end up with duplicated/partial types either way, at least the later duplicates could be smaller.)