LTO Module splitting and metadata

Ladies and Gentlemen,

  I am seeking some clarity in dealing with module splitting and metadata/debug info.

I know the topic has been debated before, so I am trying to understand the current state of things and potential development momentum.

What am I trying to do:
  - I am using llvm::SplitModule to break up a large LTO-created module prior to codegen.
    - The hope is to use parallel processing to reduce codegen time and the number of sections per module.
    - I cannot afford globalizing any local symbols (see for details).
    - Time/memory footprint is a big concern here.

What is the issue:
  - When I split an LTO module in the presence of DWARF data (full -g), I get a mess of duplicate symbols during linking on metadata objects (

Why it happens:
  - llvm::SplitModule uses CloneModule with a predicate function to select the GlobalValues for each partition. But it does not do the same for metadata: it seems to simply copy _all_ metadata to all partitions.
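
To make the behaviour I am describing concrete, here is a toy model in plain C++. It is not LLVM code: ToyModule, cloneFiltered, splitToy and copiesOf are made-up names, and the round-robin assignment is only an assumption for illustration. It mirrors what I believe happens: definitions are filtered per partition by a predicate, while metadata is copied wholesale.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-in for a module; this is NOT an LLVM type.
struct ToyModule {
    std::vector<std::string> functions; // function definitions
    std::vector<std::string> metadata;  // debug-info nodes
};

// Clone 'src', keeping only the definitions accepted by 'keep'.
// Metadata is copied unconditionally, which models the suspected problem.
ToyModule cloneFiltered(const ToyModule &src,
                        const std::function<bool(const std::string &)> &keep) {
    ToyModule part;
    for (const auto &fn : src.functions)
        if (keep(fn))
            part.functions.push_back(fn);
    part.metadata = src.metadata; // no predicate applied here
    return part;
}

// Split 'src' into 'numParts' partitions, assigning functions round-robin.
std::vector<ToyModule> splitToy(const ToyModule &src, std::size_t numParts) {
    assert(numParts > 0);
    std::unordered_map<std::string, std::size_t> partitionOf;
    for (std::size_t i = 0; i < src.functions.size(); ++i)
        partitionOf[src.functions[i]] = i % numParts;

    std::vector<ToyModule> parts;
    for (std::size_t p = 0; p < numParts; ++p) {
        parts.push_back(cloneFiltered(src, [&](const std::string &name) {
            return partitionOf.at(name) == p;
        }));
    }
    return parts;
}

// Count how many partitions carry the metadata node 'md'.
std::size_t copiesOf(const std::vector<ToyModule> &parts,
                     const std::string &md) {
    std::size_t count = 0;
    for (const auto &part : parts)
        for (const auto &node : part.metadata)
            if (node == md)
                ++count;
    return count;
}
```

In this model every function lands in exactly one partition, but copiesOf reports one copy of each metadata node per partition, which is the kind of duplication I see at link time.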

What I know:
  - ThinLTO had a similar issue, and was forced(?) to use FunctionImport/linkInModule for lazy picking of functions into a module (I am sure I am oversimplifying here...)

What I do not know (and seek guidance for):
  - How do I best achieve my objective given the above description?

Can I reuse the ThinLTO mechanism without duplicating functionality, or do I need to augment CloneModule to discriminate on metadata copying?
Which would be more useful in the long run? Maybe there is already a better way to achieve what I need? Did I get the whole picture wrong?

Any input is very much appreciated.



Hi Sergei,

ThinLTO is a little different in how it behaves because we are doing
the metadata minimization for imported functions by delaying
materialization and linking in of the metadata. In your case, metadata
was materialized long before for the merged module.

There is a thread discussing metadata minimization for bugpoint that
seems like it might be related? See the thread starting at