[DWARF][DWP] 4GB limit

Looking at the DWARF4 GNU spec, the DWARF5 spec, and the llvm-dwp implementation, it looks like there is a 4GB limit for DWP due to offsets being 32-bit.
Compressing doesn't work around it, because the llvm-dwp utility just uncompresses sections.

Is there any way around it? Can we make it 64-bit under an option (I doubt that's possible or a good idea), or have compressed sections as part of the package?

@dblaikie Any thoughts on this?


Yes, DWARF32 has a 32-bit limit. There is a DWARF64 encoding (-gdwarf64) that is at least partially supported in LLVM (llvm-dwp doesn't fully support it, in part because the dwp index spec has a bug where it's missing the 64-bit encoding mode; I've filed a DWARF issue about that recently ( DWARF Issue )). Be aware, though, that switching to DWARF64 significantly increases debug info size due to the larger offset encoding.
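For context, the DWARF32/DWARF64 distinction is signaled by a unit's initial length field. A minimal sketch (hypothetical helper, not LLVM's actual API) of how a reader distinguishes the two formats, assuming little-endian data:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Per the DWARF spec, an initial length of 0xffffffff is an escape marker:
// the real 64-bit length follows in the next 8 bytes, and all offsets in
// that unit are 8 bytes wide instead of 4.
struct InitialLength {
  uint64_t Length;
  bool IsDWARF64;
};

InitialLength readInitialLength(const uint8_t *Data) {
  uint32_t First;
  std::memcpy(&First, Data, sizeof(First));
  if (First == 0xffffffffu) {
    uint64_t Len64;
    std::memcpy(&Len64, Data + 4, sizeof(Len64)); // DWARF64: 8-byte length
    return {Len64, true};
  }
  return {First, false}; // DWARF32: 4-byte offsets, capped near 4GB
}
```

This is why DWARF64 costs size: every section offset in the unit doubles from 4 to 8 bytes.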

Dwarf compression (-gz) doesn’t help because the offsets are into the uncompressed data, so they still overflow.

There’s one dwarf6 proposal to allow 64 but encoding for debug_str_offsets without using dwarf 64 for everything, if only debug_str is too large, but hasn’t been implemented yet.

For the .debug_str section I have at least one particular situation that's exceeding the 32-bit limit: significant use of C++ expression templates from Eigen and TensorFlow. To address this issue I am working on "simple template names", where template parameter lists are not included in the name of a template, provided the parameter list string can be reconstructed from the DWARF DIEs describing the parameters. The clang/llvm work is done so far as I know (-gsimple-template-names), but lldb can't quite handle it yet, so I'm working on fixes there. I believe gdb can handle it already, but I haven't done extensive testing.

What parts of the DWARF are exceeding the 32-bit size in your case? (Which debug sections?)

Sorry, I wasn't clear. The part that overflows is the CU index offsets for .debug_info:
uint32_t &InfoSectionOffset =
    ContributionOffsets[getContributionIndex(DW_SECT_INFO, IndexVersion)];

InfoSectionOffset += C.Length;

Exactly: the offsets are into the uncompressed sections, so compression currently doesn't help.
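To illustrate the failure mode: the snippet above accumulates per-CU contribution lengths into a uint32_t, which wraps silently past 4GB. A hedged sketch (hypothetical names, not llvm-dwp's actual code) of the wrap, and how a 64-bit intermediate can at least detect it:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Widen the accumulation to 64 bits so an overflow of the 32-bit index
// entry can be reported instead of silently wrapping.
bool addContribution(uint32_t &SectionOffset, uint64_t Length) {
  uint64_t Next = uint64_t(SectionOffset) + Length;
  if (Next > std::numeric_limits<uint32_t>::max())
    return false; // would overflow the 32-bit cu-index offset
  SectionOffset = uint32_t(Next);
  return true;
}
```

llvm-dwp itself does no such check today per the excerpt above; the addition just wraps.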

Maybe I am missing something, but there is DWARF64 and there is the packaged DWP format. There is no reason why the latter can't be 64-bit while the DWARF itself is 32-bit.
So
“The table of offsets begins immediately following the parallel table (at offset 16 + 12 * M from the beginning of the section). The table is a two-dimensional array of 32-bit words (using the byte order of the application binary), with L columns and N+1 rows, in row-major order. Each row in the array is indexed starting from 0. The first row provides a key to the remaining rows: each column in this row provides an identifier for a debug section, and the offsets in the same column of subsequent rows refer to that section. The section identifiers are”

could become a "two-dimensional array of 64-bit words". Although with the DWARF5 spec that ship has sailed, so we will have to wait until DWARF6.
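The arithmetic in the quoted layout can be sketched as follows (hypothetical helpers, not part of any tool; the WordSize parameter is where the proposed 32-bit to 64-bit change would land):

```cpp
#include <cassert>
#include <cstdint>

// Per the quoted index layout: a 16-byte header, then parallel hash (8-byte)
// and index (4-byte) tables of M slots each, then the offsets table.
uint64_t offsetsTableStart(uint64_t M) {
  return 16 + 12 * M;
}

// The offsets table is an L-column, (N+1)-row, row-major array; today each
// cell is a 4-byte word. WordSize = 8 models the proposed 64-bit variant,
// which doubles the stride but lifts the 4GB cap.
uint64_t entryOffset(uint64_t M, uint64_t L, uint64_t Row, uint64_t Col,
                     uint64_t WordSize /* 4 today, 8 if made 64-bit */) {
  return offsetsTableStart(M) + (Row * L + Col) * WordSize;
}
```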

Basically, the issue I am trying to solve is that we switched to split DWARF to avoid the 4GB relocation limit, but DWP puts it back on the table in a different flavor (debug info size vs. binary size).

Oh, that's fair. I hadn't considered that. DWARF64 was generally "all or nothing" in the past: all parts of a unit must be encoded with a consistent format, DWARF32 or DWARF64, not a mix. But with Split DWARF especially, since there are no references across units and the only shared content is the string section, it can make sense to use a DWARF64-like encoding for the index, and optionally for the str_offsets section (as @pogo59 has already proposed for DWARFv6). That combination sounds like it could cover most uses of DWARF64: most uses don't need a single unit that exceeds the 32-bit limit, only the linked total together with absolute references. (Maybe even without Split DWARF we could work around that by having non-absolute cross-unit references, e.g. relative cross-unit references.)

It's curious to me that you hit that limit before the string offset limit. Admittedly, Google hit the string offset limit due to heavy use of expression templates (Eigen and TensorFlow), which isn't representative of average/other code.

Is there anything in particular about the DWARF content that's especially responsible for hitting these limits? Are you using DWARF type units (-fdebug-types-section)? (They help reduce final file size by deduplicating type information.) Which section(s) are you finding exceed the 32-bit limit? (I assume .debug_info?)

Yeah, we are using -fdebug-types-section, although my understanding is that it's not very useful for split DWARF.
We exceed the offsets in .debug_info once we start to package all of it into a dwp.

Actually, if you don't mind me asking: how does Google deal with collecting profiles that rely on debug information from binaries built with split DWARF? Do you also use dwp packages to keep debug info alongside the binary when it is deployed?

Hmm, there's nothing I can think of that makes type units less relevant for Split DWARF, /if/ you're making a package file. If you're not making a package file, then type units add overhead without benefit (because you never merge the debug info together, so the redundancy elimination doesn't happen and you pay the overhead of using type units without the payoff). That option isn't available in non-split DWARF, since in that case you always link the debug info.

Any idea what debug info is causing such extreme growth? (Whether it's down to some particular idioms/libraries, like I discovered for our usage (expression templates in Eigen and TensorFlow), or spread evenly across totally unrelated code/libraries?)

For anything in production that’s built with Split DWARF we do make dwps, but either the executable (in non-split) or the executable + dwp (in split) is processed/ingested by a database-y thing where all the debug info is referenced. The stripped binaries go to the production systems and only addresses come back from sample profiles - then get symbolized/processed with the debug info previously ingested.

Ah I see. Sounds like our approaches are similar.

TBH I don't know. I haven't had a chance to dive into what contributes the most debug info. Been busy with BOLT and other issues. :slight_smile:

Ah, FWIW, one thing I did when we started to see this at Google was dump .debug_str.dwo, strip out template parameters, and sort+count to see which names show up most often.
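That triage can be sketched roughly as follows (hypothetical helpers; in practice you'd feed in the strings dumped from .debug_str.dwo, e.g. with a strings/dwarfdump-style tool):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Truncate a name at its first '<' to drop the template parameter list,
// so all instantiations of one template collapse to a single base name.
std::string stripTemplateArgs(const std::string &Name) {
  auto Pos = Name.find('<');
  return Pos == std::string::npos ? Name : Name.substr(0, Pos);
}

// Count how often each base name occurs; the largest counts point at the
// templates dominating the string section.
std::map<std::string, unsigned>
countBaseNames(const std::vector<std::string> &Strings) {
  std::map<std::string, unsigned> Counts;
  for (const auto &S : Strings)
    ++Counts[stripTemplateArgs(S)];
  return Counts;
}
```

The first '<' heuristic is crude (it mishandles operator< and nested names), but it is usually good enough for a quick histogram.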


@dblaikie Regarding a 64-bit cu-index for 32-bit DWARF:
where do I need to post to bring it to the attention of the folks responsible for the DWARF6 spec?

I've already filed this in the appropriate place, here: DWARF Issue
(You can file other DWARF issues here: DWARF Standard Public Comment, if you need to. You can also send emails to dwarf-discuss (Dwarf-Discuss Info Page) if you want to have general discussions with DWARF-related folks.)


Thanks!
One thing I was thinking about is modifying the DWARF library to detect a corrupted cu-index and build its own map of DWO ID to debug info start/length, for example when tools go through the getNonSkeletonUnitDIE API.

Right now it will return the dwo CU that has the closest valid offset.
I think llvm-profgen and gsym have similar logic.

Sorry I’m not quite following what you said.

I think I can picture how you might detect a corrupted index (one index entry that points inside another, maybe; that works for the .debug_info section, at least, but might not for other sections, where some entries are expected to overlap completely (e.g. several type units might share a .debug_line.dwo contribution range)) and then compute new ranges by walking the unit headers.

That's only good for the .debug_info section, though; you'd still have to rely on the index to know which other sections are associated with that unit hash.

I'd probably rather prototype an experimental 64-bit index format that would hopefully look something like whatever eventually gets standardized.

Add a unit header to the unit index (so we can encode the DWARF32/DWARF64 state) and move it to a new section (.debug_llvm_index, perhaps?). Though it does mean more churn for all the producers/consumers when we eventually do standardize something and they have to check another section to find the index.

Presumably do the same thing we're doing for the .debug_str_offsets section in DWARFv6, and allow the index to have a different DWARF32/64 state from the rest of the debug info (so you can link a bunch of DWARF32 together, but upgrade the index to DWARF64 if it needs larger offsets).

Sorry, I wasn't clear. Yes, it's to deal with the .debug_info section specifically; for all others it will still have to rely on the cu-index.
I was thinking of it as the least intrusive bridge until the DWARF6 spec is finalized and formally rolled out. My worry about .debug_llvm_index is that once something is added, it gets harder to remove the longer it stays in and becomes a de facto standard. On the other hand, it is a move toward what the next standard will hopefully say.

I realize it doesn't solve the potential issue with .debug_types. That being said, since llvm/gcc moved to DWARF5 as the default this might not be an issue at all: .debug_types/.debug_types.dwo goes away in DWARF5, as type units are now part of .debug_info/.debug_info.dwo. That actually makes the issue of .debug_info overflowing the 4GB cu-index limit even more prevalent. Unless I missed something.

> I was thinking of it as the least intrusive bridge until the DWARF6 spec is finalized and formally rolled out. My worry about .debug_llvm_index is that once something is added, it gets harder to remove the longer it stays in and becomes a de facto standard. On the other hand, it is a move toward what the next standard will hopefully say.

Yeah, those are certainly the tradeoffs. Though the workaround would have to go into all dwp-consuming tools, including debuggers. I'm not sure whether it'll be easier to convince debuggers to accept that workaround than to convince them to support a non-standard extension, but I don't really know. If it's only LLVM-ecosystem tools you need to support, maybe that's fine; lldb, llvm-symbolizer, etc. are probably easier to convince than gdb.

So the workaround would be: when doing an index lookup, check whether the .debug_info range starts with a unit header that includes the expected DWO ID. If it doesn't, go searching through the units for one with a matching header, maybe only the units beyond the 32-bit max (a quick scan of the index would find the last entry that starts before that point, if it's safe to assume there aren't massively overflowed values in there, i.e. the total size is less than 8GB; if it's bigger than that, an offset might have fully wrapped around, in which case you'd have to scan the units in .debug_info from the start of the section).

A more robust but slower option would be to assume all the offsets are broken, and rebuild them from scratch by scanning .debug_info. A less robust option (for a dwp under 8GB) would be to assume that any invalid offset (one that doesn't point to a header with the correct DWO ID) has wrapped once: add 2^32 to the offset and check again, expecting to arrive at the right offset that way.
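The less robust option might look something like this sketch (all names hypothetical; headerMatches stands in for parsing the unit header at a given offset and comparing its DWO ID):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// If the 32-bit index offset doesn't land on a unit header with the
// expected DWO ID, assume it wrapped once during dwp writing and retry
// at offset + 2^32. Only sound while .debug_info is under 8GB, so at
// most one wrap is possible.
uint64_t fixUpOffset(uint64_t IndexOffset,
                     const std::function<bool(uint64_t)> &headerMatches) {
  if (headerMatches(IndexOffset))
    return IndexOffset;
  uint64_t Adjusted = IndexOffset + (uint64_t(1) << 32);
  if (headerMatches(Adjusted))
    return Adjusted;
  return UINT64_MAX; // hypothetical "not found" sentinel
}
```

As noted above, this can still be fooled if a wrapped offset happens to land on some other valid unit header.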

> I realize it doesn't solve the potential issue with .debug_types. That being said, since llvm/gcc moved to DWARF5 as the default this might not be an issue at all: .debug_types/.debug_types.dwo goes away in DWARF5, as type units are now part of .debug_info/.debug_info.dwo. That actually makes the issue of .debug_info overflowing the 4GB cu-index limit even more prevalent. Unless I missed something.

Yep.

Right, it's only for LLVM utilities. Not sure if .debug_llvm_index is a starter for other tools either, although that's pure speculation on my part.

I was thinking more along the lines of the robust but slower one: during DWO CU lookup, if we encounter an offset coming from the cu-index that doesn't point to the start of a CU, we scan through .debug_info and construct a map of <dwoid, <start_offset (64-bit), length>>. Then for this and subsequent lookups, use the map.

Or scan through the CUs to detect overlapping ranges; if they exist, construct the map from the beginning.

Although just adding 2^32 to an offset that doesn't point to the start of a CU is probably good enough. We can still get it wrong if the wrapped offset just happens to hit a correct CU. At 8GB, I wonder if we just won't be able to link the binary by that point anyway, due to other sections growing too large and hitting relocation overflows.
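A sketch of the robust fallback described above, with the unit headers modeled abstractly (real code would parse them out of .debug_info.dwo; in DWARFv5 each unit header carries its length and DWO ID):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Abstract stand-in for a parsed unit header.
struct UnitHeader {
  uint64_t DwoId;
  uint64_t Length; // total unit size, including the header itself
};

// Walk the units in section order, accumulating a 64-bit running offset,
// to rebuild a DWO ID -> (start_offset, length) map that ignores the
// (possibly overflowed) 32-bit cu-index offsets entirely.
std::map<uint64_t, std::pair<uint64_t, uint64_t>>
buildUnitMap(const std::vector<UnitHeader> &Units) {
  std::map<uint64_t, std::pair<uint64_t, uint64_t>> Map;
  uint64_t Offset = 0;
  for (const auto &U : Units) {
    Map[U.DwoId] = {Offset, U.Length};
    Offset += U.Length; // units are laid out back to back
  }
  return Map;
}
```

Building this once and reusing it for subsequent lookups keeps the cost to a single linear scan of the section.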

Oh, I expect it's about the same amount of work either way, which is sort of why I suggested .debug_llvm_index: we're in about the same place in terms of implementing it versus working around things.

But yeah, probably the simplest thing to do is test whether .debug_info is larger than the 32-bit max, and if it is, ignore the .debug_info column and do a manual scan through the section. I'm OK with something like that going into libDebugInfoDWARF.

We'd be able to remove that workaround when we have a DWARFv6 versioned index ready, or turn it into a hard failure ("oh, you have a bigger-than-32-bit section but a 32-bit index? No good").


Great! I’ll add it to my priority queue. :slight_smile: Might not be able to get to it until October after my PTO.

I checked LLDB's code and we end up using the LLVM DWARF parser to parse the CU index. It would be great if we could modify DWARFContext::getCUIndex() to check whether .debug_info is over 4GB, and if so, manually scan all of .debug_info, parse the unit headers, and populate the CU index with 64-bit offsets. This means we would need to parse all CU headers from .debug_info and save off the 64-bit offsets, which in turn means modifying DWARFUnitIndex::Entry::SectionContribution::Offset to be 64-bit and then dealing with code that expects it to be 32-bit. I think there is some encoding/decoding code that uses the type of DWARFUnitIndex::Entry::SectionContribution::Offset to decide how to decode the value; those points would still need to encode/decode as 32-bit.

The suggested approach would be:

const DWARFUnitIndex &DWARFContext::getCUIndex() {
  if (CUIndex)
    return *CUIndex;

  CUIndex = std::make_unique<DWARFUnitIndex>(DW_SECT_INFO);
  if (debug_info.size() > UINT32_MAX /* 4GB */) {
    // Manually reconstruct the CU index from the actual DWARF in .debug_info
    // by grabbing the DWO ID from the unit header for DWARF5 (or from the CU
    // DIE's DW_AT_dwo_id attribute) and allow for CU offsets over 4GB...
  } else {
    // Extract from the section as it is valid.
    DataExtractor CUIndexData(DObj->getCUIndexSection(), isLittleEndian(), 0);
    CUIndex->parse(CUIndexData);
  }
  return *CUIndex;
}

You'd still need to combine data from the index (since it tells you which parts of the other sections (e.g. .debug_str_offsets.dwo) go with a given CU/TU). So either way you need to parse the index, but then overwrite the DW_SECT_INFO column with the data produced by parsing the .debug_info.dwo section.

Pretty ugly, but I wouldn’t be totally averse to it.