Is clang genned debuginfo really that huge

Hi,
A slightly off-topic weekend note on debug info and stack traces, an afterthought from my fight with building Clang on a low-end machine, and on how not to approach this area.

A slightly negative example: Go (golang).
https://social.lansky.name/@hn100/106075246574771701

Sorry for the off-topic post, and have a nice weekend,
Pk

In a RelWithDebInfo (-g -O2) build of Clang, over 90% of the final executable size is DWARF information.

The percentage will be lower with a Debug build, but I don’t have an actual number for that.

You can look at the sizes of ELF sections whose names start with .debug_ to see for yourself.

–paulr
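
The measurement suggested above can be sketched in a few lines of Python. This is a minimal sketch, not a replacement for a real ELF tool: it assumes an ELF64 file with ordinary (non-extended) section numbering, and the function names `section_sizes` and `debug_bytes` are made up for illustration.

```python
import struct


def section_sizes(image: bytes) -> dict[str, int]:
    """Map each section name in an ELF64 image to its sh_size in bytes."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if image[4] != 2:  # EI_CLASS: 2 means ELF64
        raise ValueError("only ELF64 is handled in this sketch")
    end = "<" if image[5] == 1 else ">"  # EI_DATA: 1 means little-endian

    # ELF64 header: e_shoff is at byte 40; e_shentsize, e_shnum and
    # e_shstrndx are the three 16-bit fields starting at byte 58.
    (shoff,) = struct.unpack_from(end + "Q", image, 40)
    shentsize, shnum, shstrndx = struct.unpack_from(end + "HHH", image, 58)

    def header(i: int) -> tuple[int, int, int]:
        # Return (sh_name, sh_offset, sh_size) for section header i.
        base = shoff + i * shentsize
        (name,) = struct.unpack_from(end + "I", image, base)
        offset, size = struct.unpack_from(end + "QQ", image, base + 24)
        return name, offset, size

    # The section-name string table is itself a section.
    _, str_off, str_size = header(shstrndx)
    strtab = image[str_off:str_off + str_size]

    sizes: dict[str, int] = {}
    for i in range(shnum):
        name_off, _, size = header(i)
        name = strtab[name_off:strtab.index(b"\0", name_off)].decode()
        # Object files may repeat a section name, so accumulate.
        sizes[name] = sizes.get(name, 0) + size
    return sizes


def debug_bytes(sizes: dict[str, int]) -> int:
    """Total size of all sections whose names start with .debug_."""
    return sum(n for name, n in sizes.items() if name.startswith(".debug_"))
```

Reading a binary with `open(path, "rb").read()`, passing the bytes to `section_sizes`, and dividing `debug_bytes(...)` by the length of the file gives the share of the file taken up by DWARF sections.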

Hello,
Yeah, I mean, I know it's large, but some other projects, like Go, manage to make it even larger.

If it helps anyone, I'd think of it as a kind of database design problem. My understanding is that the debug standard was created more or less ad hoc and only later written up as documentation of existing code. For example, the whole database is laid out in major order, split by compilation units.

As for storage efficiency, I'd probably think of sharing as much as possible in common sections and making the rest as compact as possible, while keeping debugger access patterns in mind and possibly topping it with some database-style indices.

I'd think Microsoft's PDB format could be treated as the gold standard. It seems to be ultra efficient both for storage and for access from a debugger.

Best regards,
Pawel Kunio

On Fri, 16.04.2021, 17:10, <paul.robinson@sony.com> wrote:

Hello,
Yeah, I mean, I know it's large, but some other projects, like Go, manage to make it even larger.

If there are particular ways that golang is using debug info that are
problematic, that might be interesting for the golang project to
understand - I'm happy to help explain other ways to use DWARF if
there's some explanation of what makes golang's use of DWARF
particularly problematic/costly.

If it helps anyone, I'd think of it as a kind of database design problem.

Perhaps, though there's a fair bit of data to store. The fact that it
compresses as well as it does (I think GCC now defaults to -gz which
uses zlib/zip compression on debug info sections - Clang hasn't made
that change yet, but does support the feature) suggests the debug info
encoding isn't as compact as it could be - but whether or not it's
worth changing the format to be more dense compared to relying on an
existing compression scheme to make those gains, I'm not sure.
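
To put a number on how well a given section compresses, something like the following could be used. It is only a sketch: `compression_ratio` is a hypothetical helper, and it assumes the raw bytes of a debug section have already been extracted by some other means.

```python
import zlib


def compression_ratio(blob: bytes, level: int = 6) -> float:
    """Compressed size divided by original size; lower means more redundancy."""
    if not blob:
        raise ValueError("empty input")
    return len(zlib.compress(blob, level)) / len(blob)
```

A ratio close to 1 would suggest the encoding is already dense, while a much smaller ratio would suggest there is redundancy that either a denser format or section compression could remove.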

My understanding is that the debug standard was created more or less ad hoc and only later written up as documentation of existing code.

I can't say I know the history here - but many things do get
standardized after some existing practice. Where does your impression
of that come from?

At least if that is the case, the debugging format (DWARF) has been
around in a fairly formal sense (the DWARF specification, available at
dwarfstd.org) and goes through ongoing improvement like other
standards.

For example, the whole database is laid out in major order, split by compilation units.

Not sure I understand this comment ^

As for storage efficiency, I'd probably think of sharing as much as possible in common sections

DWARF type units (-fdebug-types-section) can be used to share type
descriptions between compilation units, for example. DWARF also
supports referencing things directly across compilation units. For
instance, the DWARF that LLVM generates under LTO uses direct
cross-unit references to share types and to describe cross-module/CU
inlining. This can optionally be combined with type units, in case
non-LTO object files that would benefit from sharing type definitions
are linked in as well.

and making the rest as compact as possible, while keeping debugger access patterns in mind and possibly topping it with some database-style indices.

There's the existing .gdb_index name lookup table, and also the DWARF v5
.debug_names lookup table (derived from work Apple has used for many
years as a DWARF extension).

I'd think Microsoft's PDB format could be treated as the gold standard. It seems to be ultra efficient both for storage and for access from a debugger.

I'm not sure that's the case, though I haven't seen a detailed
comparison of the formats to judge by any means.

From: llvm-dev <llvm-dev-bounces@lists.llvm.org> On Behalf Of pawel k. via llvm-dev
Sent: Friday, April 16, 2021 10:00 AM
To: llvm-dev <llvm-dev@lists.llvm.org>
Subject: [llvm-dev] Is clang genned debuginfo really that huge

A slightly negative example: Go (golang).

Hacker News 100: "My Go Executable Files Are Still Getting Larger (…" - Mastodon

The article doesn't seem to discuss debug info much, so far as I can see.