RFC: Using zlib to decompress debug info sections.

Hi!

TL;DR WDYT of adding zlib decompression capabilities to LLVMObject library?

ld.gold from GNU binutils has --compress-debug-sections=zlib option,
which uses zlib to compress .debug_xxx sections and renames them to .zdebug_xxx.
binutils (and GDB) support this properly, while LLVM command line tools don’t:

$ ld --version
GNU gold (GNU Binutils for Ubuntu 2.22) 1.11

$ ./bin/clang++ -g a.cc -Wl,--compress-debug-sections=zlib

$ objdump -h a.out | grep debug
26 .debug_info 00000066 0000000000000000 0000000000000000 00002010 2**0
27 .debug_abbrev 00000048 0000000000000000 0000000000000000 00002068 2**0
28 .debug_aranges 00000000 0000000000000000 0000000000000000 000020bb 2**0
29 .debug_macinfo 00000000 0000000000000000 0000000000000000 000020cf 2**0
30 .debug_line 00000053 0000000000000000 0000000000000000 000020e3 2**0
31 .debug_loc 00000000 0000000000000000 0000000000000000 0000213e 2**0
32 .debug_pubtypes 00000000 0000000000000000 0000000000000000 00002152 2**0
33 .debug_str 00000069 0000000000000000 0000000000000000 00002166 2**0
34 .debug_ranges 00000000 0000000000000000 0000000000000000 000021d9 2**0

$ ./bin/llvm-objdump -h a.out | grep debug
27 .zdebug_info 00000058 0000000000000000
28 .zdebug_abbrev 00000053 0000000000000000
29 .zdebug_aranges 00000014 0000000000000000
30 .zdebug_macinfo 00000014 0000000000000000
31 .zdebug_line 0000005b 0000000000000000
32 .zdebug_loc 00000014 0000000000000000
33 .zdebug_pubtypes 00000014 0000000000000000
34 .zdebug_str 00000073 0000000000000000
35 .zdebug_ranges 00000014 0000000000000000

Decompression and proper handling of debug info sections may be needed
in llvm-dwarfdump and llvm-symbolizer tools. We can implement this by:

  1. Checking if zlib is present in the system during configuration.
  2. Adding zlib decompression to llvm::MemoryBuffer, and section decompression to LLVMObject (this would require optional linking with -lz).
  3. Using the methods in LLVM tools where needed.
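
Step 2 can be sketched roughly as follows. This is a minimal illustration in Python rather than LLVM C++, so that it runs without linking against -lz. It assumes the section layout gold emits is the ASCII magic "ZLIB", then an 8-byte big-endian uncompressed size, then a plain zlib stream; that layout is not spelled out in this thread, so treat it as an assumption. The names decompress_zdebug and compress_zdebug are hypothetical and not part of any LLVM API.

```python
import struct
import zlib

ZLIB_MAGIC = b"ZLIB"               # assumed 4-byte marker at the start of a .zdebug_* section
HEADER_SIZE = len(ZLIB_MAGIC) + 8  # magic + 64-bit big-endian uncompressed size


def decompress_zdebug(section: bytes) -> bytes:
    """Return the uncompressed contents of a GNU-style .zdebug_* section."""
    if len(section) < HEADER_SIZE or not section.startswith(ZLIB_MAGIC):
        raise ValueError("not a GNU-style compressed debug section")
    (expected_size,) = struct.unpack(">Q", section[len(ZLIB_MAGIC):HEADER_SIZE])
    data = zlib.decompress(section[HEADER_SIZE:])
    if len(data) != expected_size:
        raise ValueError("uncompressed size does not match the section header")
    return data


def compress_zdebug(data: bytes) -> bytes:
    """Build a section in the same assumed layout (only used to exercise the reader)."""
    return ZLIB_MAGIC + struct.pack(">Q", len(data)) + zlib.compress(data)
```

If that layout assumption holds, the 0x14-byte size llvm-objdump reports for the empty sections above would be consistent with a 12-byte header followed by a minimal zlib stream.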

Does this make sense to you?

> TL;DR WDYT of adding zlib decompression capabilities to LLVMObject library?

Yes, I want this.

> Does this make sense to you?

Yes, exactly. I'm not certain that MemoryBuffer and LLVMObject are the right places, but it doesn't sound wrong.

Nick

I'm not sure MemoryBuffer is the right place to do this either. I'm also
not sure if we want debug info decompression to be transparent in
LLVMObject or not. I'm leaning towards no since it's not part of the
standard yet, unless gold is actually using the SHF_COMPRESSED flag.

I think it should be part of Object, but as an external API that is used
when you find a section you know from external factors (the name matches
some list) is compressed.
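
A minimal sketch of what such an explicit, name-driven API might look like, written in Python for brevity. The function names and the decompress callback are hypothetical; the only fact taken from this thread is the convention that .zdebug_xxx is the compressed form of .debug_xxx.

```python
from typing import Callable, Optional, Tuple

COMPRESSED_PREFIX = ".zdebug_"
PLAIN_PREFIX = ".debug_"


def plain_debug_name(section_name: str) -> Optional[str]:
    """Map '.zdebug_info' to '.debug_info'; return None for any other name."""
    if section_name.startswith(COMPRESSED_PREFIX):
        return PLAIN_PREFIX + section_name[len(COMPRESSED_PREFIX):]
    return None


def load_debug_section(
    name: str,
    contents: bytes,
    decompress: Callable[[bytes], bytes],
) -> Tuple[str, bytes]:
    """Return (canonical_name, bytes), decompressing only when the name says so."""
    plain = plain_debug_name(name)
    if plain is None:
        # Not a compressed debug section: hand the bytes back untouched.
        return name, contents
    return plain, decompress(contents)
```

The caller decides when to ask for decompression, so ordinary section reads stay non-transparent, as suggested above.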

- Michael Spencer

--
Alexey Samsonov, MSK

> I'm not sure MemoryBuffer is the right place to do this either. I'm also
> not sure if we want debug info decompression to be transparent in
> LLVMObject or not. I'm leaning towards no since it's not part of the
> standard yet,

Yeah, I also think that decompression should be explicitly requested by the
user of LLVMObject.

Definitely want the feature :)

I don't see SHF_COMPRESSED (unless readelf just isn't showing it to
me), but it wouldn't be too hard to get binutils to mark them as such.
Right now the convention is .z<foo> are compressed, but that's not as
precise as we'd like it to be. There's been some talk on the binutils
list about it, but it hasn't been implemented yet.

-eric

Just in case - do we want to link with libz.so installed in the system, or be self-contained and copy sources to LLVM repo?

Historically we've done the former. The latter would require Chris
wanting to do that.

-eric

This case isn't so clearcut. We like to include libraries in the source to make it easy to get up and running without having to install a ton of dependencies. However, this has license implications and is generally annoying.

Given that zlib is so widely available by default, and that the compiler can generate correct (albeit uncompressed) debug info, I think the best thing is to *not* include a copy in llvm. Just detect and use it if we can find it, but otherwise generate uncompressed output.
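
The "use it if we can find it, otherwise stay uncompressed" policy can be illustrated with a small sketch. It is in Python, not LLVM's build system, and the helper names are hypothetical; the try/except import merely stands in for a configure-time check for zlib.

```python
from typing import Tuple

try:
    import zlib  # optional: some Python builds are compiled without zlib
except ImportError:
    zlib = None


def have_zlib() -> bool:
    """Report whether zlib support was detected."""
    return zlib is not None


def maybe_compress(data: bytes) -> Tuple[bytes, bool]:
    """Return (payload, was_compressed); fall back to the raw bytes without zlib."""
    if zlib is None:
        return data, False
    return zlib.compress(data), True


def maybe_decompress(payload: bytes, was_compressed: bool) -> bytes:
    """Undo maybe_compress, failing loudly if zlib is needed but unavailable."""
    if not was_compressed:
        return payload
    if zlib is None:
        raise RuntimeError("payload is zlib-compressed but zlib support is unavailable")
    return zlib.decompress(payload)
```

Either way the output stays correct; only its size changes with the availability of zlib.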

-Chris

Sounds good to me. Thanks Chris!

-eric

> Historically we've done the former. The latter would require Chris
> wanting to do that.

> This case isn't so clearcut. We like to include libraries in the source
> to make it easy to get up and running without having to install a ton of
> dependencies. However, this has license implications and is generally
> annoying.

Looks like zlib license <http://zlib.net/zlib_license.html> is good enough
to avoid implications, but I can't really judge.

> Given that zlib is so widely available by default, and that the compiler
> can generate correct (albeit uncompressed) debug info, I think the best
> thing is to *not* include a copy in llvm. Just detect and use it if we can
> find it, but otherwise generate uncompressed output.

Sure, I'll go this way then. Thanks!

past and are not unlikely to have new issues in the future, it is highly
annoying. As such, I would strongly prefer to keep it optional.

Joerg

This might be a bit late, but I've got another argument for bundling
zlib source with LLVM.

Sanitizer tools need to symbolize stack traces in the reports. We've
been using a standalone symbolizer binary until now; the sanitizer runtime
spawns a new process as soon as an error is found, and communicates
with it over a pipe. This is very cumbersome to deploy, because we
need to keep another binary around, specify a path to it at runtime,
etc. LLVM lit.cfg already carries some of this burden.

A much better solution would be to statically link symbolization code
into the user application, the same as sanitizer runtime library.
Unfortunately, symbolizer depends on several LLVM libraries, C++
runtime, zlib, etc. Statically linking all that stuff with user code
results in symbol name conflicts.

We've come up with what seems to be a perfect solution (thanks to
Chandler's advice at the recent developer meeting). We build
everything down to (but not including) libc into LLVM bitcode. This
includes LLVMSupport, LLVMObject, LLVMDebugInfo, libc++, libc++abi,
zlib (!). Then we bundle it all together and internalize all
non-interface symbols: llvm-link && opt -internalize. Then compile
down to a single object file.

This results in a perfect isolation of symbolizer internals. One
drawback is that this requires source for all the things that I
mentioned - and at the moment we've got everything but zlib.

We'd like this to be a part of the normal LLVM build, but that
requires zlib source available somewhere. We could add a
cmake/configure option to point to an externally available source, but
that sounds like a complication we would like to avoid.

WDYT?

Lemme chat with Danny off list about the best way to do this, and I’ll post an update.

It is possible to do both. Include an internal one and also link to an external one, and make it a compile-time option which to use.

You shouldn't need to use bitcode and opt -internalize to hide the
symbols. You can do it with objcopy --localize-hidden like we did for
DynamoRIO, but I assume you prefer this route because it ports nicely
to Mac. :)

Portability is always good.

But the objcopy method does not seem to work well when there is code we
don't fully control. Hidden visibility is overridable, and there are
enough cases of that in libcxx and libcxxabi to cause problems. The entire
exception interface, for example.