TL;DR WDYT of adding zlib decompression capabilities to LLVMObject library?
ld.gold from GNU binutils has a --compress-debug-sections=zlib option,
which uses zlib to compress .debug_xxx sections and renames them to .zdebug_xxx.
binutils (and GDB) support this properly, while LLVM command line tools don’t:
$ ld --version
GNU gold (GNU Binutils for Ubuntu 2.22) 1.11
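For reference, below is a minimal C++ sketch of reading the header of such a section. As I understand the layout GNU tools use for .zdebug_xxx, the section holds three parts in order:
1. The 4-byte magic "ZLIB".
2. The uncompressed size, as an 8-byte big-endian integer.
3. The zlib stream.

The sketch assumes that layout. ZdebugHeader and parseZdebugHeader are hypothetical names, not existing LLVMObject API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical result type: where the zlib stream starts and how large
// the section will be once decompressed.
struct ZdebugHeader {
  uint64_t UncompressedSize;
  std::size_t StreamOffset; // offset of the zlib stream within the section
};

// Parses the assumed .zdebug_xxx header: "ZLIB" magic followed by an
// 8-byte big-endian uncompressed size. Returns std::nullopt if the
// section is too short or the magic does not match.
std::optional<ZdebugHeader> parseZdebugHeader(const std::vector<uint8_t> &Section) {
  constexpr std::size_t HeaderSize = 12;
  if (Section.size() < HeaderSize ||
      std::memcmp(Section.data(), "ZLIB", 4) != 0)
    return std::nullopt;
  uint64_t Size = 0;
  for (std::size_t I = 4; I < HeaderSize; ++I)
    Size = (Size << 8) | Section[I]; // big-endian: most significant byte first
  return ZdebugHeader{Size, HeaderSize};
}
```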
> TL;DR WDYT of adding zlib decompression capabilities to LLVMObject library?
Yes, I want this.
Decompression and proper handling of debug info sections may be needed
in llvm-dwarfdump and llvm-symbolizer tools. We can implement this by:
1) Checking if zlib is present in the system during configuration.
2) Adding zlib decompression to llvm::MemoryBuffer, and section
decompression to LLVMObject (this would require optional linking with -lz).
3) Using the methods in LLVM tools where needed.
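Below is a rough C++ sketch of steps 1 and 2, under these assumptions:
- The configure check defines a macro, called HYPOTHETICAL_HAVE_ZLIB here; the real name would be up to the build system.
- -lz is linked only when that macro is defined.
- The caller already knows the uncompressed size, e.g. from the .zdebug header.

Without zlib the function just reports failure, so tools can degrade gracefully instead of failing to build. decompressZlibStream and isZlibAvailable are hypothetical names. uncompress() is the standard zlib call.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

#ifdef HYPOTHETICAL_HAVE_ZLIB
#include <zlib.h> // only included (and -lz only linked) when zlib was found
#endif

// Decompresses a raw zlib stream whose decompressed size is known in advance.
// Returns std::nullopt if zlib is unavailable or the data is corrupt.
std::optional<std::vector<uint8_t>>
decompressZlibStream(const std::vector<uint8_t> &Stream, uint64_t UncompressedSize) {
#ifdef HYPOTHETICAL_HAVE_ZLIB
  std::vector<uint8_t> Out(UncompressedSize);
  uLongf OutLen = static_cast<uLongf>(UncompressedSize);
  int Res = uncompress(Out.data(), &OutLen, Stream.data(),
                       static_cast<uLong>(Stream.size()));
  if (Res != Z_OK || OutLen != UncompressedSize)
    return std::nullopt;
  return Out;
#else
  (void)Stream;
  (void)UncompressedSize;
  return std::nullopt; // built without zlib: decompression not supported
#endif
}

// Lets a tool print a helpful diagnostic ("rebuild with zlib") instead of
// silently ignoring compressed sections.
bool isZlibAvailable() {
#ifdef HYPOTHETICAL_HAVE_ZLIB
  return true;
#else
  return false;
#endif
}
```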
Does this make sense to you?
--
Alexey Samsonov, MSK
Yes, exactly. I'm not certain that MemoryBuffer and LLVMObject are the right places, but it doesn't sound wrong.
I'm not sure MemoryBuffer is the right place to do this either. I'm also
not sure if we want debug info decompression to be transparent in
LLVMObject or not. I'm leaning towards no since it's not part of the
standard yet, unless gold is actually using the SHF_COMPRESSED flag.
I think it should be part of Object, but as an external API that is used
when you find a section you know from external factors (the name matches
some list) is compressed.
> I'm not sure MemoryBuffer is the right place to do this either. I'm also
> not sure if we want debug info decompression to be transparent in
> LLVMObject or not. I'm leaning towards no since it's not part of the
> standard yet,
Yeah, I also think that decompression should be explicitly requested by the
user of LLVMObject.
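One way to keep decompression explicit is for the library to offer only a helper that recognizes the naming convention; the caller then decides whether to decompress. Below is a C++ sketch of that, assuming .zdebug_xxx is the only convention to handle. compressedDebugSectionTarget is a hypothetical name.

```cpp
#include <optional>
#include <string>
#include <string_view>

// If Name looks like a gold-style compressed debug section (.zdebug_xxx),
// returns the name of the section it stands in for (.debug_xxx).
// Otherwise returns std::nullopt and the caller uses the section as-is.
std::optional<std::string> compressedDebugSectionTarget(std::string_view Name) {
  constexpr std::string_view Prefix = ".zdebug";
  if (Name.substr(0, Prefix.size()) != Prefix)
    return std::nullopt;
  // Drop the 'z': ".zdebug_info" -> ".debug_info".
  return "." + std::string(Name.substr(2));
}
```

A tool like llvm-dwarfdump would call this for each section name. It would only request decompression when the result is non-empty, so plain .debug_xxx sections never pay for it.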
I don't see SHF_COMPRESSED (unless readelf just isn't showing it to
me), but it wouldn't be too hard to get binutils to mark them as such.
Right now the convention is that .z<foo> sections are compressed, but that's
not as precise as we'd like it to be. There's been some talk on the binutils
list about it, but it hasn't been implemented yet.
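Below is a C++ sketch of a check that prefers a precise flag over the name convention. The constant 0x800 is the value the ELF gABI later assigned to SHF_COMPRESSED; treat it as an assumption relative to this discussion. In the gABI, sections with that flag also use a different header from the "ZLIB" magic, so the two cases need different decoding. The name fallback is restricted to .zdebug to limit the imprecision of the .z<foo> convention. DebugCompression and classifySection are hypothetical names.

```cpp
#include <cstdint>
#include <string_view>

// Value later assigned to SHF_COMPRESSED in the ELF gABI (assumption for this sketch).
constexpr uint64_t kShfCompressed = 0x800;

enum class DebugCompression { None, ElfFlag, GnuZPrefix };

// Checks the precise signal (the section flag) first and only then falls
// back to the imprecise ".zdebug" naming convention.
DebugCompression classifySection(std::string_view Name, uint64_t Flags) {
  if (Flags & kShfCompressed)
    return DebugCompression::ElfFlag;
  if (Name.substr(0, 7) == ".zdebug")
    return DebugCompression::GnuZPrefix;
  return DebugCompression::None;
}
```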
> Historically we've done the former. The latter would require Chris
> wanting to do that.
This case isn't so clear-cut. We like to include libraries in the source
to make it easy to get up and running without having to install a ton of
dependencies. However, this has license implications and is generally
annoying.
Given that zlib is so widely available by default, and that the compiler
can generate correct (albeit uncompressed) debug info, I think the best
thing is to *not* include a copy in llvm. Just detect and use it if we can
find it, but otherwise generate uncompressed output.
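Below is a minimal C++ sketch of "detect and use it if we can find it, but otherwise generate uncompressed output" on the writer side. It makes two assumptions:
- The same hypothetical HYPOTHETICAL_HAVE_ZLIB configure macro.
- The gold-style .zdebug layout: "ZLIB" magic plus an 8-byte big-endian size, placed before the zlib stream.

OutputSection and emitDebugSection are hypothetical. compressBound() and compress() are standard zlib functions.

```cpp
#include <cstdint>
#include <string>
#include <vector>

#ifdef HYPOTHETICAL_HAVE_ZLIB
#include <zlib.h>
#endif

struct OutputSection {
  std::string Name;
  std::vector<uint8_t> Data;
};

// Emits a debug section compressed (.zdebug_xxx) when zlib was detected at
// configure time, and uncompressed (.debug_xxx) otherwise. Name is expected
// to start with ".debug".
OutputSection emitDebugSection(const std::string &Name,
                               const std::vector<uint8_t> &Data) {
#ifdef HYPOTHETICAL_HAVE_ZLIB
  uLongf Bound = compressBound(static_cast<uLong>(Data.size()));
  std::vector<uint8_t> Out(12 + Bound);
  // Header: "ZLIB" magic, then the uncompressed size as 8 big-endian bytes.
  Out[0] = 'Z'; Out[1] = 'L'; Out[2] = 'I'; Out[3] = 'B';
  uint64_t Size = Data.size();
  for (int I = 0; I < 8; ++I)
    Out[4 + I] = static_cast<uint8_t>(Size >> (8 * (7 - I)));
  uLongf Len = Bound;
  if (compress(Out.data() + 12, &Len, Data.data(),
               static_cast<uLong>(Data.size())) == Z_OK) {
    Out.resize(12 + Len);
    return {".z" + Name.substr(1), Out}; // ".debug_info" -> ".zdebug_info"
  }
  // Compression failed: fall through to uncompressed output.
#endif
  return {Name, Data};
}
```

The point of this design is that the output is still correct debug info when zlib is missing, just larger. That is the property the argument against bundling zlib relies on.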
This might be a bit late, but I've got another argument for bundling
zlib source with LLVM.
Sanitizer tools need to symbolize stack traces in the reports. We've
been using a standalone symbolizer binary until now: the sanitizer runtime
spawns a new process as soon as an error is found and communicates
with it over a pipe. This is very cumbersome to deploy, because we
need to keep another binary around, specify a path to it at runtime,
etc. LLVM lit.cfg already carries some of this burden.
A much better solution would be to statically link the symbolization code
into the user application, the same way as the sanitizer runtime library.
Unfortunately, the symbolizer depends on several LLVM libraries, the C++
runtime, zlib, etc. Statically linking all that with user code
results in symbol name conflicts.
We've come up with what seems to be a perfect solution (thanks to
Chandler's advice at the recent developer meeting). We build
everything down to (but not including) libc into LLVM bitcode. This
includes LLVMSupport, LLVMObject, LLVMDebugInfo, libc++, libc++abi and
zlib (!). Then we bundle it all together and internalize all
non-interface symbols (llvm-link && opt -internalize), and compile
it down to a single object file.
This results in perfect isolation of the symbolizer internals. One
drawback is that it requires source for everything I mentioned, and
at the moment we've got everything but zlib.
We'd like this to be a part of the normal LLVM build, but that
requires the zlib source to be available somewhere. We could add a
cmake/configure option pointing to an externally available source, but
that sounds like a complication we would like to avoid.
You shouldn't need to use bitcode and opt -internalize to hide the
symbols. You can do it with objcopy --localize-hidden like we did for
DynamoRIO, but I assume you prefer this route because it ports nicely
to Mac.
But the objcopy method does not seem to work well when there is code we
don't fully control. Hidden visibility is overridable, and there are
enough cases of that in libcxx and libcxxabi to cause problems: the entire
exception interface, for example.