[RFC] Zstandard as a second compression method to LLVM

Hello all,

The LLVM project currently has support for zlib as a compression algorithm. Usage of it ranges from compressing ELF debug sections to serializing performance stats and AST data structures.

We would like to add Zstandard (a.k.a. zstd) as an alternative to zlib; it tends to achieve higher compression ratios while being faster across the board. Using it for internal tooling could bring speed improvements in places where we compress ASTs etc., without sacrificing their compressed size.

In terms of implementation, here are some initial ideas:

  • Relocate the llvm::zlib namespace, as declared in lib/Support/Compression.cpp, to a namespace like llvm::compression::zlib.

  • Add an llvm::compression::zstd namespace.

  • Define a namespace alias llvm::compression::tooling that aliases either the zstd or the zlib namespace, based on LLVM CMake flags.

This makes it easy to spot code that needs to be updated, and simple to keep tools consistently using the same compression internally by using llvm::compression::tooling instead of llvm::compression::zlib or llvm::compression::zstd where possible.
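
A minimal sketch of what that layout could look like (the LLVM_ENABLE_ZSTD macro below is just a placeholder assumption for whatever the CMake flag ends up being; the zlib declarations mirror the existing llvm::zlib interface):

    // Hypothetical shape of llvm/Support/Compression.h after the split.
    namespace llvm {
    namespace compression {

    namespace zlib {
    bool isAvailable();
    // compress()/uncompress() keep their current signatures ...
    } // namespace zlib

    namespace zstd {
    bool isAvailable();
    // ... and zstd mirrors the same interface.
    } // namespace zstd

    // Tools that just want "some" compression use whichever backend the
    // build was configured with.
    #if LLVM_ENABLE_ZSTD
    namespace tooling = zstd;
    #else
    namespace tooling = zlib;
    #endif

    } // namespace compression
    } // namespace llvm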

I have been able to create a working POC with this implementation pattern, and it seems to work, but would not be surprised if there are more appealing approaches.

Some important questions:

  • Are there other implementation details/external projects that we would also need to account for?

– Cole Kissane

I know @resistor looked into zstd measurements for DWARF compression, so tagging him here in case he’s got ideas.

At least for DWARF compression, I believe this wouldn’t be a compile-time swappable implementation, as the compression scheme is user-selectable and encoded in the ELF file format (so clients know how to decompress it).

It’s probably worth doing something similar for other compression use cases in LLVM (except in cases that are locked to the exact version of LLVM, like Clang serialized modules) - so that compatible compilers can still be used together without problems (or that incompatibilities can be detected - eg: if clang encodes with zstd but you link with an lld that hasn’t been built with zstd support)
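
To make the detection side concrete, a consumer could reject unsupported schemes up front rather than mishandle them; a hypothetical sketch (checkChType is made up for illustration, not an existing LLVM API; the value 2 is the ELFCOMPRESS_ZSTD value proposed later in this thread):

    #include "llvm/Support/Error.h"
    #include <cstdint>

    // ch_type values from the ELF gABI.
    enum : std::uint32_t { ELFCOMPRESS_ZLIB = 1, ELFCOMPRESS_ZSTD = 2 };

    // Hypothetical helper a linker or debugger could call before trying to
    // decompress a SHF_COMPRESSED section.
    static llvm::Error checkChType(std::uint32_t ChType, bool BuiltWithZstd) {
      switch (ChType) {
      case ELFCOMPRESS_ZLIB:
        return llvm::Error::success();
      case ELFCOMPRESS_ZSTD:
        if (BuiltWithZstd)
          return llvm::Error::success();
        return llvm::createStringError(
            llvm::inconvertibleErrorCode(),
            "input has zstd-compressed sections, but this tool was built "
            "without zstd support");
      default:
        return llvm::createStringError(llvm::inconvertibleErrorCode(),
                                       "unknown compression type %u",
                                       static_cast<unsigned>(ChType));
      }
    }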

I reported an issue with poor compression of DWARF to zstd last year: https://github.com/facebook/zstd/issues/2832

I’m generally supportive of zstd as an alternative compression scheme to zlib for most things in LLVM.

@resistor I’m glad you’re generally supportive of zstd as an alternative compression scheme to zlib for most things in LLVM.
I would also like to note that the GitHub link in your reply sparked my curiosity: I experimented locally with adding zstd as a DWARF debug compression method, and found the following for the .debug_str_offsets section of clang++
(zstd at level 7, zlib at level 6, the default for llvm-project):

File: ./bin/clang++.debug_str_offsets
  Size: 378000136
File: ./bin/clang++.debug_str_offsets-zlib
  Size: 336024112
File: ./bin/clang++.debug_str_offsets-zstd
  Size: 313555712 (93% of the zlib size)

I got these files using llvm-objcopy:

$ time ./bin/llvm-objcopy --compress-debug-sections=zstd ./bin/clang++.debug_str_offsets ./bin/clang++.debug_str_offsets-zstd
./bin/llvm-objcopy --compress-debug-sections=zstd    1.15s user 0.46s system 99% cpu 1.608 total
$ time ./bin/llvm-objcopy --compress-debug-sections=zlib ./bin/clang++.debug_str_offsets ./bin/clang++.debug_str_offsets-zlib
./bin/llvm-objcopy --compress-debug-sections=zlib    4.49s user 0.64s system 99% cpu 5.140 total

Note that though the zstd level 7 output was only 7% smaller than zlib’s, compression was 3.2x faster, so I’d venture to say that zstd could actually be pretty applicable to DWARF debug info!

Thanks for proposing this. I have wanted Zstandard for a long time.

These changes look good to me.

For the ELF SHF_COMPRESSED flag, we need a new ch_type value, preferably a generic-abi value, otherwise a GNU ABI value (possibly reused by Fuchsia/FreeBSD/NetBSD/OpenBSD/etc). I have proposed “Add new ch_type value: ELFCOMPRESS_ZSTD” on the generic-abi forum and created “New ch_type value ELFCOMPRESS_ZSTD?” to notify binutils/gdb/gcc/gnu-gabi. I am happy to Cc interested folks (please give me your email address :slight_smile: )
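
For reference, the value in question lives in the compression header that prefixes the payload of every SHF_COMPRESSED section; a sketch of the 64-bit layout from the gABI (the typedefs are only there to keep the snippet self-contained):

    #include <cstdint>

    typedef std::uint32_t Elf64_Word;
    typedef std::uint64_t Elf64_Xword;

    // ELF compression header placed at the start of a SHF_COMPRESSED
    // section, immediately before the compressed data (64-bit layout).
    typedef struct {
      Elf64_Word  ch_type;      // ELFCOMPRESS_ZLIB = 1; proposed ELFCOMPRESS_ZSTD = 2
      Elf64_Word  ch_reserved;
      Elf64_Xword ch_size;      // size of the uncompressed data
      Elf64_Xword ch_addralign; // required alignment of the uncompressed data
    } Elf64_Chdr;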

We will need features from a bunch of tools:

  • GCC/Clang: perhaps we can introduce a new driver option -gz=zstd
  • GNU assembler: add --compress-debug-sections=zstd
  • GNU ld: add --compress-debug-sections=zstd
  • objcopy/llvm-objcopy: add --compress-debug-sections=zstd
  • gdb and many debug info consumers: support Zstandard
  • elfutils: support Zstandard

Note: after clang -c -gz=zstd is supported and ld.lld recognizes Zstandard SHF_COMPRESSED sections, we can benefit from Zstandard compressed relocatable object files.

@MaskRay in terms of the ELF section implementation, I have more code in Phabricator (⚙ D128667 [WIP] Add Zstd ELF support) that takes care of --compress-debug-sections=zstd in objcopy/llvm-objcopy and lld, plus support in DWARF dumping etc., and assumes ELFCOMPRESS_ZSTD = 2. That diff of course relies on ⚙ D128465 [llvm] add zstd to `llvm::compression` namespace being reviewed and eventually accepted first. Would you be interested in reviewing that patch and the second diff that implements ELFCOMPRESS_ZSTD = 2?

I would still like to work with upstream zstandard to the extent possible to get good DWARF compression either at the default or maximum compression levels. I don’t want us to end up having to cherry-pick the compression level just for DWARF.

+1 to this - might be worth figuring that out before picking it up? I’m not sure how Zstandard versioning/improvements work: would these fixes be backwards compatible? (e.g. if tools add Zstandard decompression support, are they at risk of not being able to decompress data produced by some future, improved Zstandard compressor?)

Also, and I know this is really vague: at least for DWARF, especially Split DWARF, it would help to have a compression scheme whose output can be concatenated especially efficiently (e.g. given two chunks of compressed data, an algorithm that can produce a compressed result containing both) - that would make linker/dwp-based compressed-to-compressed streaming better, rather than having to buffer/decompress the stream only to recompress it.

So if someone’s interested in a better compression scheme for DWARF in ELF, that ^ might be a property that’s worth some novel research.

To my knowledge the zstd bitstream format is stable. Fixing these issues is about making the compressor smarter, and no changes should be required on the decompressor side.

Thanks for the second patch. I made some initial comments.

The generic-abi discussion (https://groups.google.com/g/generic-abi/c/satyPkuMisk) is ongoing. We still need some agreement before we can settle on the value 2. In recent years, in practice, it seems that if LLVM, GNU, and Oracle Solaris can agree on something, we may be able to proceed. (@DimitryAndric @BradSmith) I CCed binutils’ head maintainer Nick Clifton there, as most GNU-side changes are in binutils.

I investigated this a bit for ⚙ D117853 [ELF] Parallelize --compress-debug-sections=zlib.
zlib has an example doing this (https://github.com/madler/zlib/blob/master/examples/gzjoin.c#L34), but its implementation is involved. It would be nice if zstd supported this in a simpler way, but I won’t stress it if it doesn’t. The size increase is also still unclear.

Not sure I follow about the last bit (the size increase of what?)

The ability to concatenate already-compressed sections would be really valuable, especially in terms of dwp memory usage - where many of the sections require no changes and only need to be concatenated together. If we could do that with compressed input, compressed output, and not a lot of work (& especially not a lot of memory usage) in the process, it’d really lower the cost of building with debug info (& storing it).

zstd allows multiple frames (top-level container for compressed data) to be concatenated in a single file. From the spec:

Zstandard compressed data is made of one or more frames. Each frame is independent and can be decompressed independently of other frames. The decompressed content of multiple concatenated frames is the concatenation of each frame decompressed content.
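
For what it’s worth, a minimal sketch of that property using zstd’s simple C API (each chunk compresses into its own frame, and ZSTD_decompress accepts a buffer made of several concatenated frames; C++17 assumed for the non-const std::string::data()):

    #include <zstd.h>
    #include <cassert>
    #include <string>

    // Compress one buffer into a single zstd frame.
    static std::string compressFrame(const std::string &Src, int Level) {
      std::string Dst(ZSTD_compressBound(Src.size()), '\0');
      size_t N = ZSTD_compress(Dst.data(), Dst.size(), Src.data(), Src.size(), Level);
      assert(!ZSTD_isError(N));
      Dst.resize(N);
      return Dst;
    }

    int main() {
      std::string A(1000, 'a'), B(1000, 'b');
      // Two independent frames, simply concatenated byte-for-byte.
      std::string Joined = compressFrame(A, 5) + compressFrame(B, 5);

      // Decompressing the concatenation yields A followed by B.
      std::string Out(A.size() + B.size(), '\0');
      size_t N = ZSTD_decompress(Out.data(), Out.size(), Joined.data(), Joined.size());
      assert(!ZSTD_isError(N) && N == A.size() + B.size());
      assert(Out == A + B);
      return 0;
    }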

Does this work for your use-case?

It’s part of it - that would provide a /very/ cheap but not very well compressed result (useful if you want to optimize heavily for dwp/link time at the expense of output size). But what I was really hoping for was something more content-aware - something that’s more time-efficient than fully decompressing/recompressing everything, something that can take advantage of the work done to compress the input when determining how to compress the output/concatenation, rather than throwing all that work away when compressing from several compressed inputs.

Thank you for clarifying. To my knowledge zstd doesn’t support this.

ELFCOMPRESS_ZSTD == 2 has been accepted in the generic ABI
https://groups.google.com/g/generic-abi/c/satyPkuMisk/m/KwTF_U8rBAAJ

FreeBSD will define this in sys/elf_common.h, and I will define it in glibc (hopefully in time for 2.36).
