Code size BoF Minutes

Hi all,

Thanks to everyone who participated in the code size birds-of-a-feather today! Here are the minutes: https://docs.google.com/document/d/1kDsbFDtkWLceR-Y63ez04CBfvhbDN8RxWV3xXwaNUC0/edit?usp=sharing

The minutes are also copied below for convenience.

LLVM 2020
Code Size BoF Minutes

See also: Aditya Kumar’s talk on “Code Size Compiler Optimizations and Techniques”

Minutes

  • What use cases for code size are important to the LLVM community?

  • Deeply embedded devices: IoT, RISC-V

  • Memory is a large part of the cost of the chip

  • Gaming consoles

  • WebAssembly, for faster page loads

  • Mobile apps

  • Cache-limited programs on desktops, supercomputers

  • Real-time systems (both size & speed are important)

  • Find the code that isn’t on the critical path and optimize it for size

  • Or find thresholds for heuristics that are good enough for both

  • Hyperscale systems with lots of processors but only a little memory for each one

  • EMU architecture

  • Epiphany/Adapteva architecture

  • Why doesn’t -Oz do much better than -O2?

  • There are optimizations that make dramatic speed improvements, but not as many that make dramatic size improvements

  • It doesn’t disable things like loop unrolling

  • It doesn’t disable inlining

  • It doesn’t enable loop rerolling

  • -Oz affects heuristics for e.g. inlining, but they can still make the wrong decision

  • TargetTransformInfo has options for code size, but they often aren’t implemented

  • If -Oz uses a poorly tuned pass sequence, we can autotune it with OpenTuner etc.

  • Can get ~5% extra size reduction this way.

  • LTO is effective for code size; was recently fixed to support -Oz

  • Do we need different levels like -Os1, -Os2, -Os3?
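
A rough source-level illustration of the unrolling/inlining points above: even when the global level stays at -Oz, Clang accepts per-function and per-loop overrides for the cases where a heuristic still guesses wrong. The function names below are invented; minsize and the loop pragma are Clang-specific, while noinline is also understood by GCC.

    // Hypothetical cold helpers: force size-oriented codegen per function.
    __attribute__((minsize))        // optimize this function as if under -Oz
    void parse_config(const char *text) { /* ... */ }

    __attribute__((noinline))       // keep exactly one out-of-line copy
    void report_error(int code) { /* ... */ }

    void clear(int *buf, int n) {
    #pragma clang loop unroll(disable)  // don't let the unroller grow this loop
      for (int i = 0; i < n; ++i)
        buf[i] = 0;
    }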

  • We have profilers for code speed; can we have something similar for code size?

  • There is something like this for WebAssembly: Twiggy🌱

  • Remarks help understand what optimizations are being missed
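
Short of a dedicated tool, a crude stand-in can be scripted from llvm-nm. The sketch below (the program name and report format are arbitrary choices) expects the BSD-style "value size type name" lines that llvm-nm --print-size emits for ELF objects and reports where the text bytes go:

    // size_report.cpp -- crude code-size "profiler":
    //   llvm-nm --print-size --size-sort my_program | ./size_report
    // Assumes BSD-style "value size type name" lines with value and size in hex.
    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    int main() {
      struct Sym { uint64_t size; std::string name; };
      std::vector<Sym> syms;
      uint64_t total = 0;
      std::string line;
      while (std::getline(std::cin, line)) {
        std::istringstream ss(line);
        std::string value, size, type, name;
        if (!(ss >> value >> size >> type >> name))
          continue;                          // undefined symbols, headers, etc.
        if (type != "t" && type != "T")
          continue;                          // keep local/global text symbols (t/T)
        uint64_t bytes = std::stoull(size, nullptr, 16);
        total += bytes;
        syms.push_back({bytes, name});
      }
      std::sort(syms.begin(), syms.end(),
                [](const Sym &a, const Sym &b) { return a.size > b.size; });
      std::cout << "total text bytes: " << total << "\n";
      for (size_t i = 0; i < syms.size() && i < 20; ++i)
        std::cout << syms[i].size << "\t" << syms[i].name << "\n";
    }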

  • What code size benchmarks can we use to measure patches?

  • Embench.org

  • CSiBE

  • Technique: outlining

  • How is outlining affected by the size of the instruction set?

  • Machine outliner might be affected, but IR-level outlining shouldn’t be affected much

  • How do machine outliner and IR-level outliner compare?
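
For reference, here is a source-level caricature of what outlining does; the MachineOutliner operates on machine instructions and the IR outliner on LLVM IR, so the real savings depend on the target's instruction encoding and call overhead. Function names are invented.

    #include <cstdio>

    // Before outlining, this error-handling sequence appeared verbatim in both
    // readers below. Afterwards it exists once, in a never-inlined helper.
    __attribute__((noinline))
    static int outlined_fail(const char *where, FILE *f) {
      std::fprintf(stderr, "error in %s\n", where);
      std::fclose(f);
      return -1;
    }

    // Each original site shrinks to argument setup plus one call.
    int read_header(FILE *f, bool ok) { return ok ? 0 : outlined_fail("header", f); }
    int read_body(FILE *f, bool ok)   { return ok ? 0 : outlined_fail("body", f); }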

  • Technique: function merging

  • Research paper: “Function Merging by Sequence Alignment”

  • Look into string alignment techniques from biology
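
A hand-written picture of the sequence-alignment idea, assuming two nearly identical functions: the aligned, common spine is emitted once and the divergent piece is guarded by an extra parameter, with thin wrappers preserving the original entry points. All names are invented.

    #include <cstddef>

    // Merged body: the shared spine plus one guarded divergence.
    static int sum_merged(const int *a, std::size_t n, bool squared) {
      int acc = 0;
      for (std::size_t i = 0; i < n; ++i)
        acc += squared ? a[i] * a[i] : 2 * a[i];   // the only differing region
      return acc;
    }

    // Thin wrappers keep the original entry points (and their ABI) alive.
    int sum_doubled(const int *a, std::size_t n) { return sum_merged(a, n, false); }
    int sum_squared(const int *a, std::size_t n) { return sum_merged(a, n, true); }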

  • Technique: deleting unused code

  • What’s the best resource?

  • Technique: compressors like gzip

  • UPX, LZEXE save disk space

  • To save RAM, you need a compression algorithm that supports random access, like something based on Huffman codes

  • Has been implemented in hardware (IBM CodePack)
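
A minimal sketch of the random-access requirement, using zlib per fixed-size block as a stand-in codec (DEFLATE combines LZ77 with Huffman coding); the block size and structure names are arbitrary. Only the block holding a requested address is ever decompressed, which is the property schemes like IBM CodePack rely on.

    // Block-wise compression with random access: only the block containing the
    // requested address is decompressed, so code can stay compressed in RAM.
    // Build with: clang++ -O2 ra_compress.cpp -lz
    #include <zlib.h>
    #include <algorithm>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    constexpr std::size_t kBlockSize = 4096;   // decompression granularity

    struct CompressedImage {
      std::vector<unsigned char> data;    // concatenated compressed blocks
      std::vector<std::size_t> offsets;   // start of each block in data (+ sentinel)
      std::vector<std::size_t> rawSizes;  // uncompressed size of each block
    };

    CompressedImage compressImage(const std::vector<unsigned char> &raw) {
      CompressedImage img;
      for (std::size_t pos = 0; pos < raw.size(); pos += kBlockSize) {
        std::size_t rawLen = std::min(kBlockSize, raw.size() - pos);
        uLongf destLen = compressBound(rawLen);
        std::vector<unsigned char> buf(destLen);
        compress2(buf.data(), &destLen, raw.data() + pos, rawLen, Z_BEST_COMPRESSION);
        img.offsets.push_back(img.data.size());
        img.rawSizes.push_back(rawLen);
        img.data.insert(img.data.end(), buf.begin(), buf.begin() + destLen);
      }
      img.offsets.push_back(img.data.size());  // sentinel: end of the last block
      return img;
    }

    // Fetch the block containing addr without touching any other block.
    std::vector<unsigned char> fetchBlock(const CompressedImage &img, std::size_t addr) {
      std::size_t block = addr / kBlockSize;
      uLongf rawLen = img.rawSizes[block];
      std::vector<unsigned char> out(rawLen);
      uncompress(out.data(), &rawLen,
                 img.data.data() + img.offsets[block],
                 img.offsets[block + 1] - img.offsets[block]);
      return out;
    }

    int main() {
      std::vector<unsigned char> fakeCode(64 * 1024, 0x90);  // stand-in "code"
      CompressedImage img = compressImage(fakeCode);
      std::vector<unsigned char> block = fetchBlock(img, 20000);
      std::printf("%zu bytes compressed to %zu; block of %zu bytes recovered\n",
                  fakeCode.size(), img.data.size(), block.size());
    }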

  • Technique: using different optimizations for hot & cold code

  • For best results, people currently have to move their cold code to a separate file so they can use different flags, which is painful
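
One hedged alternative to the separate-file workaround: per-function attributes let hot and cold routines share a translation unit while getting different treatment (cold, hot, and noinline are GCC/Clang attributes, minsize is Clang-specific; the functions are hypothetical). Profile information can drive a similar split automatically.

    // Cold diagnostics path: rarely executed, so trade speed for size.
    __attribute__((cold, minsize, noinline))
    void print_diagnostics(int error_code) {
      // ...slow, verbose reporting...
    }

    // Hot path: keep the usual speed-oriented heuristics.
    __attribute__((hot))
    void process_packet(const unsigned char *data, int len) {
      // ...tight inner loop...
    }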

  • Could we do code size optimizations in the linker?

  • Needed to take advantage of RISC-V’s LUI instruction

  • Requires copy propagation and dead code elimination

  • Instead of inlining functions into each caller, can we keep one copy of the function and specialize it for all its call sites in the same translation unit?

  • The Attributor already shows improvements when there is recursion and non-trivial but dead code (the level handling in Olden/bisort/bitonic)
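
A source-level sketch of that idea, with invented names: keep one out-of-line clone specialized on the constant that every caller passes, so each call site pays for a call rather than a duplicated inlined body; the payoff comes when the shared body is large.

    static int scale(int x, int factor) { return x * factor; }

    // One shared clone, specialized for the constant every caller uses.
    __attribute__((noinline)) static int scale_by_4(int x) { return scale(x, 4); }

    // Each caller pays for a call, not for a duplicated inlined body.
    int north(int x) { return scale_by_4(x); }
    int south(int x) { return scale_by_4(x) + 1; }
    int east(int x)  { return scale_by_4(x) - 1; }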

Topics

  1. Situations where code size matters

  2. What do other compilers do better?

  3. Techniques for code size optimization

  4. How can we improve LLVM?

Techniques

  • Program design, library design, etc.

  • General optimizations

  • Loop idiom recognition (memset, memcpy; sketched after this list)

  • Partial inlining

  • Deduplication

  • Outlining

  • Function merging

  • Functionality changes

  • Partial evaluation

  • Delete unused code
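
Picking up the loop idiom recognition item above: LLVM can rewrite simple initialization and copy loops as single memset/memcpy calls, which is usually both smaller and faster. A minimal sketch, with invented function names:

    #include <cstddef>

    // Candidate for loop idiom recognition: the whole loop can become
    // memset(dst, 0, n).
    void zero_buffer(unsigned char *dst, std::size_t n) {
      for (std::size_t i = 0; i < n; ++i)
        dst[i] = 0;
    }

    // Likewise, this copy loop can become memcpy(dst, src, n).
    void copy_buffer(unsigned char *dst, const unsigned char *src, std::size_t n) {
      for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
    }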

Thanks,
Sean Bartell