Hi all,
Thanks to everyone who participated in the code size birds-of-a-feather today! Here are the minutes: https://docs.google.com/document/d/1kDsbFDtkWLceR-Y63ez04CBfvhbDN8RxWV3xXwaNUC0/edit?usp=sharing
The minutes are also copied below for convenience.
LLVM 2020
Code Size BoF Minutes
See also: Aditya Kumar’s talk on “Code Size Compiler Optimizations and Techniques”
Minutes

- What use cases for code size are important to the LLVM community?
  - Deeply embedded devices: IoT, RISC-V
    - Memory is a large part of the cost of the chip
  - Gaming consoles
  - WebAssembly, for faster page loads
  - Mobile apps
  - Cache-limited programs on desktops, supercomputers
  - Real-time systems (both size & speed are important)
    - Find the code that isn’t on the critical path and optimize it for size
    - Or find thresholds for heuristics that are good enough for both
  - Hyperscale systems with lots of processors but only a little memory for each one
    - Epiphany/Adapteva architecture
- Why doesn’t -Oz do much better than -O2?
  - There are optimizations that make dramatic speed improvements, but not as many that make dramatic size improvements
  - It doesn’t disable things like loop unrolling
  - It doesn’t disable inlining
  - It doesn’t enable loop rerolling
  - -Oz affects heuristics for e.g. inlining, but they can still make the wrong decision (see the sketch after this topic)
  - TargetTransformInfo has options for code size, but they often aren’t implemented
  - If -Oz uses a poorly tuned pass sequence, we can autotune it with OpenTuner etc.
    - Can get ~5% extra size reduction this way.
  - LTO is effective for code size; was recently fixed to support -Oz
  - Do we need different levels like -Os1, -Os2, -Os3?
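
One reason -Oz can disappoint is that it mostly works through per-function attributes rather than a separate pipeline: clang marks functions with optsize at -Os and minsize at -Oz, and each pass decides what to do with that. Below is a minimal sketch of the kind of check a size-aware heuristic can make; Function::hasOptSize()/hasMinSize() are real LLVM APIs, but the surrounding heuristic and the threshold numbers are invented for illustration.

    // Sketch only: consulting the attributes that -Os/-Oz attach to
    // functions. hasOptSize()/hasMinSize() are real llvm::Function
    // APIs; the thresholds here are invented.
    #include "llvm/IR/Function.h"

    static unsigned pickUnrollThreshold(const llvm::Function &F) {
      if (F.hasMinSize())   // -Oz: never trade size for speed
        return 0;
      if (F.hasOptSize())   // -Os (or -Oz): only unroll tiny loops
        return 4;
      return 150;           // speed-oriented default
    }
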
- We have profilers for code speed; can we have something similar for code size?
  - There is something like this for WebAssembly: Twiggy🌱
  - Remarks help understand what optimizations are being missed
- What code size benchmarks can we use to measure patches?
- Technique: outlining (illustrated below)
  - How is outlining affected by the size of the instruction set?
    - Machine outliner might be affected, but IR-level outlining shouldn’t be affected much
  - How do machine outliner and IR-level outliner compare?
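
For readers who haven’t seen it, here is a hand-written before/after illustration of what outlining does. This is a source-level sketch of the effect, not actual outliner output, and the function names are invented.

    // Before: the same clamping sequence appears at two sites.
    int readSample(const int *Buf, int N, int I) {
      if (I < 0) I = 0;
      if (I >= N) I = N - 1;
      return Buf[I];
    }
    void writeSample(int *Buf, int N, int I, int V) {
      if (I < 0) I = 0;
      if (I >= N) I = N - 1;
      Buf[I] = V;
    }

    // After: the repeated sequence is outlined into one helper and each
    // site becomes a call. The machine outliner does this on machine
    // instructions (an IR outliner on IR); source is used here only to
    // show the effect.
    static int clampIndex(int I, int N) {
      if (I < 0) I = 0;
      if (I >= N) I = N - 1;
      return I;
    }
    int readSample2(const int *Buf, int N, int I) { return Buf[clampIndex(I, N)]; }
    void writeSample2(int *Buf, int N, int I, int V) { Buf[clampIndex(I, N)] = V; }
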
- Technique: function merging (sketch below)
  - Research paper: “Function Merging by Sequence Alignment”
  - Look into string alignment techniques from biology
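
A hand-written sketch of the idea behind function merging, in the spirit of the paper above (this is not the pass’s actual output, and the names are invented): two nearly identical functions become one body, with the differing operation selected by an extra argument.

    // Before: two functions that differ only in the comparison they perform.
    bool anyNegative(const int *A, int N) {
      for (int I = 0; I < N; ++I)
        if (A[I] < 0) return true;
      return false;
    }
    bool anyZero(const int *A, int N) {
      for (int I = 0; I < N; ++I)
        if (A[I] == 0) return true;
      return false;
    }

    // After merging: one shared body plus a discriminator argument; the
    // originals become thin wrappers. (Illustration only.)
    static bool anyMatchMerged(const int *A, int N, bool WantNegative) {
      for (int I = 0; I < N; ++I)
        if (WantNegative ? (A[I] < 0) : (A[I] == 0)) return true;
      return false;
    }
    bool anyNegativeMerged(const int *A, int N) { return anyMatchMerged(A, N, true); }
    bool anyZeroMerged(const int *A, int N) { return anyMatchMerged(A, N, false); }
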
- Technique: deleting unused code (example below)
  - What’s the best resource?
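
One common way to get this today is section-based garbage collection in the linker. The flags in the comment are real Clang/GCC/ld options; the file and function names are invented.

    // unused.cpp -- illustration of dead code removal at link time. Build with
    //   clang++ -Oz -ffunction-sections -fdata-sections -Wl,--gc-sections unused.cpp -o demo
    // Each function gets its own section, and --gc-sections lets the linker
    // drop sections nothing references, so neverCalled() is removed from the
    // final binary.
    #include <cstdio>

    int neverCalled(int X) { return X * 42; }   // unreferenced: removable

    int main() {
      std::printf("hello\n");
      return 0;
    }
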
- Technique: compressors like gzip
  - UPX, LZEXE save disk space
  - To save RAM, you need a compression algorithm that supports random access, like something based on Huffman codes
    - Has been implemented in hardware (IBM CodePack)
- Technique: using different optimizations for hot & cold code (attribute example below)
  - For best results, people currently have to move their cold code to a separate file so they can use different flags, which is painful
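
As a partial alternative to splitting files, per-function attributes can already request size-oriented codegen for cold paths. cold, minsize, and noinline are real Clang function attributes; the functions themselves are invented for illustration.

    // Keep the hot path at the normal optimization level, but ask the
    // compiler to minimize the size of a cold path, without moving it
    // to a separate file. (Clang attributes; illustration only.)
    #include <cstdio>
    #include <cstdlib>

    __attribute__((cold, minsize, noinline))
    static void reportFatalConfigError(const char *What) {
      std::fprintf(stderr, "bad config: %s\n", What);
      std::abort();
    }

    int parsePort(const char *S) {
      int Port = std::atoi(S);        // hot path: optimized normally
      if (Port <= 0 || Port > 65535)
        reportFatalConfigError(S);    // cold path: optimized for size
      return Port;
    }
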
- Could we do code size optimizations in the linker?
  - Needed to take advantage of RISC-V’s LUI instruction
  - Requires copy propagation and dead code elimination
- Instead of inlining functions into each caller, can we keep one copy of the function and specialize it for all its call sites in the same translation unit? (sketch below)
  - Attributor already shows improvements when you have recursion and non-trivial but dead code (the level stuff in Olden/bisort/bitonic)
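
A hand-written sketch of what that specialization could look like (not output from any existing pass; names invented): instead of inlining the body everywhere, one clone is specialized for the constant argument the callers actually pass, so the code still exists only once.

    // Generic function, kept as a single out-of-line copy.
    static long scale(const long *A, int N, long Factor) {
      long Sum = 0;
      for (int I = 0; I < N; ++I)
        Sum += A[I] * Factor;
      return Sum;
    }

    // Specialized clone for Factor == 1: the multiply disappears, and the
    // code is shared by every caller that passes 1 instead of being
    // inlined into each of them.
    static long scaleBy1(const long *A, int N) {
      long Sum = 0;
      for (int I = 0; I < N; ++I)
        Sum += A[I];
      return Sum;
    }

    long total(const long *A, int N)   { return scaleBy1(A, N); }
    long doubled(const long *A, int N) { return scale(A, N, 2); }
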
Topics

1. Situations where code size matters
2. What do other compilers do better?
3. Techniques for code size optimization
4. How can we improve LLVM?
Techniques

- Program design, library design, etc.
- General optimizations
- Loop idiom recognition (memset, memcpy) (example after this list)
- Partial inlining
- Deduplication
- Outlining
- Function merging
- Functionality changes
- Partial evaluation
- Delete unused code
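
To illustrate the loop idiom item above: a byte-fill loop of the shape below is the kind of pattern loop idiom recognition can turn into a single memset call, which is usually smaller than the expanded loop. Whether the transformation fires depends on the target and flags; the function name is invented.

    // A fill loop of this shape can be recognized and replaced with
    // memset(Buf, 0, N); illustration only.
    void clearBuffer(unsigned char *Buf, unsigned long N) {
      for (unsigned long I = 0; I < N; ++I)
        Buf[I] = 0;
    }
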
Thanks,
Sean Bartell