Below is the list of BOLT project ideas with a brief description of each. Once a project is picked for active development, expect to start a new topic and file an RFC when suitable. Comment on this thread to add new ideas to the list.
CFG Disassembler.
BOLT symbolizes disassembly output and reconstructs control flow for detected functions, including control flow for indirect branches that correspond to jump tables. This functionality on its own is useful for analyzing binary code. BOLT already prints the control flow graph with the “-print-cfg” option, but a dedicated command-line tool would provide a better user experience. Alternatively, the CFG output could be integrated into llvm-objdump.
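To make the idea more concrete, here is a minimal C sketch of the first step of CFG reconstruction: finding basic-block leaders in a linear stream of decoded instructions. The `Insn` record and the `find_leaders` and `index_of` functions are hypothetical and much simpler than BOLT's MCPlus representation; the sketch assumes instructions are already decoded and that jump-table targets for indirect branches have been recovered separately.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified decoded instruction (not BOLT's MCPlus types). */
typedef struct {
    unsigned long addr;   /* address of the instruction */
    bool is_branch;       /* jump, conditional jump, or return: ends a block */
    unsigned long target; /* direct branch target; 0 if indirect or none */
} Insn;

/* Index of the instruction at addr, or -1 if addr is not an instruction start. */
static long index_of(const Insn *insns, size_t n, unsigned long addr) {
    for (size_t i = 0; i < n; ++i)
        if (insns[i].addr == addr)
            return (long)i;
    return -1;
}

/* Mark basic-block leaders: the function entry, every instruction that
   follows a branch, every direct branch target, and every recovered
   jump-table target of an indirect branch. */
void find_leaders(const Insn *insns, size_t n,
                  const unsigned long *jt_targets, size_t n_jt,
                  bool *is_leader) {
    for (size_t i = 0; i < n; ++i)
        is_leader[i] = (i == 0);
    for (size_t i = 0; i < n; ++i) {
        if (!insns[i].is_branch)
            continue;
        if (i + 1 < n)
            is_leader[i + 1] = true;        /* fall-through / next block */
        if (insns[i].target != 0) {
            long t = index_of(insns, n, insns[i].target);
            if (t >= 0)
                is_leader[t] = true;        /* direct branch target */
        }
    }
    for (size_t j = 0; j < n_jt; ++j) {
        long t = index_of(insns, n, jt_targets[j]);
        if (t >= 0)
            is_leader[t] = true;            /* jump-table target */
    }
}
```

Each leader starts a basic block that runs up to the next leader; adding successor edges from the branch targets, fall-throughs, and jump-table entries then yields the CFG.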
CFG Visualization.
Build on the CFG disassembler and use a GUI to display the graph.
Memory Instrumentation for Sanitizers.
Sanitizers (ASan, MSan, etc.) rely primarily on the compiler for instrumentation, which limits their visibility into assembly and pre-compiled third-party code. Loads and stores missed during instrumentation can lead to both false positives and false negatives in the tool output. BOLT could add the missing instrumentation and thereby improve the experience of running sanitizers.
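As a rough illustration of what the added instrumentation would have to check, the C sketch below mirrors AddressSanitizer's shadow-memory test using the default x86-64 Linux mapping (8-byte granules, shadow offset 0x7fff8000). BOLT would emit equivalent machine instructions before each uninstrumented load or store; the helper names `asan_shadow_addr` and `asan_access_is_bad` are hypothetical, and the sketch takes the shadow byte as a parameter instead of reading real shadow memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default ASan shadow parameters on x86-64 Linux (assumed here):
   one shadow byte per 8-byte granule, fixed shadow offset. */
#define ASAN_SHADOW_SCALE 3
#define ASAN_SHADOW_OFFSET 0x7fff8000ULL

/* Address of the shadow byte that describes the granule containing addr. */
static inline uint64_t asan_shadow_addr(uint64_t addr) {
    return (addr >> ASAN_SHADOW_SCALE) + ASAN_SHADOW_OFFSET;
}

/* Decide whether an access of `size` bytes (1, 2, 4, or 8, not crossing
   an 8-byte granule) at addr is invalid, given the granule's shadow byte k:
     k == 0     all 8 bytes are addressable
     1 <= k <= 7  only the first k bytes are addressable
     k < 0      the whole granule is poisoned (redzone, freed memory, ...) */
static inline bool asan_access_is_bad(uint64_t addr, unsigned size, int8_t k) {
    if (k == 0)
        return false;
    int last = (int)(addr & 7) + (int)size - 1; /* offset of last byte accessed */
    return last >= k;
}
```

When the check fails, the inserted code would call into the sanitizer runtime to report the error, just as compiler-inserted instrumentation does.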
MCPlus Serialization.
MCPlus, the internal representation used by BOLT, is built on top of the MC/MCInst layer. Adding a textual serialization format for MCPlus could provide several benefits. First, the compiler could emit MCPlus directly, eliminating the need for BOLT to disassemble the code and reconstruct the CFG. Second, BOLT could save and reload its IR, making it possible to edit pre-compiled binaries in an assembly-like language.
Static Data Layout Optimization.
Just as BOLT modifies code layout based on profile data, it could optimize the layout of static data. Read-only data is easier to reorder without extra information from the linker, provided the original data is also preserved. With enough information from the compiler and linker, reordering can be extended to all static data.
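One simple form of profile-guided data layout is to sort objects by access count so that hot data shares cache lines and pages. The C sketch below assumes a hypothetical `DataObject` record carrying a per-object profile count; `layout_by_hotness` only computes new offsets and does not update references to the moved objects, which is where the extra compiler and linker information comes in.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical description of one static data object. */
typedef struct {
    const char *name;
    size_t size;
    size_t align;        /* power of two */
    uint64_t accesses;   /* profile count, e.g. sampled loads/stores */
    size_t new_offset;   /* output: offset within the reordered section */
} DataObject;

/* qsort comparator: higher access count first. */
static int hotter_first(const void *a, const void *b) {
    const DataObject *x = a, *y = b;
    if (x->accesses != y->accesses)
        return x->accesses > y->accesses ? -1 : 1;
    return 0;
}

/* Place hot objects first, honoring each object's alignment.
   Returns the total size of the reordered section. */
size_t layout_by_hotness(DataObject *objs, size_t n) {
    qsort(objs, n, sizeof *objs, hotter_first);
    size_t off = 0;
    for (size_t i = 0; i < n; ++i) {
        off = (off + objs[i].align - 1) & ~(objs[i].align - 1);
        objs[i].new_offset = off;
        off += objs[i].size;
    }
    return off;
}
```

Since `qsort` is not stable, objects with equal counts end up in an unspecified order; a real implementation might break ties by original address to keep the output deterministic.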
Raising IR.
Raising BOLT's IR to the MachineInstr level, or even to LLVM IR, could provide further opportunities for application optimization. This is a nontrivial task, and it cannot always be done in a performance-efficient way. However, raising even a subset of functions to a higher-level IR can still benefit performance. Related projects worth looking into include McSema and llvm-mctoll.
Profile-driven Register Reallocation.
This idea is related to the Raising IR project but can also be approached independently. The higher-quality profile available to BOLT may open opportunities for better register allocation.
Optimizing Linux Kernel.
This is a work in progress; see the talk “LPC 2021 - Toolchains and Kernel MC” on YouTube.
Code Prefetching.
Software code prefetching is described in Section 5 of “AsmDB: Understanding and Mitigating Front-End Stalls in Warehouse-Scale Computers” (Google Research). It relies on the presence of a “code prefetch” instruction. x86 lacks such an instruction, but it is possible to experiment with L2 data prefetch instead.
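The C sketch below approximates the L2 data prefetch experiment at the source level: it issues a data prefetch for a callee's entry some time before calling it, relying on L2 being a unified cache so that the later instruction fetch can hit there. It uses the GCC/Clang builtin `__builtin_prefetch` with read access and locality 2, which on x86 is typically lowered to `PREFETCHT1`. The functions `cold_callee`, `prefetch_code`, and `caller` are hypothetical; a BOLT pass would insert the prefetch instruction directly into the binary instead.

```c
#include <stdint.h>

/* Hypothetical callee, assumed to run rarely enough to be cold in the i-cache. */
__attribute__((noinline)) int cold_callee(int x) {
    return x * 3 + 1;
}

/* Prefetch the cache line holding fn's first instructions as *data*.
   rw = 0 (read), locality = 2: on x86 this typically becomes PREFETCHT1,
   which brings the line into L2 and higher levels. Prefetches never fault. */
static inline void prefetch_code(int (*fn)(int)) {
    __builtin_prefetch((const void *)(uintptr_t)fn, 0, 2);
}

int caller(int x) {
    prefetch_code(cold_callee);    /* issued ahead of the call */
    int acc = 0;
    for (int i = 0; i < x; ++i)    /* unrelated work that hides prefetch latency */
        acc += i;
    return acc + cold_callee(x);   /* instruction fetch may now hit in L2 */
}
```

The prefetch only helps if enough work separates it from the call, which is why choosing insertion points from profile data is the interesting part for BOLT.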
Reduce Binary Overhead.
BOLT creates a new segment for optimized code (unless run with the “-use-old-text” option), which increases the binary size. While this size bloat does not cause performance regressions, it can be an undesirable side effect of binary rewriting. BOLT could “compress” the unoptimized code by closing the gaps left by functions moved out during optimization, and then expand the existing code segment instead.
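The compaction step could look roughly like the C sketch below: walk the original functions in address order, skip the ones whose optimized copies were moved out, and slide the rest down while honoring alignment. The `Func` record and `compact_text` are hypothetical; the sketch computes new addresses only and assumes every reference to a relocated function can be found and patched, which is the hard part in practice.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for a function in the original text section. */
typedef struct {
    uint64_t addr;      /* original address; array is sorted by addr */
    uint64_t size;
    uint64_t align;     /* power of two */
    bool moved;         /* optimized copy is emitted elsewhere */
    uint64_t new_addr;  /* output: address after compaction, 0 if moved */
} Func;

/* Slide the remaining functions down to close the gaps left by moved
   ones. Returns the first free address; the space from there to the
   end of the original text can then host optimized code. */
uint64_t compact_text(Func *funcs, size_t n, uint64_t text_start) {
    uint64_t cur = text_start;
    for (size_t i = 0; i < n; ++i) {
        if (funcs[i].moved) {
            funcs[i].new_addr = 0;
            continue;
        }
        cur = (cur + funcs[i].align - 1) & ~(funcs[i].align - 1);
        funcs[i].new_addr = cur;
        cur += funcs[i].size;
    }
    return cur;
}
```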
Support Stripped Binaries with Split Functions.
To optimize pre-built binaries from a typical Linux distribution, BOLT needs to correctly process stripped binaries that contain split functions.