BOLT: Can BOLT process PE files?

LLVM BOLT seems to support only 64-bit ELF files. Will BOLT support processing and optimizing PE files on Windows?
Thanks for your reply!

Not at the moment. There has been some talk about adding it, but AFAIK no one is currently working on it.

I am a beginner with BOLT, so some of my statements may not be quite correct; please correct me if there are mistakes.
As an optimizer, BOLT receives the x86 code segment as input from LLVM components that parse executables or dynamic-link libraries. LLVM already seems to provide parsers for both the ELF and COFF (PE) formats. If BOLT is decoupled from the file-parsing stage, it should be able to handle any x86 code segment, no matter which kind of file it comes from.

As @tobiashieta mentioned, the BOLT team has no plans to add it.
@TaoTao-real, BOLT reads and rewrites ELF files directly using LLVM libraries. BOLT also has initial support for the Mach-O format, so if you’re interested in adding PE support, that would be a good place to look.

Perhaps a bigger problem is collecting the profile that drives the optimizations: either BOLT instrumentation would need to be implemented for PE, or some form of LBR sampling (like Linux perf) would be needed. However, I’m not aware of perf-equivalent facilities on Windows (except VTune, and I’m not sure it can export the traces).

The Windows Platform SDK has several command-line tools that aren’t well known. One of them is xperf, popularized by Bruce Dawson’s UIforETW. xperf can sample LBR data; see ETWAnalyzer/Documentation/DumpLBRCommand.md in the Siemens-Healthineers/ETWAnalyzer repository on GitHub.

So in essence, are you saying that for Windows we could plug COFF support into BOLT, teach it to read the .etl files that xperf emits, and that would be enough? Is there an integration guide?

Please forgive my ignorance about the PE/COFF format, but to add BOLT support, the following pieces would be needed before any changes to BOLT itself:

  1. Ideally, the executable file format should be an open standard, to avoid playing catch-up on compatibility as with other proprietary Microsoft formats.
  2. (I assume Windows has a linker and something equivalent to relocations in its object file format.) The linker needs to support preserving static relocations in the output executable (the --emit-relocs option in ld/gold/lld). This is a requirement for function reordering in BOLT, since it needs to i) identify all function references (from both data and code) and ii) move functions within the text section.
  3. The executable file needs to have something equivalent to sections/segments and program/section headers. There should be a way to either modify the program headers in place or create a new table in a new location, so that new segments/sections containing hot/cold code can be allocated.
  4. The Linux AMD64 ABI requires functions to have FDEs in .eh_frame for unwinding, and BOLT uses FDEs to determine function boundaries (cross-checked against the symbol table). There must be an equivalent, reliable mechanism for identifying function boundaries.
  5. The xperf profile would need to be either converted to one of the existing profile formats (the pre-aggregated pure-offset format or the symbol+offset-based fdata format), or read by a custom profile reader added for its output.
  6. Updating debug information: the format must be i) specified and ii) flexible enough to permit BOLT’s optimizations (function splitting and reordering).
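
As a rough illustration of point 5, the converter side could be fairly small once LBR records have been extracted from the .etl trace. This is a minimal Python sketch, not an existing tool. It assumes a hypothetical LbrRecord type holding one taken branch, with each sample's LBR stack already ordered oldest to newest and its addresses already mapped into the binary's address space (undoing ASLR, for instance). It aggregates identical branches and emits lines in what I understand to be BOLT's pre-aggregated text format: `B <from> <to> <count> <mispredicted>` for taken branches and `F <start> <end> <count>` for fall-through ranges, with bare hex addresses. Check the format description in BOLT's DataAggregator before relying on it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LbrRecord:
    """One taken branch from a hypothetical, already-decoded LBR dump."""
    src: int            # address of the branch instruction
    dst: int            # branch target address
    mispredicted: bool


def to_preaggregated(samples: Iterable[List[LbrRecord]]) -> List[str]:
    """Aggregate LBR stacks (each ordered oldest to newest) into B/F lines."""
    branches: Counter = Counter()
    mispreds: Counter = Counter()
    fallthroughs: Counter = Counter()
    for stack in samples:
        for i, rec in enumerate(stack):
            branches[(rec.src, rec.dst)] += 1
            if rec.mispredicted:
                mispreds[(rec.src, rec.dst)] += 1
            if i > 0:
                # Code between the previous branch's target and this branch's
                # source executed sequentially: a fall-through range.
                fallthroughs[(stack[i - 1].dst, rec.src)] += 1
    # Addresses are written as bare hex (no 0x prefix); a missing key in a
    # Counter reads as 0, so never-mispredicted branches get a 0 count.
    lines = [f"B {s:x} {d:x} {n} {mispreds[(s, d)]}"
             for (s, d), n in sorted(branches.items())]
    lines += [f"F {s:x} {e:x} {n}" for (s, e), n in sorted(fallthroughs.items())]
    return lines


if __name__ == "__main__":
    stack = [LbrRecord(0x401000, 0x401020, False),
             LbrRecord(0x401030, 0x401000, True)]
    print("\n".join(to_preaggregated([stack, stack])))
```

The resulting text file would presumably be handed to perf2bolt in pre-aggregated mode instead of a perf.data file; the hard part that remains is decoding the undocumented .etl format into LbrRecord values in the first place.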

There’s no integration guide, but as I mentioned above, MachORewriteInstance would be the best starting point for looking into supporting a new object file format.

Hello Amir,

Thanks for the detailed list. I think that for most of it, llvm-project/llvm and lld/ already have all the required utility code. As for your points:

  1. The Windows PE format is well documented by Microsoft and well understood. We sometimes stumble on less common or newer section names, but that shouldn’t be a problem for BOLT. We emit PEs in LLD, and we read them (to a limited extent) in llvm-readobj.
  2. Most Windows PE images are relocatable, and both MSVC and LLD always generate a .reloc section if there are symbols to relocate. AFAIK the only option that implicitly disables the .reloc section is /dynamicbase:no, which produces a fixed-address PE, but that is not very common nowadays.
  3. OBJ files can have multiple .textXXX sections, but in the PE they are all merged into a single .text section. I’m not sure whether the Windows loader supports multiple code sections. Ideally, BOLT would fully rewrite the resulting PE and re-merge the .text sections that have changed.
  4. On Windows/COFF there is a .pdata section, the exception table, which has the equivalent of an FDE for each function.
  5. Unfortunately, the ETL format that xperf generates isn’t documented. However, there are tools to convert it into simpler formats, and there seems to be an effort to document it here and here. That’s probably the one thing that is really missing from LLVM right now.
  6. The CodeView debug format is documented and used across LLVM. We do sometimes discover new things that Microsoft adds, but that shouldn’t prevent porting BOLT to COFF.

A further question: it feels like BOLT should be part of the linker, like LTO and PGO are. Would it make sense to integrate it into LLD instead? It would have access to more information, including all the sections and debug info, before the PE and the PDB are laid out. I feel that would be easier than reconstructing things after the fact. I understand it would have been more difficult to build it into ld or gold, but here we are: BOLT is part of LLVM now.

Integration with lld, preferably via a plugin mechanism, would give BOLT more flexibility when generating the binary. For example, we wouldn’t have to create a new segment, and we could avoid duplicating parts of .text and the accompanying metadata, resulting in a smaller binary. However, integration with any linker is far from a panacea for all the problems that arise while rewriting code, especially when you strive for the best layout, where functions (and thus FDEs) become fragmented. For example, to rewrite exception handling tables (which are separate from unwind tables), it’s not enough to process relocations: we need intrinsic knowledge of the throw/catch mechanism to update the tables correctly. Linkers don’t have to deal with such issues.