I was interested in trying out LLVM BOLT, so I generated profile data with Linux perf using the following command:
perf record -e cycles:u -o perf.data
This is without the use of LBR, so I understand the performance improvements may be modest, but the goal was mostly to become familiar with BOLT’s commands.
I then run:
perf2bolt -nl -p perf.data -o perf.fdata
and I get the following:
PERF2BOLT: Starting data aggregation job for perf.data
PERF2BOLT: spawning perf job to read events without LBR
PERF2BOLT: spawning perf job to read mem events
PERF2BOLT: spawning perf job to read process events
PERF2BOLT: spawning perf job to read task events
BOLT-INFO: Target architecture: x86_64
BOLT-ERROR: input file was processed by BOLT. Cannot re-optimize.
Not sure why I get the above error. Can someone who has used BOLT help me?
perf2bolt has strict demands on its inputs when generating the profile data file. The input binary to perf2bolt must be the same one that was running when you launched perf record, and perf2bolt will try to verify that by checking the build ID, if the binary has one. It also assumes this is the binary you will later optimize. Ordinarily, BOLT refuses to optimize a binary that was already optimized by BOLT itself, and that is what the error message is telling you. You should try collecting data on a binary that you have not already optimized with BOLT.
That said, if you really want to, it is possible to collect data on binaries that were already processed by BOLT, but you need some non-standard flags for that. You need to pass -enable-bat to llvm-bolt when generating that binary. This flag embeds a translation table in your binary, which perf2bolt uses to build profile data that applies to the original binary. This is non-standard because it is only really necessary in some large-scale deployments, where collecting the data in a special “no bolt” configuration can be inconvenient.
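To make the idea of a translation table a bit more concrete, here is a toy sketch in C++. It is not the real table that -enable-bat embeds, and the name ToyTranslationTable and its methods are made up for illustration. It only shows the general principle: an address sampled in the optimized binary is mapped back to the matching address in the original binary, under the simplifying assumption that offsets inside a range are preserved.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model only: NOT the table format BOLT actually emits.
// Each entry says "the range starting at this address in the optimized
// binary came from this address in the original binary".
class ToyTranslationTable {
public:
  void addRange(uint64_t OptimizedStart, uint64_t OriginalStart) {
    Ranges[OptimizedStart] = OriginalStart;
  }

  // Translate an address sampled in the optimized binary back to the
  // original binary. Returns false if the address lies before every
  // known range. Assumes offsets within a range are unchanged.
  bool translate(uint64_t OptimizedAddr, uint64_t &OriginalAddr) const {
    // First range that starts strictly after the sampled address.
    auto It = Ranges.upper_bound(OptimizedAddr);
    if (It == Ranges.begin())
      return false;
    --It; // Step back to the range that covers the sampled address.
    OriginalAddr = It->second + (OptimizedAddr - It->first);
    return true;
  }

private:
  std::map<uint64_t, uint64_t> Ranges; // optimized start -> original start
};
```

With ranges (0x1000 -> 0x5000) and (0x2000 -> 0x4000) registered, a sample at 0x1010 translates to 0x5010, and a sample at 0x2004 translates to 0x4004.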
Running a project with source code under a debugger is the fastest way I know to build a mental model of its most important classes and how it is organized. I definitely recommend doing that. BOLT’s documentation consists of a high-level view, described in the 2019 CGO paper and some slides; the technical details are only available in the source code.
However, I can give a quick overview to get you better prepared. Here are some key points:
- The binary format is mostly abstracted away by LLVM’s libObject (see Binary.h and ObjectFile.h). We are currently working on a new BOLT-only BinaryFormat abstraction to better encapsulate the gory details of manipulating object files.
- llvm-bolt.cpp is the main tool entry point. The class hierarchy where the real work happens is designed as a library, LLVM-style, and llvm-bolt.cpp is the main user of BOLT as a library.
- The main control class in BOLT is RewriteInstance. It represents the concept of a single binary rewrite requested by the user.
- RewriteInstance coordinates the entire rewrite process: it reads the binary, builds BOLT’s IR to represent its contents, performs a pipeline of modifications (BinaryPasses), and then writes the result to a separate output file.
If your interest in BOLT is to write a pass, I would suggest setting breakpoints and paying special attention to BinaryPassManager. You can find the simplest, easiest-to-understand passes in Passes/BinaryPasses.cpp. Just copy one of them and register your copy in BinaryPassManager. Your pass will be exposed to BOLT’s view of the world, and you can write code to dump a snapshot of this view for quick analysis (look for dump() methods in the objects exposed to your pass).
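As a rough illustration of the "copy a simple pass and register it" workflow, here is a self-contained toy in C++. None of these types are BOLT's real classes: ToyContext, ToyPass, DumpFunctionsPass and ToyPassManager are hypothetical stand-ins, and the real interfaces in BinaryPassManager and BinaryPasses.cpp have different signatures. The sketch only shows the shape of the pattern: a manager owns a list of passes, runs them in registration order over a shared view of the binary, and a read-only pass records a snapshot of that view.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; BOLT's real classes are much richer.
struct ToyFunction {
  std::string Name;
  unsigned NumBlocks;
};

// The "view of the world" that every pass gets to see.
struct ToyContext {
  std::vector<ToyFunction> Functions;
};

class ToyPass {
public:
  virtual ~ToyPass() = default;
  virtual void run(ToyContext &Ctx) = 0;
};

// A read-only pass that appends a snapshot of the context to a string,
// in the spirit of calling dump() on the objects a pass can see.
class DumpFunctionsPass : public ToyPass {
public:
  explicit DumpFunctionsPass(std::string &Out) : Out(Out) {}
  void run(ToyContext &Ctx) override {
    for (const ToyFunction &F : Ctx.Functions)
      Out += F.Name + ": " + std::to_string(F.NumBlocks) + " blocks\n";
  }

private:
  std::string &Out; // Caller-owned buffer that receives the snapshot.
};

class ToyPassManager {
public:
  void registerPass(std::unique_ptr<ToyPass> P) {
    assert(P && "cannot register a null pass");
    Passes.push_back(std::move(P));
  }

  // Runs every registered pass once, in registration order.
  void runPasses(ToyContext &Ctx) {
    for (std::unique_ptr<ToyPass> &P : Passes)
      P->run(Ctx);
  }

private:
  std::vector<std::unique_ptr<ToyPass>> Passes;
};
```

To try it, fill a ToyContext with a couple of functions, register a DumpFunctionsPass that writes into a std::string, and call runPasses(); the string then holds one line per function.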
The top-level class of BOLT’s IR is BinaryFunction. A BinaryFunction holds BinaryBasicBlock instances, which in turn hold MCInst instances. Since BOLT operates on an augmented MCInst compared with the regular MCInst from LLVM, we have a special class to deal with operations on instructions and to abstract the target machine. You can play around with MCPlusBuilder and its subclass X86MCPlusBuilder to do all sorts of work, such as creating or checking calls, branches, etc. If you want to instrument your binary, for example, take a look at Passes/Instrumentation.cpp to see how it accomplishes branch instrumentation.
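The containment described above (function, then basic blocks, then instructions, with a separate builder class that knows how to classify and create instructions) can be sketched with a small toy model. Everything here is hypothetical: ToyInst, ToyInstBuilder, ToyBasicBlock, ToyFunction and countCalls are made-up names that mirror the roles of MCInst, MCPlusBuilder, BinaryBasicBlock and BinaryFunction, not their actual interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy opcodes; a real target has far more.
enum class ToyOpcode { Add, Call, CondBranch, Ret };

// Plays the role of an instruction.
struct ToyInst {
  ToyOpcode Opcode;
};

// Plays the role of the target abstraction: code that walks the IR asks
// the builder about instructions instead of inspecting opcodes directly,
// so a target-specific subclass could override these queries.
class ToyInstBuilder {
public:
  virtual ~ToyInstBuilder() = default;
  virtual bool isCall(const ToyInst &I) const {
    return I.Opcode == ToyOpcode::Call;
  }
  virtual bool isBranch(const ToyInst &I) const {
    return I.Opcode == ToyOpcode::CondBranch;
  }
  virtual ToyInst createCall() const { return ToyInst{ToyOpcode::Call}; }
};

// A basic block holds instructions.
struct ToyBasicBlock {
  std::string Label;
  std::vector<ToyInst> Insts;
};

// A function holds basic blocks.
struct ToyFunction {
  std::string Name;
  std::vector<ToyBasicBlock> Blocks;
};

// Walks function -> blocks -> instructions and counts call instructions.
std::size_t countCalls(const ToyFunction &F, const ToyInstBuilder &B) {
  std::size_t N = 0;
  for (const ToyBasicBlock &BB : F.Blocks)
    for (const ToyInst &I : BB.Insts)
      if (B.isCall(I))
        ++N;
  return N;
}
```

The point of routing the check through the builder is the same as in the description above: analysis code stays target-independent, and only the builder subclass needs to know what a call or branch looks like on a given machine.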
I hope this helps,
Thanks a lot for the detailed description. I think this would help me a lot.