This is a proposal for improving LLVM's support for profile information. Many optimizations can benefit from using that information, especially things like inlining and code placement, but LLVM's current profiling support is not very usable. We should fix that. I've been collecting ideas from other LLVM developers over the last few weeks, and I've put them together here to solicit feedback. Most of the good ideas here are thanks to Andy Trick.
There are at least 3 different sources of profile information:
* __builtin_expect: The programmer provides branch prediction hints using this builtin function.
* Static estimates: The compiler analyzes the code and uses simple heuristics or more sophisticated analyses to estimate the profile information.
* Profile feedback: Profile information is collected during one or more executions of the program and then fed back to the compiler.
Regardless of the source, these are all providing essentially the same information, perhaps with different levels of detail or reliability.
LLVM currently has minimal support for collecting several different kinds of profile feedback (basic block counts, edge counts, etc.) but none of the standard passes in LLVM make any use of it. Some passes also use heuristics to guess branch probabilities (e.g., if-conversion). There is no support at all for __builtin_expect.
* LLVM should provide a single API to access and update profile information, regardless of its source. Different passes may want to ask different questions (e.g., "what is the probability this branch will be taken?", "how many times is this block expected to execute?", etc.), and we can support whatever makes sense. I'm not proposing anything specific yet -- the important point is that passes should be able to use one API for this.
* An LLVM analysis pass can be created to use static analysis and various heuristics to estimate the profile information. The API for profiling info would be provided by this pass.
* In the usual LLVM way, passes that modify the IR could either update the profile analysis info directly or indicate that they do not preserve it so that subsequent uses would rerun the analysis.
* __builtin_expect info can be attached to the LLVM IR as metadata. Specifically, conditional branches can have metadata that records the branch probabilities (e.g., taken vs. not-taken) as fixed-point fractions. When the profile analysis pass sees the metadata, it will use it in place of any other heuristics or estimates for that branch.
* Profile feedback can use the same metadata as __builtin_expect to record the branch probabilities for conditional branches. In the same way, multi-way branches can have metadata with the probability for each outgoing edge. If all of the branch probabilities are known, they are sufficient to calculate the block execution counts and edge counts per invocation of a function. If each function also has metadata recording an invocation count, the profile analysis pass will be able to calculate the absolute number of block and edge execution counts.
Recording branch probabilities (as opposed to edge counts) in the metadata makes it easier to keep the information accurate as various transformations are applied. In many cases, e.g., when duplicating a block, the branch probability can simply be copied along with the rest of the code, and there are no problems of adjusting all the counts to keep them consistent with each other.
Using metadata is a natural fit for this. Among other things, it allows the profile feedback support to be independent from the rest of the pieces. The raw profile information can be read in, correlated with the IR, and used to produce the metadata. From that point on, the rest of the system doesn't need to know about it.
The branch probability metadata can be specified for only a subset of the branches, with the rest of the control flow being estimated by the profile analysis pass. This is typically the case when __builtin_expect is used, but it will also enable profile feedback to fail gracefully when the program has been modified since the profile was collected. It might be a while before we get to the point of caring about that, but it's a good thing to have a system that supports it.
Many of the uses for profile info are in the code generator, which has its own IR. It is unfortunate that we can't use the same interfaces throughout the compiler, and I'm not entirely certain of the best thing to do for codegen. My current thought is to record the branch probabilities for each machine basic block and require codegen passes to keep those values updated. If necessary, we could define a version of the profile analysis pass that works on the codegen IR, but I'm not sure yet if that will be necessary.
I don't have a lot of time to devote to this right now, but I'm hoping that if we get a high-level plan (like this one) in place, we can start working toward it incrementally. As an initial step over the next few months, I would like to set up __builtin_expect to provide metadata for conditional branches, and then create a very basic profile info analysis pass to combine that metadata with simple static estimates. Contributions from others would be welcome.
Comments or suggestions?