I've been looking at the existing profile support in LLVM with
the intent of incorporating other sources of profile information.
In particular, I'm writing a tool that converts perf
(http://perf.wiki.kernel.org) data into a loadable LLVM profile.
First, I would like to describe what I understand about the
current setup. Please correct me, as I'm likely to be missing a
few things. I believe that we can extend the current framework to
support external profiles and traditional instrumentation.
The current profile support uses the instrumentation passes in
lib/Transforms/Instrumentation/*Profiling.cpp. The instrumented
code writes its output as a fixed-size record file with various
kinds of information: command line arguments,
function/block/edge counts, and block tracing info. This is read into
integer vectors, which are then served via the class ProfileInfo.
This information is loaded with -profile-loader, and passes
need to explicitly convert it to edge/block weights by
instantiating a ProfileInfo object.
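To make the "convert to edge/block weights" step concrete, here is a minimal sketch of deriving block counts from edge counts. It is not the ProfileInfo code: BlockId, Edge, blockCountsFromEdges and diamondExample are hypothetical names I made up for illustration, and blocks are plain integers rather than BasicBlock pointers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical stand-ins, not LLVM types: a block is a small integer and an
// edge is a (source, destination) pair of blocks.
using BlockId = unsigned;
using Edge = std::pair<BlockId, BlockId>;
using EdgeCountMap = std::map<Edge, std::uint64_t>;
using BlockCountMap = std::map<BlockId, std::uint64_t>;

// Derive per-block execution counts from per-edge counts by summing the
// counts of all edges entering each block. The entry block is not reached
// through an edge on function entry, so that count is passed in separately.
BlockCountMap blockCountsFromEdges(const EdgeCountMap &EdgeCounts,
                                   BlockId Entry, std::uint64_t EntryCount) {
  BlockCountMap BlockCounts;
  BlockCounts[Entry] = EntryCount;
  for (const auto &KV : EdgeCounts) {
    const Edge &E = KV.first;
    BlockCounts[E.first] += 0;          // ensure the source block is present
    BlockCounts[E.second] += KV.second; // add flow entering the destination
  }
  return BlockCounts;
}

// Usage: a diamond-shaped CFG 0 -> {1, 2} -> 3, entered 100 times.
void diamondExample() {
  EdgeCountMap Edges;
  Edges[Edge(0, 1)] = 30;
  Edges[Edge(0, 2)] = 70;
  Edges[Edge(1, 3)] = 30;
  Edges[Edge(2, 3)] = 70;
  BlockCountMap Blocks = blockCountsFromEdges(Edges, 0, 100);
  assert(Blocks[1] == 30 && Blocks[2] == 70 && Blocks[3] == 100);
}
```

The point of the sketch is only that block weights are derivable from edge weights, so a profile source that provides edge counts is enough for both.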
Some questions I have about the existing setup:
What is the ProfileMetaDataLoaderPass? It seems as if it started
as a replacement for ProfileInfoLoader (based on a couple of TODO
notes I found), but it's not clear to me why, and it seems to be
less used than ProfileInfo. It even has its own separate flag
(-profile-metadata-loader instead of -profile-loader).
This pass only fills in branch weights. It does not have all the
data used by ProfileInfo.
Is my understanding more or less correct? If so, I would like to
make three main changes:
1- Unify profile loading so that it only talks to the basic
analysis API. Here, I would like to change the way that
profile information is reflected in the program. Currently,
the compiler loads data from the profile file into 4 arrays
inside ProfileInfo. This is then reflected into the CFG for
the corresponding function.
Instead, I would like to reflect profile info as metadata in
the IR. For instance, functions seldom called in the profile
could be annotated with the 'cold' attribute in the IR. Block
frequency is more straightforward: we add the frequency data
to the first instruction of the block.
Things like that. The problem here is that different profiling
mechanisms will have different kinds of data. We will need
ways of translating those into metadata that is then served
from the analysis API. For example, how do we deal with value
For now, I'm focusing on the most common types of profilers.
These tend to use block/edge frequencies, which are more
straightforward to convert.
2- Passes should not need to be aware that profile
information exists. They would make the usual calls to the
analysis API, which will use profile data, if available.
With this, I want to make optimizers automatically make
better decisions when profile information exists.
3- Profile data should be flexible with respect to code drift.
This is something that the profile loader should deal with. It
currently fails when the profile is out of sync with the shape
of the IR. In the presence of a stale profile, the loader should:
(a) Do not fail hard.
(b) Make conservative decisions with the annotations it
(c) Provide feedback to the user about the staleness of the
profile (e.g. % of samples/records dropped from the profile).
External profile sources can be added in two ways:
1- Separate tools that convert data from the external profile
into the format understood by lib/Analysis/ProfileInfoLoaderPass.cpp.
2- On-the-fly converters that are called by
There are pros and cons to both approaches. From a modularity
point of view, I am in favour of #1. This makes the compiler
completely independent from the profile provider.
For instance, in the case of perf, incorporating it into LLVM
would mean bringing in the library that reads perf.data and an
executable reader (perf data needs to be paired with the debug
information from the application binary).
Other sources of profile may even use proprietary data formats or
have other dependencies which would further bloat the compiler.
The profile format accepted by LLVM will need to evolve as new
types of profile are added. But as long as we keep it properly
documented and supported, it should not be a big deal.
One thing I'd like to add is a simplistic text-based profile,
with its converter in the LLVM tree. This could be used as an
example and as a unit test driver.
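To give an idea of how small such a text format could be, here is a minimal sketch of a reader. The format itself is purely an assumption for illustration (one "<function> <block> <count>" triple per line, with '#' starting a comment line), and TextProfile, parseTextProfile and textProfileExample are hypothetical names, not a proposal for the final format.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Result of reading a hypothetical text profile.
struct TextProfile {
  std::map<std::string, std::uint64_t> Counts; // key: "function:block"
  unsigned MalformedLines = 0;
};

// Parse a text profile where every line that is neither empty nor a '#'
// comment has the form "<function> <block> <count>". Lines that do not parse
// are counted and skipped rather than treated as fatal. Repeated entries for
// the same block are summed.
TextProfile parseTextProfile(std::istream &In) {
  TextProfile P;
  std::string Line;
  while (std::getline(In, Line)) {
    if (Line.empty() || Line[0] == '#')
      continue;
    std::istringstream Fields(Line);
    std::string Function, Block;
    std::uint64_t Count = 0;
    if (!(Fields >> Function >> Block >> Count)) {
      ++P.MalformedLines;
      continue;
    }
    P.Counts[Function + ":" + Block] += Count;
  }
  return P;
}

// Usage: parse a profile held in memory, as a unit test might do.
void textProfileExample() {
  std::istringstream In("# hot loop\nmain entry 1\nmain loop 99\n");
  TextProfile P = parseTextProfile(In);
  assert(P.Counts["main:loop"] == 99);
  assert(P.MalformedLines == 0);
}
```

Because the reader takes a std::istream, a unit test can feed it a string directly without touching the file system.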
Does this sound like a workable plan? The converter for perf.data
I'm developing should be ready in the next couple of weeks. I
would like to get started with the changes I mention here as
well, but first I want to make sure this is the right direction.