RFC - Extending ProfileInfo to support external profiles

I've been looking at the existing profile support in LLVM with
the intent of incorporating other sources of profile information.
In particular, I'm writing a tool that converts perf
(http://perf.wiki.kernel.org) data into a loadable LLVM profile.

First, I would like to describe what I understand about the
current setup. Please correct me, as I'm likely to be missing a
few things. I believe that we can extend the current framework to
support external profiles and traditional instrumentation.

The current profile support uses the instrumentation passes in
lib/Transforms/Instrumentation/*Profiling.cpp. The instrumented
code writes its output as a fixed-size record file with various
kinds of information: command-line arguments,
function/block/edge counts, and block tracing info. This is read into
integer vectors. These are then served via the class ProfileInfo
API.

This information is then loaded with -profile-loader and passes
need to explicitly convert that to edge/block weights by
instantiating a ProfileInfo object.

Some questions I have about the existing setup:

What is the ProfileMetaDataLoaderPass? It seems as if it started
as a replacement for ProfileInfoLoader (based on a couple of TODO
notes I found). But it's not clear to me why and it seems to be
less used than ProfileInfo. It even has its own separate flag
(-profile-metadata-loader instead of -profile-loader).

This pass only fills in branch weights. It does not have all the
data used by ProfileInfo.

Is my understanding more or less correct? If so, I would like to
make three main changes:

1- Unify profile loading so that it only talks to the basic
   analysis API. Here, I would like to change the way that
   profile information is reflected in the program. Currently,
   the compiler loads data from the profile file into 4 arrays
   inside ProfileInfo. This is then reflected into the CFG for
   the corresponding function.

   Instead, I would like to reflect profile info as metadata in
   the IR. For instance, functions seldom called in the profile
   could be annotated with the 'cold' attribute in the IR. Block
   frequency is more straightforward: we add the frequency data
   to the first instruction of the block.

   Things like that. The problem here is that different profiling
   mechanisms will have different kinds of data. We will need
   ways of translating those into metadata that is then served
   from the analysis API. For example, how do we deal with value
   profilers?

   For now, I'm focusing on the most common types of profilers.
   These tend to use block/edge frequencies, which are more
   straightforward to convert.

2- Passes should not need to be aware that profile
   information exists. They would make the usual calls to the
   analysis API, which will use profile data, if available.

   With this, I want to make optimizers automatically make
   better decisions when profile information exists.

3- Profile data should be flexible wrt code drift. This is
   something that the profile loader should deal with. It
   currently fails when the profile is out of sync with the shape
   of the IR. In the presence of stale profile:
   (a) Do not fail hard.
   (b) Make conservative decisions with the annotations it
       generates.
   (c) Provide feedback to the user about the staleness of the
       profile (e.g. % of samples/records dropped from the profile).
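
To make item 3 a bit more concrete, here is a rough sketch of a loader that
tolerates stale records. It is only an illustration, not a proposed
implementation: ProfileRecord, ShapeHash, LoadResult and applyProfile are
hypothetical names, and detecting staleness by comparing a per-function CFG
hash is an assumption on my part, not something the current loader does.
Stale records are dropped (the conservative choice), and the share of
dropped samples is kept so it can be reported to the user.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical profile record: one per function found in the profile file.
struct ProfileRecord {
  std::string Function; // function the samples were attributed to
  uint64_t ShapeHash;   // hash of the function's CFG when the profile was taken
  uint64_t Samples;     // number of samples in this record
};

// Outcome of matching a profile against the current IR.
struct LoadResult {
  std::map<std::string, uint64_t> Applied; // function -> samples actually used
  uint64_t TotalSamples = 0;
  uint64_t DroppedSamples = 0;

  // (c) Feedback for the user: percentage of samples that had to be dropped.
  double droppedPercent() const {
    assert(DroppedSamples <= TotalSamples);
    return TotalSamples == 0 ? 0.0 : 100.0 * DroppedSamples / TotalSamples;
  }
};

// CurrentShapes maps each function in the module to its current CFG hash.
LoadResult applyProfile(const std::vector<ProfileRecord> &Records,
                        const std::map<std::string, uint64_t> &CurrentShapes) {
  LoadResult R;
  for (const ProfileRecord &Rec : Records) {
    R.TotalSamples += Rec.Samples;
    auto It = CurrentShapes.find(Rec.Function);
    // (a) Do not fail hard: an unknown function or a changed CFG only means
    // this record is skipped.
    // (b) Be conservative: a skipped function gets no annotation at all.
    if (It == CurrentShapes.end() || It->second != Rec.ShapeHash) {
      R.DroppedSamples += Rec.Samples;
      continue;
    }
    R.Applied[Rec.Function] += Rec.Samples;
  }
  return R;
}
```

A driver could print droppedPercent() as a warning once it exceeds some
threshold; what that threshold should be is left open here.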

External profile sources can be added in two ways:

1- Separate tools that convert data from the external profile
   into the format understood by lib/Analysis/ProfileInfoLoaderPass.cpp.

2- On-the-fly converters that are called by
   lib/Analysis/ProfileInfoLoaderPass.cpp.

There are pros and cons to both approaches. From a modularity
point of view, I am in favour of #1. This makes the compiler
completely independent from the profile provider.

For instance, incorporating Perf into LLVM would mean bringing in
the library that reads perf.data and an executable reader (Perf
data needs to be paired with the debug information from the
application binary).

Other sources of profile may even use proprietary data formats or
have other dependencies which would further bloat the compiler.

The profile format accepted by LLVM will need to evolve as new
types of profile are added. But as long as we keep it properly
documented and supported, it should not be a big deal.

One thing I'd like to add is a simplistic text-based profile,
with its converter in the LLVM tree. This could be used as an
example and as a unit test driver.
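
As a rough idea of what such a text-based profile could look like, the
sketch below parses one record per line in the form
"<function> <block-index> <count>". The format itself, the BlockCounts
type and the parseTextProfile name are all made up for illustration;
nothing like this exists in the tree today. Malformed lines are skipped
and counted instead of being treated as fatal.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Key: (function name, block index). Value: accumulated execution count.
using BlockCounts = std::map<std::pair<std::string, unsigned>, uint64_t>;

// Reads a hypothetical text profile. Empty lines and lines starting with '#'
// are ignored. Lines that do not parse are skipped and counted in Malformed.
BlockCounts parseTextProfile(std::istream &In, unsigned &Malformed) {
  BlockCounts Counts;
  Malformed = 0;
  std::string Line;
  while (std::getline(In, Line)) {
    if (Line.empty() || Line[0] == '#')
      continue;
    std::istringstream Fields(Line);
    std::string Func;
    unsigned Block;
    uint64_t Count;
    if (!(Fields >> Func >> Block >> Count)) {
      ++Malformed;
      continue;
    }
    // Repeated entries for the same block are summed.
    Counts[std::make_pair(Func, Block)] += Count;
  }
  return Counts;
}
```

A format this simple is easy to write by hand in a test, which is the
main reason to have it.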

Does this sound like a workable plan? The converter for perf.data
I'm developing should be ready in the next couple of weeks. I
would like to get started with the changes I mention here as
well, but first I want to make sure this is the right direction.

Thanks. Diego.

Is my understanding more or less correct?

Probably for all I know. I'm not aware of anyone actively maintaining or
keeping these pieces working. AFAICT, there have been 3-4 projects that
started hacking on this and didn't finish, and frankly the result is a
mess. It's not clear that any of it really works today.

If so, I would like to
make three main changes:

1- Unify profile loading so that it only talks to the basic
   analysis API. Here, I would like to change the way that
   profile information is reflected in the program. Currently,
   the compiler loads data from the profile file into 4 arrays
   inside ProfileInfo. This is then reflected into the CFG for
   the corresponding function.

   Instead, I would like to reflect profile info as metadata in
   the IR. For instance, functions seldom called in the profile
   could be annotated with the 'cold' attribute in the IR. Block
   frequency is more straightforward: we add the frequency data
   to the first instruction of the block.

   Things like that. The problem here is that different profiling
   mechanisms will have different kinds of data. We will need
   ways of translating those into metadata that is then served
   from the analysis API. For example, how do we deal with value
   profilers?

   For now, I'm focusing on the most common types of profilers.
   These tend to use block/edge frequencies, which are more
   straightforward to convert.

I really think you should ignore the existing profile support in LLVM. All
of it. I actually think we should remove all of it, but that can happen
later.

I would build whatever passes and infrastructure you need in order to load
your profile data, in whatever format makes sense, and annotate the IR with
metadata. I would just build them from scratch. I don't think there is
anything to be gained from trying to clean up the existing profiling
infrastructure when you'll end up re-using almost none of it, you will have
different constraints when dealing with profiles from sampling tools, and
will still have a common IR-level interface in the metadata.
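
One small piece of the "annotate the IR with metadata" step can be sketched
without touching any LLVM API: turning raw 64-bit edge counts into weights
that fit in 32 bits while roughly keeping their ratios. This assumes the
weights are stored as 32-bit integers, as in the branch_weights metadata;
scaleToWeights is a hypothetical helper, not an existing function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Scales edge counts by a common divisor so that the largest one fits in
// 32 bits. A nonzero count never becomes zero, so an edge that was observed
// at least once is not mistaken for one that was never taken.
std::vector<uint32_t> scaleToWeights(const std::vector<uint64_t> &Counts) {
  const uint64_t Max32 = std::numeric_limits<uint32_t>::max();
  uint64_t MaxCount = 0;
  for (uint64_t C : Counts)
    MaxCount = std::max(MaxCount, C);

  // A divisor of 1 leaves counts untouched when they already fit.
  const uint64_t Scale = MaxCount <= Max32 ? 1 : MaxCount / Max32 + 1;

  std::vector<uint32_t> Weights;
  Weights.reserve(Counts.size());
  for (uint64_t C : Counts) {
    uint64_t W = C / Scale;
    if (C != 0 && W == 0)
      W = 1;
    assert(W <= Max32);
    Weights.push_back(static_cast<uint32_t>(W));
  }
  return Weights;
}
```

The resulting vector is what a loader would attach to the terminator of a
block; how it gets attached depends on the metadata API and is not shown.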

2- Passes should not need to be aware that profile
   information exists. They would make the usual calls to the
   analysis API, which will use profile data, if available.

   With this, I want to make optimizers automatically make
   better decisions when profile information exists.

I believe this has already been done. I'm not aware of any optimization
passes in use today that will need changes here.

3- Profile data should be flexible wrt code drift. This is
   something that the profile loader should deal with. It
   currently fails when the profile is out of sync with the shape
   of the IR. In the presence of stale profile:
   (a) Do not fail hard.
   (b) Make conservative decisions with the annotations it
       generates.
   (c) Provide feedback to the user about the staleness of the
       profile (e.g. % of samples/records dropped from the profile).

Yes, in designing your own (from the ground up) loader, these will be
important issues.

Is my understanding more or less correct?

Probably for all I know. I'm not aware of anyone actively maintaining or
keeping these pieces working. AFAICT, there have been 3-4 projects that
started hacking on this and didn't finish, and frankly the result is a mess.
It's not clear that any of it really works today.

OK. I was trying to use the instrumentation support and I could not
really get it to work. Clang does not understand the flags, and the
runtimes I need to link against do not seem to get built by default (I'm
not a cmake user, so I'm not sure how to build the library in
runtime/libprofile). So, I gave up on figuring it out.

I really think you should ignore the existing profile support in LLVM. All
of it. I actually think we should remove all of it, but that can happen
later.

OK. Works for me. I add a new pass in lib/Analysis then?

2- Passes should not need to be aware that profile
   information exists. They would make the usual calls to the
   analysis API, which will use profile data, if available.

   With this, I want to make optimizers automatically make
   better decisions when profile information exists.

I believe this has already been done. I'm not aware of any optimization
passes in use today that will need changes here.

Odd, because in lib/Transforms I see several instances of:

  ProfileInfo *PI = getAnalysis*<ProfileInfo>();

followed by calls to fix up the profile information. My thinking is
that we would hide all these inside the analysis API. Passes would ask
for, say, scalar evolution, which, in turn, will just use profile
data, if available.
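
To illustrate what I mean by hiding this inside the analysis API, here is
a minimal stand-alone sketch. EdgeProbabilityAnalysis and its methods are
hypothetical and deliberately avoid any real LLVM class; the 0.5 fallback
stands in for whatever static heuristic the analysis would normally use.
The point is only that the client calls probability() and never needs to
know whether a profile was loaded.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical analysis facade. Clients ask for an edge probability and do
// not see whether the answer came from a profile or from a static guess.
class EdgeProbabilityAnalysis {
  // Edge name -> (times taken, times the source block executed).
  std::map<std::string, std::pair<uint64_t, uint64_t>> Profile;

public:
  // Called by the profile loader only; optimization passes never call this.
  void addProfileCount(const std::string &Edge, uint64_t Taken,
                       uint64_t Total) {
    assert(Taken <= Total && "edge taken more often than its source ran");
    Profile[Edge] = std::make_pair(Taken, Total);
  }

  // Called by passes. Uses profile data if available, else a static guess.
  double probability(const std::string &Edge) const {
    auto It = Profile.find(Edge);
    if (It == Profile.end() || It->second.second == 0)
      return 0.5; // placeholder for a static heuristic
    return static_cast<double>(It->second.first) / It->second.second;
  }
};
```

With something along these lines, the passes in lib/Transforms would not
need their own profile fix-up code.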

Diego.

I haven’t looked at the profiling support recently. But this doesn’t quite match my understanding. Bob, do you have comments?

Evan

I agree with Chandler. The metadata-level branch weights, block frequencies, etc. are all reasonable, but the profiling stuff is pretty rough. It has been useful for some research projects, and we’ve been able to use it for some experiments, but I don’t think it makes sense to use it at all for what Diego is proposing.

Ok, I see what you guys are referring to. Sorry for the noise.

Evan