RFC: Instrumentation based profiling file libraries

The frontend-driven, instrumentation-based profiling used for clang's
-fprofile-instr-generate and -fprofile-instr-use currently has the logic
for handling its data format spread across a few different places:

1. Reading files is done in clang when -fprofile-instr-use is
   specified. The logic is in CodeGen/CodeGenPGO.cpp.

2. Reading files is also done by the (very preliminary) llvm-profdata
   tool, which does this manually in llvm-profdata.c.

3. Writing files is done in compiler-rt, which is invoked by clang's
   instrumentation when -fprofile-instr-generate is specified.

4. Writing files is done in the llvm-profdata tool, again by hand.

It would be nice to consolidate as much of this as possible into a
library, so that updating the file format and ensuring correctness are
easier.

We can fairly easily solve (1), (2), and (4) by moving the logic into an
LLVM library. I would like to do this soon as a first step and to
unblock dependent work.

- None of the current LLVM libraries seem appropriate, and there is
  precedent for adding simple libraries that do one thing, so a new
  library seems best.

- This library could either be (A) a standalone library for reading and
  writing the instrumentation based profiling format, or (B) a library
  that includes readers and writers for various profiling formats.
  Notably, (B) would make it a good place for a sample based profile
  reader, which currently lives in lib/Transforms with its usage.

- If we go with (A), a name like Profile may be too generic, so something
  more specific like InstrProfile might be better. For (B), Profile or
  ProfileData seem best.
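
To make (B) a bit more concrete, here is a rough sketch of the shape such
a library could take: one reader interface, one implementation per
format, and the matching writer kept next to its reader. Everything in it
is invented for illustration. ProfileRecord, ProfileReader, ToyTextReader,
writeToyText, and totalCount are hypothetical names, and the
one-line-per-function text format is a toy, not the real instrumentation
format or any existing LLVM API.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: one function and its counter values.
struct ProfileRecord {
  std::string FunctionName;
  std::vector<uint64_t> Counts;
};

// Option (B): a common interface that each profile format implements.
class ProfileReader {
public:
  virtual ~ProfileReader() = default;
  // Fills Record and returns true, or returns false at end of input.
  virtual bool readNext(ProfileRecord &Record) = 0;
};

// Toy text format invented for this sketch: "name count count ..." per line.
class ToyTextReader : public ProfileReader {
  std::istream &In;

public:
  explicit ToyTextReader(std::istream &In) : In(In) {}

  bool readNext(ProfileRecord &Record) override {
    std::string Line;
    while (std::getline(In, Line)) {
      std::istringstream Fields(Line);
      ProfileRecord Parsed;
      if (!(Fields >> Parsed.FunctionName))
        continue; // Skip blank lines.
      uint64_t Count;
      while (Fields >> Count)
        Parsed.Counts.push_back(Count);
      Record = Parsed;
      return true;
    }
    return false;
  }
};

// Writer for the same toy format, kept next to the reader so that the two
// sides of the format are updated together.
void writeToyText(std::ostream &Out,
                  const std::vector<ProfileRecord> &Records) {
  for (const ProfileRecord &R : Records) {
    Out << R.FunctionName;
    for (uint64_t Count : R.Counts)
      Out << ' ' << Count;
    Out << '\n';
  }
}

// A client that depends only on the interface, not on any one format.
uint64_t totalCount(ProfileReader &Reader) {
  uint64_t Total = 0;
  ProfileRecord Record;
  while (Reader.readNext(Record))
    for (uint64_t Count : Record.Counts)
      Total += Count;
  return Total;
}
```

With this layout, clang and a tool like llvm-profdata would both go
through ProfileReader, and a sample based reader would be one more
subclass. Having the reader and writer side by side also makes a
round-trip check (write, read back, compare) easy to write.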

The other part of the problem, (3), has no precedent that I'm aware of.
Is there a way to include llvm libraries in compiler-rt that wouldn't
cause problems? I don't plan on addressing this in the near future, but
comments on what options are available would be appreciated.

> - This library could either be (A) a standalone library for reading and
>   writing the instrumentation based profiling format, or (B) a library
>   that includes readers and writers for various profiling formats.
>   Notably, (B) would make it a good place for a sample based profile
>   reader, which currently lives in lib/Transforms with its usage.

Agreed. Option (B) seems like the best alternative.

> - If we go with (A), a name like Profile may be too generic, so something
>   more specific like InstrProfile might be better. For (B), Profile or
>   ProfileData seem best.

Either Profile or ProfileData sounds good to me. Slight preference for
ProfileData. I could move the reader code from
lib/Transforms/SampleProfile into this library, which could then be
used in standalone sample profile readers/validators/converters.

Where do you envision the standalone tools living? Say, a converter
from one profile format to another, or a writer tool. Some tools will
have slightly more twisted dependencies (converting from Perf data,
for example, requires quite a bit of other code).

Diego.

Diego Novillo <dnovillo@google.com> writes:

>> - This library could either be (A) a standalone library for reading and
>>   writing the instrumentation based profiling format, or (B) a library
>>   that includes readers and writers for various profiling formats.
>>   Notably, (B) would make it a good place for a sample based profile
>>   reader, which currently lives in lib/Transforms with its usage.
>
> Agreed. Option (B) seems like the best alternative.
>
>> - If we go with (A), a name like Profile may be too generic, so something
>>   more specific like InstrProfile might be better. For (B), Profile or
>>   ProfileData seem best.
>
> Either Profile or ProfileData sounds good to me. Slight preference for
> ProfileData. I could move the reader code from
> lib/Transforms/SampleProfile into this library, which could then be
> used in standalone sample profile readers/validators/converters.

ProfileData sounds good.

> Where do you envision the standalone tools living? Say, a converter
> from one profile format to another, or a writer tool. Some tools will
> have slightly more twisted dependencies (converting from Perf data,
> for example, requires quite a bit of other code).

For now I think it makes sense to just treat these as other llvm tools.
That is, they can independently go in tools/, and depend on the profile
library and whatever else they need.