Adding sample profile support to llvm-profdata?

Duncan, Justin,

I’m about to submit a series of patches that add support for writing sample profiles in both the text and binary formats. Soon, I’ll add a third format for interoperability with GCC.

I would like to add some profile maintenance utilities as well: merging, dumping and converting.

The best place for these seems to be tools/llvm-profdata, but that means I need a way to distinguish sample profiles from instrumented profiles.

For the binary formats, the tool can easily check the magic bits at the start of the file, but for the text format it is hard to tell whether we’re dealing with a sample profile or an instrumented profile.
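A minimal C++ sketch of the magic-bit check, assuming each binary format begins with a distinct 64-bit magic number. The constants kInstrMagic and kSampleMagic and the DetectedKind enum are hypothetical placeholders, not LLVM's real values. The point is that text input falls through to Unknown, which is exactly the ambiguity described above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// What the tool could learn from the first bytes of an input file.
enum class DetectedKind { InstrBinary, SampleBinary, Unknown };

// Hypothetical placeholder magic numbers, NOT LLVM's real constants.
constexpr std::uint64_t kInstrMagic = 0x1111111111111111ULL;
constexpr std::uint64_t kSampleMagic = 0x2222222222222222ULL;

// Compares the first 8 bytes (host byte order) against the known binary
// magic numbers. In practice a text profile starts with neither value,
// so it comes back as Unknown: the ambiguous case.
DetectedKind detectByMagic(const std::string &Buffer) {
  std::uint64_t Magic = 0;
  if (Buffer.size() < sizeof(Magic))
    return DetectedKind::Unknown;
  std::memcpy(&Magic, Buffer.data(), sizeof(Magic));
  if (Magic == kInstrMagic)
    return DetectedKind::InstrBinary;
  if (Magic == kSampleMagic)
    return DetectedKind::SampleBinary;
  return DetectedKind::Unknown;
}
```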

The options I see are:

1- Add a --profile-type={sample|instr} flag to llvm-profdata to specify whether we are dealing with a sample or an instrumented profile. This would also help prevent mixing and matching the two types of profiles (they cannot be converted into each other, at least not easily).

2- Write a totally separate tool to deal with sample profiles.

I am slightly in favour of option #1. I could even make --profile-type=instr the default to avoid a flag day for tools you may have deployed.
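A rough sketch of how option #1 could behave, with plain argument scanning standing in for LLVM's command-line library. parseProfileType and ProfileKind are hypothetical names. Omitting the flag yields Instr, so existing invocations keep working unchanged.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

enum class ProfileKind { Instr, Sample };

// Scans the arguments for --profile-type=<kind> (hypothetical sketch).
// Returns std::nullopt for an unrecognized value.
std::optional<ProfileKind>
parseProfileType(const std::vector<std::string> &Args) {
  const std::string Prefix = "--profile-type=";
  ProfileKind Kind = ProfileKind::Instr; // default: no flag day
  for (const std::string &Arg : Args) {
    if (Arg.compare(0, Prefix.size(), Prefix) != 0)
      continue; // not our flag
    std::string Value = Arg.substr(Prefix.size());
    if (Value == "instr")
      Kind = ProfileKind::Instr;
    else if (Value == "sample")
      Kind = ProfileKind::Sample;
    else
      return std::nullopt;
  }
  return Kind;
}
```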

Thanks. Diego.

IIRC, the text format is for making tests "sane" -- requiring an
explicit format there seems fine to me.

It's nice to just have one tool. Will there actually be shared
code paths, though?

Diego Novillo <dnovillo@google.com> writes:

> Duncan, Justin,
>
> I'm about to submit a series of patches that add writing capabilities for
> sample profiles in both text and binary formats. Soon, I'll add a third format
> (to make it interoperable with GCC).
>
> I would like to add some profile maintenance utilities as well: merging,
> dumping and converting.
>
> It seems like the best place would be tools/llvm-profdata. But that means that
> I need to have a way of distinguishing sample from instrumented profiles.
>
> For the binary formats, it's easy to have the tool check the magic bits at the
> start, but for the text format it is not easy to tell whether we're dealing
> with a sample profile vs an instrumented profile.
>
> The options I see are:
>
> 1- Add a --profile-type={sample|instr} to llvm-profdata to specify whether we
> are dealing with a sample or an instrumented profile. This would help prevent
> mixing and matching the two types of profiles (they are not convertible one to
> the other, not easily anyway).
>
> 2- Write a totally separate tool to deal with sample profiles.

There's also:

3- Make the text formats distinguishable. The text-based instrprof
   format is really only used for testing, so I don't mind much if
   we make it require something distinguishable in the first line.
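Option #3 could look something like this sketch, which classifies a text profile by its first line only. The marker strings ":instr" and ":sample" and the function detectTextKind are made up for illustration; the proposal does not say what the distinguishing line would contain.

```cpp
#include <cassert>
#include <string>

enum class TextKind { Instr, Sample, Unknown };

// Hypothetical marker lines; not an existing LLVM convention.
const std::string kInstrHeader = ":instr";
const std::string kSampleHeader = ":sample";

// Classifies a text profile by looking only at its first line.
TextKind detectTextKind(const std::string &Contents) {
  std::string::size_type End = Contents.find('\n');
  std::string FirstLine = Contents.substr(0, End); // npos => whole buffer
  if (!FirstLine.empty() && FirstLine.back() == '\r')
    FirstLine.pop_back(); // tolerate CRLF line endings
  if (FirstLine == kInstrHeader)
    return TextKind::Instr;
  if (FirstLine == kSampleHeader)
    return TextKind::Sample;
  return TextKind::Unknown; // no marker: still ambiguous
}
```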

> I am slightly in favour of option #1. I could even make --profile-type=instr
> to avoid having a flag day for tools you may have deployed.

I also prefer #1 to #2. If we do that, I think the flag should be
required if the input can't be autodetected (i.e., for any text format),
but optional for the binary formats.
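One way to encode that rule, as a sketch: autodetection wins for binary input, and the flag becomes mandatory only when nothing was detected. resolveKind and Resolution are hypothetical names, and rejecting a flag that contradicts the detected magic is an extra policy assumed here, not something proposed in the thread.

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class ProfileKind { Instr, Sample };

// Either a resolved kind or an error message (hypothetical type).
struct Resolution {
  std::optional<ProfileKind> Kind;
  std::string Error;
};

// Autodetected: what magic-bit sniffing found (empty for text input).
// Flag: the user's --profile-type, if one was given.
Resolution resolveKind(std::optional<ProfileKind> Autodetected,
                       std::optional<ProfileKind> Flag) {
  if (Autodetected) {
    // Assumed policy: a contradicting flag is an error.
    if (Flag && *Flag != *Autodetected)
      return {std::nullopt, "--profile-type does not match the file's magic"};
    return {Autodetected, ""}; // binary input: flag optional
  }
  if (!Flag) // text input: flag required
    return {std::nullopt, "cannot detect profile type; pass --profile-type"};
  return {Flag, ""};
}
```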

> IIRC, the text format is for making tests “sane” – requiring an
> explicit format there seems fine to me.

Right.

> It’s nice to just have one tool. Will there actually be shared
> code paths, though?

Likely. I’ve got reader and writer classes similar to the InstrProfReader and InstrProfWriter classes. I could add a common base class for both, but that may be overkill.
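If a common base were added, shared code paths might look like this sketch: a merge routine written against an abstract reader works for either profile kind. ProfileReaderBase, ToySampleReader and mergeCounts are hypothetical names, and the "name:total" line format is a toy, not the real sample profile grammar.

```cpp
#include <cassert>
#include <map>
#include <string>

using CountMap = std::map<std::string, unsigned long long>;

// Hypothetical common base shared by sample and instrumented readers.
class ProfileReaderBase {
public:
  virtual ~ProfileReaderBase() = default;
  // Parses the input; returns false on malformed data.
  virtual bool read(const std::string &Buffer) = 0;
  const CountMap &functionCounts() const { return Counts; }

protected:
  CountMap Counts;
};

// Toy reader for lines of the form "name:total" (illustrative only).
class ToySampleReader : public ProfileReaderBase {
public:
  bool read(const std::string &Buffer) override {
    std::string::size_type Pos = 0;
    while (Pos < Buffer.size()) {
      std::string::size_type End = Buffer.find('\n', Pos);
      if (End == std::string::npos)
        End = Buffer.size();
      std::string Line = Buffer.substr(Pos, End - Pos);
      Pos = End + 1;
      if (Line.empty())
        continue;
      std::string::size_type Colon = Line.rfind(':');
      if (Colon == std::string::npos)
        return false;
      std::string Digits = Line.substr(Colon + 1);
      // At most 19 digits, so std::stoull cannot overflow.
      if (Digits.empty() || Digits.size() > 19 ||
          Digits.find_first_not_of("0123456789") != std::string::npos)
        return false;
      Counts[Line.substr(0, Colon)] += std::stoull(Digits);
    }
    return true;
  }
};

// Tool code written against the base works for any reader subclass.
void mergeCounts(const ProfileReaderBase &Reader, CountMap &Merged) {
  for (const auto &Entry : Reader.functionCounts())
    Merged[Entry.first] += Entry.second;
}
```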

Diego.

SGTM.