The state of IRPGO (3 remaining work items)

_sean_silva · May 24, 2016, 3:23am

Jake and I have been integrating IRPGO on PS4, and we’ve identified 3 remaining work items.

Driver changes

We’d like to make IRPGO the default on PS4. We also think that it would be beneficial to make IRPGO the default PGO on all platforms (coverage would continue to use FE instr as it does currently, of course). In previous conversations (e.g. http://reviews.llvm.org/D15829) it has come up that Apple have requirements that would prevent them from moving to IRPGO as the default PGO, at least without a deprecation period of one or two releases.

I’d like to get consensus on a path forward.
As a point of discussion, how about we make IRPGO the default on all platforms except Apple platforms. I really don’t like fragmenting things like this (e.g. if a third-party tests “clang’s” PGO they will get something different depending on the platform), but I don’t see another way given Apple’s constraints.

Pre-instrumentation passes

Pre-instrumentation optimization has been critical for reducing the overhead of PGO for the PS4 games we tested (as expected). However, in our measurements (and we are glad to provide more info) the main benefit was inlining (also as expected). A simple pass of inlining at threshold 100 appeared to give all the benefits. Even inlining at threshold 0 gave almost all the benefits. For example, the passes initially proposed in http://reviews.llvm.org/D15828 did not improve over just inlining with threshold 100.

(due to PR27299 we also need to add simplifycfg after inlining to clean up, but this doesn’t affect the instrumentation overhead in our measurements)

Bottom line: for our use cases, inlining does all the work, but we’re not opposed to having more passes, which might be beneficial for non-game workloads (which is most code).

Warnings

We identified 3 classes of issues which manifest as spammy warnings when applying profile data with IRPGO (these affect FEPGO also I believe, but we looked in depth at IRPGO):

The main concerning one is that getPGOFuncName mangles the filename into the counter name. This causes us to get instrprof_error::unknown_function when the pgo-use build is done in a different build directory from the training build (which is a reasonable thing to support). In this situation, PGO data is useless for all static functions (and as a byproduct results in a huge volume of warnings).
In different TU’s, pre-instr inlining might make different inlining decisions (for example, different functions may be available for inlining), causing hash mismatch errors (instrprof_error::hash_mismatch). In building a large game, we only saw 8 instance of this, so it is not as severe as 1, but would be good to fix.
A .cpp file may be compiled and put into an archive, but then not selected by the linker and will therefore not result in a counter in the profraw. When compiling this file with pgo-use, instrprof_error::unknown_function will result and a warning will be emitted.

Case 1 can be fixed using a function hash or other unique identifier instead of a file path. David, in D20195 you mentioned that Rong was working on a patch that would fix 2; we are looking forward to that.

For 3, I unfortunately do not know of any solution. I don’t think there is a way for us to make this warning reliable in the face of this circumstance. So my conclusion is that instrprof_error::unknown_function at least must be defaulted to off unfortunately.

– Sean Silva

davidxl · May 24, 2016, 3:56am

Jake and I have been integrating IRPGO on PS4, and we've identified 3
remaining work items.

Sean, thanks for the write up. It matches very well with what we think as
well.

- Driver changes

We'd like to make IRPGO the default on PS4. We also think that it would be
beneficial to make IRPGO the default PGO on all platforms (coverage would
continue to use FE instr as it does currently, of course). In previous
conversations (e.g. ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation) it has come up that
Apple have requirements that would prevent them from moving to IRPGO as the
default PGO, at least without a deprecation period of one or two releases.

I'd like to get consensus on a path forward.
As a point of discussion, how about we make IRPGO the default on all
platforms except Apple platforms. I really don't like fragmenting things
like this (e.g. if a third-party tests "clang's" PGO they will get
something different depending on the platform), but I don't see another way
given Apple's constraints.

I'd like to see IRPGO to be the default as well, but the first thing we
need is a driver level option to make the switch (prof-gen) -- currently we
rely on -Xclang option to switch between two modes, which is less than
ideal.

If the concern from Apple is that the old profile still need to work, then
this is problem already solved. The reason is that -fprofile-instr-use can
automatically detect the type of the profile and switch the mode.

- Pre-instrumentation passes

Pre-instrumentation optimization has been critical for reducing the
overhead of PGO for the PS4 games we tested (as expected). However, in our
measurements (and we are glad to provide more info) the main benefit was
inlining (also as expected). A simple pass of inlining at threshold 100
appeared to give all the benefits. Even inlining at threshold 0 gave almost
all the benefits. For example, the passes initially proposed in
⚙ D15828 [PGO] Passmanagerbuilder change that enable IR level PGO instrumentation did not improve over just inlining with
threshold 100.

(due to PR27299 we also need to add simplifycfg after inlining to clean
up, but this doesn't affect the instrumentation overhead in our
measurements)

Bottom line: for our use cases, inlining does all the work, but we're not
opposed to having more passes, which might be beneficial for non-game
workloads (which is most code).

Yes, Rong is re-collecting performance data before submitting the patch.

- Warnings

We identified 3 classes of issues which manifest as spammy warnings when
applying profile data with IRPGO (these affect FEPGO also I believe, but we
looked in depth at IRPGO):

1. The main concerning one is that getPGOFuncName mangles the filename
into the counter name. This causes us to get
instrprof_error::unknown_function when the pgo-use build is done in a
different build directory from the training build (which is a reasonable
thing to support). In this situation, PGO data is useless for all `static`
functions (and as a byproduct results in a huge volume of warnings).

This can be enhanced with an user option to override the behavior. Can you
help filing a tracking bug?

2. In different TU's, pre-instr inlining might make different inlining
decisions (for example, different functions may be available for inlining),
causing hash mismatch errors (instrprof_error::hash_mismatch). In building
a large game, we only saw 8 instance of this, so it is not as severe as 1,
but would be good to fix.

Rong has a patch addressing that -- will submit after cleanup pass change
is done.

3. A .cpp file may be compiled and put into an archive, but then not
selected by the linker and will therefore not result in a counter in the
profraw. When compiling this file with pgo-use,
instrprof_error::unknown_function will result and a warning will be emitted.

yes -- this is a common problem to other compilers as well.

Case 1 can be fixed using a function hash or other unique identifier
instead of a file path. David, in D20195 you mentioned that Rong was
working on a patch that would fix 2; we are looking forward to that.

Right.

For 3, I unfortunately do not know of any solution. I don't think there is
a way for us to make this warning reliable in the face of this
circumstance. So my conclusion is that instrprof_error::unknown_function at
least must be defaulted to off unfortunately.

yes, this can be annoying. If the warnings can be buffered, then the
compiler can check if this is due to missing profile for the whole file and
can reduce the warnings into one single warning (source file has no profile
data). Making it off by default sounds fine to me too if it is too noisy.

thanks,

David

vedantk · May 24, 2016, 10:41pm

Jake and I have been integrating IRPGO on PS4, and we've identified 3 remaining work items.

Sean, thanks for the write up. It matches very well with what we think as well.

+ 1

- Driver changes

We'd like to make IRPGO the default on PS4. We also think that it would be beneficial to make IRPGO the default PGO on all platforms (coverage would continue to use FE instr as it does currently, of course). In previous conversations (e.g. ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation) it has come up that Apple have requirements that would prevent them from moving to IRPGO as the default PGO, at least without a deprecation period of one or two releases.

Sean pointed out the problematic scenario in D15829 (in plan "C"):

All existing user workflows continue to work, except for workflows that attempt to llvm-profdata merge some old frontend profile data (e.g. they have checked-in to version control and represents some special workload) with the profile data from new binaries.

We can address this issue by (1) making sure llvm-profdata emits a helpful warning when merging an FE-based profile with an IR-based one, and (2) keeping an option to use FE instrumentation for PGO. Having (2) helps people who can't (or don't want) to switch to IR PGO.

I'd like to get consensus on a path forward.
As a point of discussion, how about we make IRPGO the default on all platforms except Apple platforms.

I'd really rather not introduce this inconsistency. I'm worried that it might lead to Darwin becoming a second-tier platform for PGO.

Fred (CC'd) is following up with some of our internal users to check if we can change the default behavior of -fprofile-instr-generate. He should be able to chime in on this soon.

At its core I don't think -fprofile-instr-generate *implies* FE-based instrumentation. So, I'd like to see the driver do this (on all platforms):

  * -fprofile-instr-generate: IR instrumentation
  * -fprofile-instr-generate=IR: IR instrumentation
  * -fprofile-instr-generate=FE: FE instrumentation
  * -fprofile-instr-generate -fcoverage-mapping: FE + coverage instrumentation

It's a bit ugly because the meaning of -fprofile-instr-generate becomes context-sensitive. But, (1) it doesn't break existing common workflows and (2) it makes it easier to ship IRPGO. The big caveat here is that we'll need to wait a bit and see if our internal users are OK with this.

One alternative is to introduce a separate driver flag for IRPGO. This might not work well for Sony's existing users. I'd be interested in any feedback about this approach.

I really don't like fragmenting things like this (e.g. if a third-party tests "clang's" PGO they will get something different depending on the platform), but I don't see another way given Apple's constraints.

I'd like to see IRPGO to be the default as well, but the first thing we need is a driver level option to make the switch (prof-gen) -- currently we rely on -Xclang option to switch between two modes, which is less than ideal.

If the concern from Apple is that the old profile still need to work, then this is problem already solved. The reason is that -fprofile-instr-use can automatically detect the type of the profile and switch the mode.

It's not just that. As Sean pointed out, we're concerned about old profiles inter-operating poorly with new ones.

thanks,
vedant

dexonsmith · May 24, 2016, 10:50pm

Zooming into the command-line option bike-shed:

At its core I don't think -fprofile-instr-generate *implies* FE-based instrumentation. So, I'd like to see the driver do this (on all platforms):

* -fprofile-instr-generate: IR instrumentation
* -fprofile-instr-generate=IR: IR instrumentation
* -fprofile-instr-generate=FE: FE instrumentation
* -fprofile-instr-generate -fcoverage-mapping: FE + coverage instrumentation

I feel like this would be simpler:
* -fcoverage-mapping: -fprofile-instr-generate=FE + coverage instrumentation

Maybe there's a downside I'm not seeing though?

Also, I don't like "FE". Maybe "source"? And instead of "IR", "llvm-ir" or something?

It's a bit ugly because the meaning of -fprofile-instr-generate becomes context-sensitive. But, (1) it doesn't break existing common workflows and (2) it makes it easier to ship IRPGO. The big caveat here is that we'll need to wait a bit and see if our internal users are OK with this.

One alternative is to introduce a separate driver flag for IRPGO. This might not work well for Sony's existing users. I'd be interested in any feedback about this approach.

My first thought is `-mprofile-instr-generate`, since if it's not in the frontend then "-f" doesn't really make sense...

davidxl · May 24, 2016, 11:10pm

Zooming into the command-line option bike-shed:

>
> At its core I don't think -fprofile-instr-generate *implies* FE-based
instrumentation. So, I'd like to see the driver do this (on all platforms):
>
> * -fprofile-instr-generate: IR instrumentation
> * -fprofile-instr-generate=IR: IR instrumentation
> * -fprofile-instr-generate=FE: FE instrumentation
> * -fprofile-instr-generate -fcoverage-mapping: FE + coverage
instrumentation

I feel like this would be simpler:
* -fcoverage-mapping: -fprofile-instr-generate=FE + coverage
instrumentation

Maybe there's a downside I'm not seeing though?

I proposed the same change in proposal B in the review thread.

B. Proposed new behavior:
-fprofile-instr-generate turns on IR late instrumentation
-fcoverage-mapping turns on FE instrumentation and coverage-mapping
-fprofile-instr-generate -fcoverage-mapping result in compiler warning
-fprofile-instr-use=<> will automatically determine how to use the

The upside is that -fcoverage-mapping itself does not do anything by itself
today. This change will simplify its usage (without user specifying
-fprofile-instr-generate)

The downside Sean mentioned is that this changes the existing behavior of
-fcoverage-mapping which can be a surprise to users (though I wonder why
would a user depend on this old behavior).

Also, I don't like "FE". Maybe "source"? And instead of "IR", "llvm-ir"
or something?

Perhaps clang vs LLVM (similar to the cc1 option we have). The downside is
'Clang' is clearly tied to Clang, but not any other FEs.

David

_sean_silva · May 24, 2016, 11:51pm

Jake and I have been integrating IRPGO on PS4, and we've identified 3
remaining work items.

Sean, thanks for the write up. It matches very well with what we think as
well.

- Driver changes

We'd like to make IRPGO the default on PS4. We also think that it would
be beneficial to make IRPGO the default PGO on all platforms (coverage
would continue to use FE instr as it does currently, of course). In
previous conversations (e.g. ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation) it has come
up that Apple have requirements that would prevent them from moving to
IRPGO as the default PGO, at least without a deprecation period of one or
two releases.

I'd like to get consensus on a path forward.
As a point of discussion, how about we make IRPGO the default on all
platforms except Apple platforms. I really don't like fragmenting things
like this (e.g. if a third-party tests "clang's" PGO they will get
something different depending on the platform), but I don't see another way
given Apple's constraints.

I'd like to see IRPGO to be the default as well, but the first thing we
need is a driver level option to make the switch (prof-gen) -- currently we
rely on -Xclang option to switch between two modes, which is less than
ideal.

If the concern from Apple is that the old profile still need to work, then
this is problem already solved. The reason is that -fprofile-instr-use can
automatically detect the type of the profile and switch the mode.

Yes, since -fprofile-instr-use can automatically detect, the only issue is
merging with old profiles. The simplest solution is to add a driver flag
-fuse-the-old-pgo-because-i-want-to-merge-with-old-profiles (name is a
bikeshed) which temporarily allows users to continue to use FE PGO. I don't
see a reason why we would want FEPGO in the long term; we can always use
IRPGO with no pre-instrumentation passes to get the same effect.

- Pre-instrumentation passes

Pre-instrumentation optimization has been critical for reducing the
overhead of PGO for the PS4 games we tested (as expected). However, in our
measurements (and we are glad to provide more info) the main benefit was
inlining (also as expected). A simple pass of inlining at threshold 100
appeared to give all the benefits. Even inlining at threshold 0 gave almost
all the benefits. For example, the passes initially proposed in
⚙ D15828 [PGO] Passmanagerbuilder change that enable IR level PGO instrumentation did not improve over just inlining with
threshold 100.

(due to PR27299 we also need to add simplifycfg after inlining to clean
up, but this doesn't affect the instrumentation overhead in our
measurements)

Bottom line: for our use cases, inlining does all the work, but we're not
opposed to having more passes, which might be beneficial for non-game
workloads (which is most code).

Yes, Rong is re-collecting performance data before submitting the patch.

Great!

- Warnings

We identified 3 classes of issues which manifest as spammy warnings when
applying profile data with IRPGO (these affect FEPGO also I believe, but we
looked in depth at IRPGO):

1. The main concerning one is that getPGOFuncName mangles the filename
into the counter name. This causes us to get
instrprof_error::unknown_function when the pgo-use build is done in a
different build directory from the training build (which is a reasonable
thing to support). In this situation, PGO data is useless for all `static`
functions (and as a byproduct results in a huge volume of warnings).

This can be enhanced with an user option to override the behavior. Can you
help filing a tracking bug?

Sure: https://llvm.org/bugs/show_bug.cgi?id=27867

2. In different TU's, pre-instr inlining might make different inlining
decisions (for example, different functions may be available for inlining),
causing hash mismatch errors (instrprof_error::hash_mismatch). In building
a large game, we only saw 8 instance of this, so it is not as severe as 1,
but would be good to fix.

Rong has a patch addressing that -- will submit after cleanup pass change
is done.

Great!

3. A .cpp file may be compiled and put into an archive, but then not
selected by the linker and will therefore not result in a counter in the
profraw. When compiling this file with pgo-use,
instrprof_error::unknown_function will result and a warning will be emitted.

yes -- this is a common problem to other compilers as well.

Case 1 can be fixed using a function hash or other unique identifier
instead of a file path. David, in D20195 you mentioned that Rong was
working on a patch that would fix 2; we are looking forward to that.

Right.

For 3, I unfortunately do not know of any solution. I don't think there
is a way for us to make this warning reliable in the face of this
circumstance. So my conclusion is that instrprof_error::unknown_function at
least must be defaulted to off unfortunately.

yes, this can be annoying. If the warnings can be buffered, then the
compiler can check if this is due to missing profile for the whole file and
can reduce the warnings into one single warning (source file has no profile
data). Making it off by default sounds fine to me too if it is too noisy.

That might work, but would be tricky since there might be linkonce_odr
functions also in the translation unit and so not all functions would be
missing profile counters.

-- Sean Silva

_sean_silva · May 24, 2016, 11:55pm

Zooming into the command-line option bike-shed:

>
> At its core I don't think -fprofile-instr-generate *implies* FE-based
instrumentation. So, I'd like to see the driver do this (on all platforms):
>
> * -fprofile-instr-generate: IR instrumentation
> * -fprofile-instr-generate=IR: IR instrumentation
> * -fprofile-instr-generate=FE: FE instrumentation
> * -fprofile-instr-generate -fcoverage-mapping: FE + coverage
instrumentation

I feel like this would be simpler:
* -fcoverage-mapping: -fprofile-instr-generate=FE + coverage
instrumentation

Maybe there's a downside I'm not seeing though?

I proposed the same change in proposal B in the review thread.

B. Proposed new behavior:
-fprofile-instr-generate turns on IR late instrumentation
-fcoverage-mapping turns on FE instrumentation and coverage-mapping
-fprofile-instr-generate -fcoverage-mapping result in compiler warning
-fprofile-instr-use=<> will automatically determine how to use the

The upside is that -fcoverage-mapping itself does not do anything by
itself today. This change will simplify its usage (without user specifying
-fprofile-instr-generate)

The downside Sean mentioned is that this changes the existing behavior of
-fcoverage-mapping which can be a surprise to users (though I wonder why
would a user depend on this old behavior).

I didn't really intend to say that it was a "downside" per se. I was just
pointing it out that it is a separable step and orthogonal to the
discussion. I agree that it would simplify things for users; we already
prohibit -fcoverage-mapping by itself, so we are strictly expanding what we
accept.

-- Sean Silva

_sean_silva · May 25, 2016, 12:21am

>
> Jake and I have been integrating IRPGO on PS4, and we've identified 3
remaining work items.
>
> Sean, thanks for the write up. It matches very well with what we think
as well.

+ 1

> - Driver changes
>
> We'd like to make IRPGO the default on PS4. We also think that it would
be beneficial to make IRPGO the default PGO on all platforms (coverage
would continue to use FE instr as it does currently, of course). In
previous conversations (e.g. ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation) it has come
up that Apple have requirements that would prevent them from moving to
IRPGO as the default PGO, at least without a deprecation period of one or
two releases.

Sean pointed out the problematic scenario in D15829 (in plan "C"):
All existing user workflows continue to work, except for workflows that
attempt to llvm-profdata merge some old frontend profile data (e.g. they
have checked-in to version control and represents some special workload)
with the profile data from new binaries.
We can address this issue by (1) making sure llvm-profdata emits a helpful
warning when merging an FE-based profile with an IR-based one, and (2)
keeping an option to use FE instrumentation for PGO. Having (2) helps
people who can't (or don't want) to switch to IR PGO.

> I'd like to get consensus on a path forward.
> As a point of discussion, how about we make IRPGO the default on all
platforms except Apple platforms.

I'd really rather not introduce this inconsistency. I'm worried that it
might lead to Darwin becoming a second-tier platform for PGO.

Fred (CC'd) is following up with some of our internal users to check if we
can change the default behavior of -fprofile-instr-generate. He should be
able to chime in on this soon.

Great!

At its core I don't think -fprofile-instr-generate *implies* FE-based
instrumentation. So, I'd like to see the driver do this (on all platforms):

  * -fprofile-instr-generate: IR instrumentation
  * -fprofile-instr-generate=IR: IR instrumentation
  * -fprofile-instr-generate=FE: FE instrumentation
  * -fprofile-instr-generate -fcoverage-mapping: FE + coverage
instrumentation

It's a bit ugly because the meaning of -fprofile-instr-generate becomes
context-sensitive. But, (1) it doesn't break existing common workflows and
(2) it makes it easier to ship IRPGO. The big caveat here is that we'll
need to wait a bit and see if our internal users are OK with this.

Is there a reason to even have the possibility for FEPGO in the long run?

From what I can tell, at most we would add a -fuse-the-old-pgo-because-i-

want-to-merge-with-old-profiles option to hold people over until they can
regenerate their profiles with the current compiler. We can add a flag to
control what pre-instrumentation is done to retain the source-level
robustness of FEPGO (e.g. -fpgo-no-simplify-before-instrumenting or
something).

One alternative is to introduce a separate driver flag for IRPGO. This
might not work well for Sony's existing users. I'd be interested in any
feedback about this approach.

Personally, I would prefer to maintaining command line compatibility for
PGO in Clang (i.e. users don't have to modify their build systems).

-- Sean Silva

_sean_silva · May 25, 2016, 12:23am

Zooming into the command-line option bike-shed:

Let's avoid bikeshedding until the exact requirements are clear.

-- Sean Silva

davidxl · May 25, 2016, 4:22pm

It sounds to me we are likely to converge on the following:

Making IR/llvm based PGO the default;
Enhance -fcoverage-mapping such that it automatically turns on FE based instrumentation
if -fcoverage-mapping is used together with -fprofile-instr-generate, -fcoverage-mapping serves as a switch to turn on FE based instrumetnation

All the above are transparent to users.

The following are for advanced usage:

have a new option to explicitly switch instrumentation flavor to be FE based
have a new option to turn off part of pre-instrumentation cleanup/simplification passes for users who want very stable profile for stable library sources *

4 and 5 serves the same purpose so 5 may not be necessary.

thanks,

_sean_silva · May 25, 2016, 7:12pm

Jake and I have been integrating IRPGO on PS4, and we've identified 3
remaining work items.

- Driver changes

We'd like to make IRPGO the default on PS4. We also think that it would be
beneficial to make IRPGO the default PGO on all platforms (coverage would
continue to use FE instr as it does currently, of course). In previous
conversations (e.g. ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation) it has come up that
Apple have requirements that would prevent them from moving to IRPGO as the
default PGO, at least without a deprecation period of one or two releases.

I'd like to get consensus on a path forward.
As a point of discussion, how about we make IRPGO the default on all
platforms except Apple platforms. I really don't like fragmenting things
like this (e.g. if a third-party tests "clang's" PGO they will get
something different depending on the platform), but I don't see another way
given Apple's constraints.

- Pre-instrumentation passes

Pre-instrumentation optimization has been critical for reducing the
overhead of PGO for the PS4 games we tested (as expected). However, in our
measurements (and we are glad to provide more info) the main benefit was
inlining (also as expected). A simple pass of inlining at threshold 100
appeared to give all the benefits. Even inlining at threshold 0 gave almost
all the benefits. For example, the passes initially proposed in
⚙ D15828 [PGO] Passmanagerbuilder change that enable IR level PGO instrumentation did not improve over just inlining with
threshold 100.

(due to PR27299 we also need to add simplifycfg after inlining to clean
up, but this doesn't affect the instrumentation overhead in our
measurements)

Bottom line: for our use cases, inlining does all the work, but we're not
opposed to having more passes, which might be beneficial for non-game
workloads (which is most code).

- Warnings

We identified 3 classes of issues which manifest as spammy warnings when
applying profile data with IRPGO (these affect FEPGO also I believe, but we
looked in depth at IRPGO):

1. The main concerning one is that getPGOFuncName mangles the filename
into the counter name. This causes us to get
instrprof_error::unknown_function when the pgo-use build is done in a
different build directory from the training build (which is a reasonable
thing to support). In this situation, PGO data is useless for all `static`
functions (and as a byproduct results in a huge volume of warnings).

Rong, I was just looking at implementing a fix for this, but noticed
something. Can we get rid of the "InLTO" argument to getPGOFuncName if we
unconditionally apply the funcname metadata to all functions?

-- Sean Silva

davidxl · May 25, 2016, 9:17pm

Jake and I have been integrating IRPGO on PS4, and we've identified 3
remaining work items.

- Driver changes

We'd like to make IRPGO the default on PS4. We also think that it would
be beneficial to make IRPGO the default PGO on all platforms (coverage
would continue to use FE instr as it does currently, of course). In
previous conversations (e.g. ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation) it has come
up that Apple have requirements that would prevent them from moving to
IRPGO as the default PGO, at least without a deprecation period of one or
two releases.

I'd like to get consensus on a path forward.
As a point of discussion, how about we make IRPGO the default on all
platforms except Apple platforms. I really don't like fragmenting things
like this (e.g. if a third-party tests "clang's" PGO they will get
something different depending on the platform), but I don't see another way
given Apple's constraints.

- Pre-instrumentation passes

Pre-instrumentation optimization has been critical for reducing the
overhead of PGO for the PS4 games we tested (as expected). However, in our
measurements (and we are glad to provide more info) the main benefit was
inlining (also as expected). A simple pass of inlining at threshold 100
appeared to give all the benefits. Even inlining at threshold 0 gave almost
all the benefits. For example, the passes initially proposed in
⚙ D15828 [PGO] Passmanagerbuilder change that enable IR level PGO instrumentation did not improve over just inlining with
threshold 100.

(due to PR27299 we also need to add simplifycfg after inlining to clean
up, but this doesn't affect the instrumentation overhead in our
measurements)

Bottom line: for our use cases, inlining does all the work, but we're not
opposed to having more passes, which might be beneficial for non-game
workloads (which is most code).

- Warnings

We identified 3 classes of issues which manifest as spammy warnings when
applying profile data with IRPGO (these affect FEPGO also I believe, but we
looked in depth at IRPGO):

1. The main concerning one is that getPGOFuncName mangles the filename
into the counter name. This causes us to get
instrprof_error::unknown_function when the pgo-use build is done in a
different build directory from the training build (which is a reasonable
thing to support). In this situation, PGO data is useless for all `static`
functions (and as a byproduct results in a huge volume of warnings).

Rong, I was just looking at implementing a fix for this, but noticed
something. Can we get rid of the "InLTO" argument to getPGOFuncName if we
unconditionally apply the funcname metadata to all functions?

Are you proposing always emitting the meta data for internal functions?

David

_sean_silva · May 25, 2016, 11:34pm

Jake and I have been integrating IRPGO on PS4, and we've identified 3
remaining work items.

- Driver changes

We'd like to make IRPGO the default on PS4. We also think that it would
be beneficial to make IRPGO the default PGO on all platforms (coverage
would continue to use FE instr as it does currently, of course). In
previous conversations (e.g. ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation) it has
come up that Apple have requirements that would prevent them from moving to
IRPGO as the default PGO, at least without a deprecation period of one or
two releases.

I'd like to get consensus on a path forward.
As a point of discussion, how about we make IRPGO the default on all
platforms except Apple platforms. I really don't like fragmenting things
like this (e.g. if a third-party tests "clang's" PGO they will get
something different depending on the platform), but I don't see another way
given Apple's constraints.

- Pre-instrumentation passes

Pre-instrumentation optimization has been critical for reducing the
overhead of PGO for the PS4 games we tested (as expected). However, in our
measurements (and we are glad to provide more info) the main benefit was
inlining (also as expected). A simple pass of inlining at threshold 100
appeared to give all the benefits. Even inlining at threshold 0 gave almost
all the benefits. For example, the passes initially proposed in
⚙ D15828 [PGO] Passmanagerbuilder change that enable IR level PGO instrumentation did not improve over just inlining with
threshold 100.

(due to PR27299 we also need to add simplifycfg after inlining to clean
up, but this doesn't affect the instrumentation overhead in our
measurements)

Bottom line: for our use cases, inlining does all the work, but we're
not opposed to having more passes, which might be beneficial for non-game
workloads (which is most code).

- Warnings

We identified 3 classes of issues which manifest as spammy warnings when
applying profile data with IRPGO (these affect FEPGO also I believe, but we
looked in depth at IRPGO):

1. The main concerning one is that getPGOFuncName mangles the filename
into the counter name. This causes us to get
instrprof_error::unknown_function when the pgo-use build is done in a
different build directory from the training build (which is a reasonable
thing to support). In this situation, PGO data is useless for all `static`
functions (and as a byproduct results in a huge volume of warnings).

Rong, I was just looking at implementing a fix for this, but noticed
something. Can we get rid of the "InLTO" argument to getPGOFuncName if we
unconditionally apply the funcname metadata to all functions?

Are you proposing always emitting the meta data for internal functions?

Yes; actually for all functions. If I understand correctly, the InLTO
argument fixes a specific case of a general problem: the PGOFuncName is
currently constructed via an algorithm that depends on the current state of
the module and the result can therefore change as the module is transformed.
To solve this, the idea is to only run the algorithm at a single point of
truth (the module as seen by prof-gen/prof-use) and freeze the result in
metadata. All other accesses to PGOFuncName simply read that metadata, so
there is no possibility for getting out of date.

Currently, it seems the meaning of the InLTO flag is basically "we are
accessing the PGOFuncName at a point that is not prof-gen/prof-use, so use
the metadata". I just want to make that clearer.

-- Sean Silva

davidxl · May 26, 2016, 12:13am

Jake and I have been integrating IRPGO on PS4, and we've identified 3
remaining work items.

- Driver changes

We'd like to make IRPGO the default on PS4. We also think that it would
be beneficial to make IRPGO the default PGO on all platforms (coverage
would continue to use FE instr as it does currently, of course). In
previous conversations (e.g. ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation) it has
come up that Apple have requirements that would prevent them from moving to
IRPGO as the default PGO, at least without a deprecation period of one or
two releases.

I'd like to get consensus on a path forward.
As a point of discussion, how about we make IRPGO the default on all
platforms except Apple platforms. I really don't like fragmenting things
like this (e.g. if a third-party tests "clang's" PGO they will get
something different depending on the platform), but I don't see another way
given Apple's constraints.

- Pre-instrumentation passes

Pre-instrumentation optimization has been critical for reducing the
overhead of PGO for the PS4 games we tested (as expected). However, in our
measurements (and we are glad to provide more info) the main benefit was
inlining (also as expected). A simple pass of inlining at threshold 100
appeared to give all the benefits. Even inlining at threshold 0 gave almost
all the benefits. For example, the passes initially proposed in
⚙ D15828 [PGO] Passmanagerbuilder change that enable IR level PGO instrumentation did not improve over just inlining with
threshold 100.

(due to PR27299 we also need to add simplifycfg after inlining to clean
up, but this doesn't affect the instrumentation overhead in our
measurements)

Bottom line: for our use cases, inlining does all the work, but we're
not opposed to having more passes, which might be beneficial for non-game
workloads (which is most code).

- Warnings

We identified 3 classes of issues which manifest as spammy warnings
when applying profile data with IRPGO (these affect FEPGO also I believe,
but we looked in depth at IRPGO):

1. The main concerning one is that getPGOFuncName mangles the filename
into the counter name. This causes us to get
instrprof_error::unknown_function when the pgo-use build is done in a
different build directory from the training build (which is a reasonable
thing to support). In this situation, PGO data is useless for all `static`
functions (and as a byproduct results in a huge volume of warnings).

Rong, I was just looking at implementing a fix for this, but noticed
something. Can we get rid of the "InLTO" argument to getPGOFuncName if we
unconditionally apply the funcname metadata to all functions?

Are you proposing always emitting the meta data for internal functions?

Yes; actually for all functions. If I understand correctly, the InLTO
argument fixes a specific case of a general problem: the PGOFuncName is
currently constructed via an algorithm that depends on the current state of
the module and the result can therefore change as the module is transformed.
To solve this, the idea is to only run the algorithm at a single point of
truth (the module as seen by prof-gen/prof-use) and freeze the result in
metadata. All other accesses to PGOFuncName simply read that metadata, so
there is no possibility for getting out of date.

Currently, it seems the meaning of the InLTO flag is basically "we are
accessing the PGOFuncName at a point that is not prof-gen/prof-use, so use
the metadata". I just want to make that clearer.

This problem is kind of special to value profiling in LTO mode. In this
mode, the transformation needs to be delayed (when all function bodies are
available) so we end up having compute the pgonames at two different points
with different program states. Your suggestion of doing this
unconditionally will work but will require more memory at compile time.

David

mehdi_amini · May 26, 2016, 12:20am

To provide more details: specifically the transformations that require the metadata are the one that lead to changing the “PGOFuncName”, so it is specifically 1) turning a symbol in to internal/private, and 2) straightforward renaming (which happens when you LTO-link two files that both have a local symbol with the same name).
Right now I believe the metadata is added specifically at the right place in the pipeline and only to cover the minimal amount of symbols that need to have the metadata (as David wrote, to avoid using extra memory).

Frederic_Riss · June 1, 2016, 1:02am

Jake and I have been integrating IRPGO on PS4, and we’ve identified 3 remaining work items.

Sean, thanks for the write up. It matches very well with what we think as well.

1

Driver changes

We’d like to make IRPGO the default on PS4. We also think that it would be beneficial to make IRPGO the default PGO on all platforms (coverage would continue to use FE instr as it does currently, of course). In previous conversations (e.g. http://reviews.llvm.org/D15829) it has come up that Apple have requirements that would prevent them from moving to IRPGO as the default PGO, at least without a deprecation period of one or two releases.

Sean pointed out the problematic scenario in D15829 (in plan “C”):
All existing user workflows continue to work, except for workflows that attempt to llvm-profdata merge some old frontend profile data (e.g. they have checked-in to version control and represents some special workload) with the profile data from new binaries.
We can address this issue by (1) making sure llvm-profdata emits a helpful warning when merging an FE-based profile with an IR-based one, and (2) keeping an option to use FE instrumentation for PGO. Having (2) helps people who can’t (or don’t want) to switch to IR PGO.

I’d like to get consensus on a path forward.
As a point of discussion, how about we make IRPGO the default on all platforms except Apple platforms.

I’d really rather not introduce this inconsistency. I’m worried that it might lead to Darwin becoming a second-tier platform for PGO.

Fred (CC’d) is following up with some of our internal users to check if we can change the default behavior of -fprofile-instr-generate. He should be able to chime in on this soon.

Sorry it took me so long. I’ve discussed the change in behavior quiet extensively, and I after having changed my mind a couple times, I would argue in favor of keeping the current behavior for the existing flags. I think adding a new switch for IRPGO is a better option. The argument that weighted most on my opinion is the proposed interaction with -fcoverage-mapping, and it is not at all platform specific. With the proposed new behavior, turning coverage on and off in your build system will generate a binary with different performance characteristics and this feels really wrong. I would actually make the IRPGO mode completely incompatible with the -fcoverage-mapping flag.

Fred

davidxl · June 1, 2016, 7:02pm

>
> Jake and I have been integrating IRPGO on PS4, and we've identified 3
remaining work items.
>
> Sean, thanks for the write up. It matches very well with what we think
as well.

+ 1

> - Driver changes
>
> We'd like to make IRPGO the default on PS4. We also think that it would
be beneficial to make IRPGO the default PGO on all platforms (coverage
would continue to use FE instr as it does currently, of course). In
previous conversations (e.g. ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation) it has come
up that Apple have requirements that would prevent them from moving to
IRPGO as the default PGO, at least without a deprecation period of one or
two releases.

Sean pointed out the problematic scenario in D15829 (in plan "C"):
All existing user workflows continue to work, except for workflows that
attempt to llvm-profdata merge some old frontend profile data (e.g. they
have checked-in to version control and represents some special workload)
with the profile data from new binaries.
We can address this issue by (1) making sure llvm-profdata emits a
helpful warning when merging an FE-based profile with an IR-based one, and
(2) keeping an option to use FE instrumentation for PGO. Having (2) helps
people who can't (or don't want) to switch to IR PGO.

> I'd like to get consensus on a path forward.
> As a point of discussion, how about we make IRPGO the default on all
platforms except Apple platforms.

I'd really rather not introduce this inconsistency. I'm worried that it
might lead to Darwin becoming a second-tier platform for PGO.

Fred (CC'd) is following up with some of our internal users to check if
we can change the default behavior of -fprofile-instr-generate. He should
be able to chime in on this soon.

Fred,

Sorry it took me so long. I’ve discussed the change in behavior quiet
extensively, and I after having changed my mind a couple times, I would
argue in favor of keeping the current behavior for the existing flags. I
think adding a new switch for IRPGO is a better option. The argument that
weighted most on my opinion is the proposed interaction with
-fcoverage-mapping, and it is not at all platform specific. With the
proposed new behavior, turning coverage on and off in your build system
will generate a binary with different performance characteristics and this
feels really wrong.

This is an interesting observation, but IMO this should not be a problem in
practice:

- Coverage testing and PGO users are not overlapping. They have completely
different objectives and expectations. For instance, coverage users care
about covering the cold/rarely executed paths and find ways to make them
appear in the path, and in the meantime reduce the 'hotness' of real hot
paths in order to reduce testing overhead, while PGO users will do the
opposite.
- Coverage testing and PGO are two different things. Using PGO
infrastructure for coverage is actually an implementation detail. This is
why it is better to let -fcoverage-mapping to turn on FE instrumentation
automatically without needing the user to know about this
dependency/detail. There is also an ASAN based coverage support. It would
be nice to unify various ways of doing coverage testing with unified use
model, reporting tools and interfaces etc.
- Longer term, we will add more advanced features in IR based PGO, so
unless we duplicate all the work also in FE based instrumentation, the
longer term picture is that IR based instrumentation will become the
default choice for PGO users, so it seems natural to simplify their use
models by making IR based instrumentation the default. On the other hand,
the proposed flip won't create additional complexity to coverage testing
users.

I would actually make the IRPGO mode completely incompatible with the
-fcoverage-mapping flag.

yes, agree.

thanks,

David

_sean_silva · June 1, 2016, 8:46pm

>
> Jake and I have been integrating IRPGO on PS4, and we've identified 3
remaining work items.
>
> Sean, thanks for the write up. It matches very well with what we think
as well.

+ 1

> - Driver changes
>
> We'd like to make IRPGO the default on PS4. We also think that it would
be beneficial to make IRPGO the default PGO on all platforms (coverage
would continue to use FE instr as it does currently, of course). In
previous conversations (e.g. ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation) it has come
up that Apple have requirements that would prevent them from moving to
IRPGO as the default PGO, at least without a deprecation period of one or
two releases.

Sean pointed out the problematic scenario in D15829 (in plan "C"):
All existing user workflows continue to work, except for workflows that
attempt to llvm-profdata merge some old frontend profile data (e.g. they
have checked-in to version control and represents some special workload)
with the profile data from new binaries.
We can address this issue by (1) making sure llvm-profdata emits a
helpful warning when merging an FE-based profile with an IR-based one, and
(2) keeping an option to use FE instrumentation for PGO. Having (2) helps
people who can't (or don't want) to switch to IR PGO.

> I'd like to get consensus on a path forward.
> As a point of discussion, how about we make IRPGO the default on all
platforms except Apple platforms.

I'd really rather not introduce this inconsistency. I'm worried that it
might lead to Darwin becoming a second-tier platform for PGO.

Fred (CC'd) is following up with some of our internal users to check if
we can change the default behavior of -fprofile-instr-generate. He should
be able to chime in on this soon.
Sorry it took me so long.

Hi Fred,

My understanding is that you were specifically investigating whether Apple
needed compatibility for merging indexed profiles. Is that compatibility
needed? The only compelling argument I have heard to continue to expose
FEPGO is that Apple may have a compatibility requirement for merging
indexed profiles from previous compiler versions.

Even if this is a requirement, then I still intend to make IRPGO the
default and only PGO going forward, at least on PS4. I think that doing the
same for all platforms in the upstream compiler probably makes sense as
well, since an internal Apple vendor compatibility requirement should not
penalize all users of the open source project.

I’ve discussed the change in behavior quiet extensively, and I after
having changed my mind a couple times, I would argue in favor of keeping
the current behavior for the existing flags. I think adding a new switch
for IRPGO is a better option. The argument that weighted most on my opinion
is the proposed interaction with -fcoverage-mapping, and it is not at all
platform specific. With the proposed new behavior, turning coverage on and
off in your build system will generate a binary with different performance
characteristics and this feels really wrong.

Bob already mentioned in the other thread that `-fprofile-instr-generate
-fcoverage-mapping` was sufficiently different from
`-fprofile-instr-generate` that `-fprofile-instr-generate
-fcoverage-mapping` was not an acceptable workaround that could be used for
enabling FEPGO during a transitionary period, so I'm not convinced that
your argument here makes sense.

I also share David's opinion that this is not going to be an issue in
practice. I think it makes sense for PGO and coverage to have different
overheads. Coverage inherently has to trace all locations at source level,
while PGO has more freedom.

Also, David's point about redundant work on FEPGO is a good one. We don't
want to continue maintaining two different PGO's.

I would actually make the IRPGO mode completely incompatible with the
-fcoverage-mapping flag.

I'm not sure what you mean by this. Nobody is proposing anything that would
make -fcoverage-mapping do anything related to IRPGO.

-- Sean Silva

Frederic_Riss · June 2, 2016, 12:46am

Jake and I have been integrating IRPGO on PS4, and we’ve identified 3 remaining work items.

Sean, thanks for the write up. It matches very well with what we think as well.

1

Driver changes

We’d like to make IRPGO the default on PS4. We also think that it would be beneficial to make IRPGO the default PGO on all platforms (coverage would continue to use FE instr as it does currently, of course). In previous conversations (e.g. http://reviews.llvm.org/D15829) it has come up that Apple have requirements that would prevent them from moving to IRPGO as the default PGO, at least without a deprecation period of one or two releases.

Sean pointed out the problematic scenario in D15829 (in plan “C”):
All existing user workflows continue to work, except for workflows that attempt to llvm-profdata merge some old frontend profile data (e.g. they have checked-in to version control and represents some special workload) with the profile data from new binaries.
We can address this issue by (1) making sure llvm-profdata emits a helpful warning when merging an FE-based profile with an IR-based one, and (2) keeping an option to use FE instrumentation for PGO. Having (2) helps people who can’t (or don’t want) to switch to IR PGO.

I’d like to get consensus on a path forward.
As a point of discussion, how about we make IRPGO the default on all platforms except Apple platforms.

I’d really rather not introduce this inconsistency. I’m worried that it might lead to Darwin becoming a second-tier platform for PGO.

Fred (CC’d) is following up with some of our internal users to check if we can change the default behavior of -fprofile-instr-generate. He should be able to chime in on this soon.
Sorry it took me so long.
Hi Fred,

My understanding is that you were specifically investigating whether Apple needed compatibility for merging indexed profiles. Is that compatibility needed? The only compelling argument I have heard to continue to expose FEPGO is that Apple may have a compatibility requirement for merging indexed profiles from previous compiler versions.

Sorry no, my comment had nothing to do with merging profiles. I understand that this will break, and it might very well be an issue for us, but I think there is a more fundamental issue with the proposed plan. As you bring it up though, this is a user visible breakage that shouldn’t be disregarded completely.

Even if this is a requirement, then I still intend to make IRPGO the default and only PGO going forward, at least on PS4. I think that doing the same for all platforms in the upstream compiler probably makes sense as well, since an internal Apple vendor compatibility requirement should not penalize all users of the open source project.

Again, I’m not expressing an Apple requirement, just trying to discuss the specifics of the proposed implementation. My goal is not to hinder anything, and I want our platforms to be able to use IRPGO reliably if users see the need for it.

I’ve discussed the change in behavior quiet extensively, and I after having changed my mind a couple times, I would argue in favor of keeping the current behavior for the existing flags. I think adding a new switch for IRPGO is a better option. The argument that weighted most on my opinion is the proposed interaction with -fcoverage-mapping, and it is not at all platform specific. With the proposed new behavior, turning coverage on and off in your build system will generate a binary with different performance characteristics and this feels really wrong.

Bob already mentioned in the other thread that -fprofile-instr-generate -fcoverage-mapping was sufficiently different from -fprofile-instr-generate that -fprofile-instr-generate -fcoverage-mapping was not an acceptable workaround that could be used for enabling FEPGO during a transitionary period, so I’m not convinced that your argument here makes sense.

I’m not sure what you’re referring to here, and I have a hard time parsing the sentence. I suppose “was not an acceptable” should read “was an acceptable”? I would be surprised that Bob ever agreed to completely transition away from FEPGO. I didn’t even understand that getting rid of FEPGO was on table as you seem to imply bellow.

I also share David’s opinion that this is not going to be an issue in practice. I think it makes sense for PGO and coverage to have different overheads. Coverage inherently has to trace all locations at source level, while PGO has more freedom.

I’m sorry if I wasn’t clear, but I’m not talking about instrumentation overhead, I’m talking about the performance of the binary generated using the profiles. If we go the route of making the meaning of -fprofile-instr-generate depend on whether -fcoverage-mapping gets passed, then we change the kind of instrumentation and thus the input to the optimizations behind the user’s back. I wouldn’t be surprised that using profiles generated by FEPGO and IRPGO give you a final executable with measurably different performance characteristics. If you’re tracking your performance, this can be really painful. Recently we wasted days investigating performance regressions that were due to buggy profiles. I strongly believe having an option seemingly unrelated to PGO change this behavior is wrong and can cause actual pain for our end users.

Also, David’s point about redundant work on FEPGO is a good one. We don’t want to continue maintaining two different PGO’s.

Are you implying that LLVM should drop FEPGO? It’s a totally sensible thing to do to use your tests as training data for your profile generation. It’s also a very valid thing to do to use your tests to do coverage. Xcode does both of these things. I would see it a a big regression to not support doing both at the same time (this would mean doubling compile+testing time for users of both).
As the instrumentation needs to stay there for coverage anyway, I expect FEPGO to stay there and maintained (we care a lot about coverage). I’m not saying that all the work going into IRPGO should be duplicated in FEPGO, but what’s there and working should keep working.
Also, FEPGO has a lot of nice characteristics like resilience to IRGEN changes. If you have archived profiles, then when you switch compilers your performance shouldn’t degrade with FEPGO (modulo optimization bugs), while it’s much more likely to degrade with IRPGO. It overall looks like a much better option for people who do not need the lower instrumentation overhead.

I would actually make the IRPGO mode completely incompatible with the -fcoverage-mapping flag.

I’m not sure what you mean by this. Nobody is proposing anything that would make -fcoverage-mapping do anything related to IRPGO.

What I mean is that -f should error out when passed at the same time as -fcoverage-mapping.

Thanks!
Fred

_sean_silva · June 2, 2016, 7:10am

>
> Jake and I have been integrating IRPGO on PS4, and we've identified 3
remaining work items.
>
> Sean, thanks for the write up. It matches very well with what we think
as well.

+ 1

> - Driver changes
>
> We'd like to make IRPGO the default on PS4. We also think that it
would be beneficial to make IRPGO the default PGO on all platforms
(coverage would continue to use FE instr as it does currently, of course).
In previous conversations (e.g. ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation) it has
come up that Apple have requirements that would prevent them from moving to
IRPGO as the default PGO, at least without a deprecation period of one or
two releases.

Sean pointed out the problematic scenario in D15829 (in plan "C"):
All existing user workflows continue to work, except for workflows that
attempt to llvm-profdata merge some old frontend profile data (e.g. they
have checked-in to version control and represents some special workload)
with the profile data from new binaries.
We can address this issue by (1) making sure llvm-profdata emits a
helpful warning when merging an FE-based profile with an IR-based one, and
(2) keeping an option to use FE instrumentation for PGO. Having (2) helps
people who can't (or don't want) to switch to IR PGO.

> I'd like to get consensus on a path forward.
> As a point of discussion, how about we make IRPGO the default on all
platforms except Apple platforms.

I'd really rather not introduce this inconsistency. I'm worried that it
might lead to Darwin becoming a second-tier platform for PGO.

Fred (CC'd) is following up with some of our internal users to check if
we can change the default behavior of -fprofile-instr-generate. He should
be able to chime in on this soon.
Sorry it took me so long.
Hi Fred,

My understanding is that you were specifically investigating whether Apple
needed compatibility for merging indexed profiles. Is that compatibility
needed? The only compelling argument I have heard to continue to expose
FEPGO is that Apple may have a compatibility requirement for merging
indexed profiles from previous compiler versions.

Sorry no, my comment had nothing to do with merging profiles. I understand
that this will break, and it might very well be an issue for us, but I
think there is a more fundamental issue with the proposed plan. As you
bring it up though, this is a user visible breakage that shouldn’t be
disregarded completely.

Merging with existing indexed profiles is the only user-visible breakage
AFAIK (this was discussed at length in ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation and
the corresponding email thread). Please provide concrete examples where
things would break.

Even if this is a requirement, then I still intend to make IRPGO the
default and only PGO going forward, at least on PS4. I think that doing the
same for all platforms in the upstream compiler probably makes sense as
well, since an internal Apple vendor compatibility requirement should not
penalize all users of the open source project.

Again, I’m not expressing an Apple requirement, just trying to discuss the
specifics of the proposed implementation. My goal is not to hinder
anything, and I want our platforms to be able to use IRPGO reliably if
users see the need for it.

What I'm saying is that besides reduced training overhead (and the
inability to merge with older indexed profiles, which AFAIK is the only
actual potential requirement that would need a deprecation period for
FEPGO), IRPGO is basically "just a better PGO", so adding a frontend one
(except as something purely during a deprecation period) is
pointless. "just a better PGO" is what IRPGO is for my users. I don't want
to have to have them deal with (and I don't want to support) FEPGO.

Anything that will cause the existing flag to continue to produce FEPGO on
PS4 is not something that I'm really okay with. The reduced overhead of
IRPGO is really important on PS4 (i.e. the difference between the
instrumented game being playable or not). I really don't want to have to
test the triple to determine the meaning of `-fprofile-instr-generate`
(without `-fcoverage-mapping`).

I’ve discussed the change in behavior quiet extensively, and I after

having changed my mind a couple times, I would argue in favor of keeping
the current behavior for the existing flags. I think adding a new switch
for IRPGO is a better option. The argument that weighted most on my opinion
is the proposed interaction with -fcoverage-mapping, and it is not at all
platform specific. With the proposed new behavior, turning coverage on and
off in your build system will generate a binary with different performance
characteristics and this feels really wrong.

Bob already mentioned in the other thread that `-fprofile-instr-generate
-fcoverage-mapping` was sufficiently different from
`-fprofile-instr-generate` that `-fprofile-instr-generate
-fcoverage-mapping` was not an acceptable workaround that could be used for
enabling FEPGO during a transitionary period, so I'm not convinced that
your argument here makes sense.

I’m not sure what you’re referring to here, and I have a hard time parsing
the sentence. I suppose “was not an acceptable” should read “was an
acceptable”? I would be surprised that Bob ever agreed to completely
transition away from FEPGO. I didn’t even understand that getting rid of
FEPGO was on table as you seem to imply bellow.

No, it is written as intended. The backstory is in
⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation (and the corresponding email thread). The
paragraph starting with "The coverage mapping adds considerable cost.".

I also share David's opinion that this is not going to be an issue in
practice. I think it makes sense for PGO and coverage to have different
overheads. Coverage inherently has to trace all locations at source level,
while PGO has more freedom.

I’m sorry if I wasn’t clear, but I’m not talking about instrumentation
overhead, I’m talking about the performance of the binary generated using
the profiles. If we go the route of making the meaning of
-fprofile-instr-generate depend on whether -fcoverage-mapping gets passed,
then we change the kind of instrumentation and thus the input to the
optimizations behind the user’s back. I wouldn’t be surprised that using
profiles generated by FEPGO and IRPGO give you a final executable with
measurably different performance characteristics.

I think the point is that given the effort being put into IRPGO, the IRPGO
version will always be a faster final executable. Why provide a "worse" PGO
option?

If you’re tracking your performance, this can be really painful. Recently
we wasted days investigating performance regressions that were due to buggy
profiles. I strongly believe having an option seemingly unrelated to PGO
change this behavior is wrong and can cause actual pain for our end users.

After a deprecation period we can force `-fprofile-instr-generate` and
`-fcoverage-mapping` to be mutually exclusive if necessary. Does this solve
your problem?

Actually, I think it makes a lot of sense in some respects for
`-fprofile-instr-generate` and `-fprofile-instr-generate
-fcoverage-mapping` to be IRPGO and FEPGO/coverage. The difference from a
user's perspective is basically "is the instrumentation inserted by the
compiler constrained to have source-level coverage, or does the compiler
not have this restriction". Although as I've said, I'm not a fan of
supporting FEPGO in the long-term due to maintenance issues.

Also note that things like the context-sensitivity obtained through
pre-inlining (see Rong's original RFC) is simply not obtainable within a
source-level instrumentation paradigm (even if we did something like the
counter fusion discussed in "[llvm-dev] RFC: Pass to prune redundant
profiling instrumentation" to reduce the overhead to that of IRPGO with
pre-inlining). Thus FEPGO a.k.a. "coverage-level PGO" would nonetheless be
at an inherent disadvantage.

Also, David's point about redundant work on FEPGO is a good one. We don't
want to continue maintaining two different PGO’s.

Are you implying that LLVM should drop FEPGO? It’s a totally sensible
thing to do to use your tests as training data for your profile generation.
It’s also a very valid thing to do to use your tests to do coverage. Xcode
does both of these things. I would see it a a big regression to not support
doing both at the same time (this would mean doubling compile+testing time
for users of both).

As David pointed out, training runs for PGO and coverage have different
goals. I'm very skeptical of any testing that tries to do both at the same
time, but this will continue to work (albeit without benefitting from any
of the effort being put into IRPGO).

As the instrumentation needs to stay there for coverage anyway, I expect
FEPGO to stay there and maintained (we care a lot about coverage). I’m not
saying that all the work going into IRPGO should be duplicated in FEPGO,
but what’s there and working should keep working.

For my users the reduced overhead of IRPGO is an important feature, and
making it the default is important for that reason. Since most of the
effort going into PGO is focused on IRPGO, this will lead to users using
FEPGO ending up as a second-tier PGO, which Vedant said he specifically
wanted to avoid. The only option to avoid this is for users to not be using
FEPGO.

Also, FEPGO has a lot of nice characteristics like resilience to IRGEN
changes. If you have archived profiles, then when you switch compilers your
performance shouldn’t degrade with FEPGO (modulo optimization bugs), while
it’s much more likely to degrade with IRPGO.

Note that this use case continues to work. I.e. we continue to apply
existing frontend profiles correctly. including frontend profiles generated
with -fcoverage-mapping, so that collecting coverage and PGO at the same
time (although not advisable) still works. The only use case that breaks is
merging existing indexed profiles, which is why we are specifically waiting
for an answer on whether this is a requirement for you guys at Apple, which
will determine what kind of deprecation period etc. will be needed before
we can default it.

It overall looks like a much better option for people who do not need the
lower instrumentation overhead.

This is not just about lower instrumentation overhead. Things like the
recently added static VP node allocation (which will e.g. make indirect
callsite promotion for LTO'd kernels work) are other things are being
missed out on.

I would actually make the IRPGO mode completely incompatible with the

-fcoverage-mapping flag.

I'm not sure what you mean by this. Nobody is proposing anything that
would make -fcoverage-mapping do anything related to IRPGO.

What I mean is that -f<whatever enables IRPGO> should error out when
passed at the same time as -fcoverage-mapping.

I think you're coming into this with the mindset that FEPGO will still be a
possibility (outside of a build that is used for coverage mapping). I'm not
convinced that we actually need to continue exposing that except as a weird
thing in conjunction with coverage (and possibly for a deprecation period
if users want to merge indexed profiles).

-- Sean Silva

Topic		Replies	Views
Status of IR vs. frontend PGO (fprofile-generate vs fprofile-instr-generate) Clang Frontend	13	891	June 15, 2021
Proposal: add instrumentation for PGO and code coverage Clang Frontend	11	629	September 10, 2013
Profile-Guided Optimization (PGO) related questions and suggestions LLVM Project pgo	24	1890	December 20, 2023
RFC: PGO Late instrumentation for LLVM LLVM Dev List Archives	6	164	August 11, 2015
RFC: PGO Late instrumentation for LLVM LLVM Dev List Archives	18	269	November 5, 2015

The state of IRPGO (3 remaining work items)

Related topics