The state of IRPGO (3 remaining work items)

>
> Jake and I have been integrating IRPGO on PS4, and we've identified 3
remaining work items.
>
> Sean, thanks for the write up. It matches very well with what we think
as well.

+ 1

> - Driver changes
>
> We'd like to make IRPGO the default on PS4. We also think that it
would be beneficial to make IRPGO the default PGO on all platforms
(coverage would continue to use FE instr as it does currently, of course).
In previous conversations (e.g. ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation) it has
come up that Apple have requirements that would prevent them from moving to
IRPGO as the default PGO, at least without a deprecation period of one or
two releases.

Sean pointed out the problematic scenario in D15829 (in plan "C"):

All existing user workflows continue to work, except for workflows that
attempt to llvm-profdata merge some old frontend profile data (e.g. they
have checked-in to version control and represents some special workload)
with the profile data from new binaries.

We can address this issue by (1) making sure llvm-profdata emits a
helpful warning when merging an FE-based profile with an IR-based one, and
(2) keeping an option to use FE instrumentation for PGO. Having (2) helps
people who can't (or don't want) to switch to IR PGO.

> I'd like to get consensus on a path forward.
> As a point of discussion, how about we make IRPGO the default on all
platforms except Apple platforms.

I'd really rather not introduce this inconsistency. I'm worried that it
might lead to Darwin becoming a second-tier platform for PGO.

Fred (CC'd) is following up with some of our internal users to check if
we can change the default behavior of -fprofile-instr-generate. He should
be able to chime in on this soon.

Fred,

Sorry it took me so long. I’ve discussed the change in behavior quiet
extensively, and I after having changed my mind a couple times, I would
argue in favor of keeping the current behavior for the existing flags. I
think adding a new switch for IRPGO is a better option. The argument that
weighted most on my opinion is the proposed interaction with
-fcoverage-mapping, and it is not at all platform specific. With the
proposed new behavior, turning coverage on and off in your build system
will generate a binary with different performance characteristics and this
feels really wrong.

This is an interesting observation, but IMO this should not be a problem
in practice:

- Coverage testing and PGO users are not overlapping. They have completely
different objectives and expectations. For instance, coverage users care
about covering the cold/rarely executed paths and find ways to make them
appear in the path, and in the meantime reduce the 'hotness' of real hot
paths in order to reduce testing overhead, while PGO users will do the
opposite.
- Coverage testing and PGO are two different things. Using PGO
infrastructure for coverage is actually an implementation detail. This is
why it is better to let -fcoverage-mapping to turn on FE instrumentation
automatically without needing the user to know about this dependency/detail.

This is a very good point and I think is the right long-term direction. It
was a mistake to make this implementation detail user-visible.

-- Sean Silva

It sounds to me we are likely to converge on the following:

1) Making IR/llvm based PGO the default;
2) Enhance -fcoverage-mapping such that it automatically turns on FE based
instrumentation
3) if -fcoverage-mapping is used together with -fprofile-instr-generate,
-fcoverage-mapping serves as a switch to turn on FE based instrumetnation

All the above are transparent to users.

The following are for advanced usage:

4) have a new option to explicitly switch instrumentation flavor to be FE
based
5) have a new option to turn off part of pre-instrumentation
cleanup/simplification passes for users who want very stable profile for
stable library sources *

* 4 and 5 serves the same purpose so 5 may not be necessary.

One question for your David, related to merging indexed profiles. The
requirement that indexed profiles from different compiler versions can be
merged (with the intended result) implies that the CFG hash for a given
function must be the same (this is the change that actually breaks merging
IRPGO vs FEPGO). But the same issue in principle applies to IRPGO across
compiler versions. How likely do you think it will be that this hash may
change? I can think of two sources:
a) we decide to walk the CFG differently for computing the hash or decide
to use a different hash function etc.
b) pre-instrumentation passes change across compiler versions (e.g. inliner
makes different decisions in a newer compiler version)

`b)` definitely seems like it will be a problem. What do you think about a)?

-- Sean Silva

It sounds to me we are likely to converge on the following:

1) Making IR/llvm based PGO the default;
2) Enhance -fcoverage-mapping such that it automatically turns on FE
based instrumentation
3) if -fcoverage-mapping is used together with -fprofile-instr-generate,
-fcoverage-mapping serves as a switch to turn on FE based instrumetnation

All the above are transparent to users.

The following are for advanced usage:

4) have a new option to explicitly switch instrumentation flavor to be FE
based
5) have a new option to turn off part of pre-instrumentation
cleanup/simplification passes for users who want very stable profile for
stable library sources *

* 4 and 5 serves the same purpose so 5 may not be necessary.

One question for your David, related to merging indexed profiles. The
requirement that indexed profiles from different compiler versions can be
merged (with the intended result) implies that the CFG hash for a given
function must be the same (this is the change that actually breaks merging
IRPGO vs FEPGO). But the same issue in principle applies to IRPGO across
compiler versions. How likely do you think it will be that this hash may
change? I can think of two sources:
a) we decide to walk the CFG differently for computing the hash or decide
to use a different hash function etc.
b) pre-instrumentation passes change across compiler versions (e.g.
inliner makes different decisions in a newer compiler version)

`b)` definitely seems like it will be a problem. What do you think about
a)?

Strictly speaking, the requirement is just that index format change is
backward compatible -- this includes changes of PGO name encoding scheme.
For content function hashing, unless we systematically change the hash
algorithm which will require a version bump, local compiler changes (such
as bug fixes) that lead to hashing to be different for individual functions
should not be considered breaking compatibility. This can happen for both
IR and FE instrumentation.

The above should apply to b) as well. It is true that b) is more sensitive
to compiler version changes in a way that more individual functions'
profile may get invalidated due to version change, but I think this is by
design -- keeping profile fully compatible across compiler versions is not
the major intended use cases. However if cfg traverse order changes, I
think a version bump is needed. This however should be fairly rarely.

David

Jake and I have been integrating IRPGO on PS4, and we’ve identified 3 remaining work items.

Sean, thanks for the write up. It matches very well with what we think as well.

  • 1
  • Driver changes

We’d like to make IRPGO the default on PS4. We also think that it would be beneficial to make IRPGO the default PGO on all platforms (coverage would continue to use FE instr as it does currently, of course). In previous conversations (e.g. http://reviews.llvm.org/D15829) it has come up that Apple have requirements that would prevent them from moving to IRPGO as the default PGO, at least without a deprecation period of one or two releases.

Sean pointed out the problematic scenario in D15829 (in plan “C”):

All existing user workflows continue to work, except for workflows that attempt to llvm-profdata merge some old frontend profile data (e.g. they have checked-in to version control and represents some special workload) with the profile data from new binaries.

We can address this issue by (1) making sure llvm-profdata emits a helpful warning when merging an FE-based profile with an IR-based one, and (2) keeping an option to use FE instrumentation for PGO. Having (2) helps people who can’t (or don’t want) to switch to IR PGO.

I’d like to get consensus on a path forward.
As a point of discussion, how about we make IRPGO the default on all platforms except Apple platforms.

I’d really rather not introduce this inconsistency. I’m worried that it might lead to Darwin becoming a second-tier platform for PGO.

Fred (CC’d) is following up with some of our internal users to check if we can change the default behavior of -fprofile-instr-generate. He should be able to chime in on this soon.

Sorry it took me so long.

Hi Fred,

My understanding is that you were specifically investigating whether Apple needed compatibility for merging indexed profiles. Is that compatibility needed? The only compelling argument I have heard to continue to expose FEPGO is that Apple may have a compatibility requirement for merging indexed profiles from previous compiler versions.

Sorry no, my comment had nothing to do with merging profiles. I understand that this will break, and it might very well be an issue for us, but I think there is a more fundamental issue with the proposed plan. As you bring it up though, this is a user visible breakage that shouldn’t be disregarded completely.

Merging with existing indexed profiles is the only user-visible breakage AFAIK (this was discussed at length in http://reviews.llvm.org/D15829 and the corresponding email thread). Please provide concrete examples where things would break.

I feel like we are talking past each other. It is a fact that “merging existing indexed profiles” will break. You consider it to be a reasonable breakage, I’m just saying that we shouldn’t ignore this completely. It is a known breakage so it should be considered when making the decision about the new flags’ behavior.
We advise users to commit their profiles to VCS, so this could actually be a real issue for them (Xcode won’t try to merge old/new profiles by itself though, but users could do this). For those kind of changes where the alternative is not a full drop-in replacement, I think deprecating the option and replacing it with a new one is a better methodology.

Even if this is a requirement, then I still intend to make IRPGO the default and only PGO going forward, at least on PS4. I think that doing the same for all platforms in the upstream compiler probably makes sense as well, since an internal Apple vendor compatibility requirement should not penalize all users of the open source project.

Again, I’m not expressing an Apple requirement, just trying to discuss the specifics of the proposed implementation. My goal is not to hinder anything, and I want our platforms to be able to use IRPGO reliably if users see the need for it.

What I’m saying is that besides reduced training overhead (and the inability to merge with older indexed profiles, which AFAIK is the only actual potential requirement that would need a deprecation period for FEPGO), IRPGO is basically “just a better PGO”, so adding a frontend one (except as something purely during a deprecation period) is pointless. “just a better PGO” is what IRPGO is for my users. I don’t want to have to have them deal with (and I don’t want to support) FEPGO.

Well we want to support FEPGO because it fits Xcode’s workflow better (mainly it allows to do PGO + coverage at the same time).

Anything that will cause the existing flag to continue to produce FEPGO on PS4 is not something that I’m really okay with. The reduced overhead of IRPGO is really important on PS4 (i.e. the difference between the instrumented game being playable or not). I really don’t want to have to test the triple to determine the meaning of -fprofile-instr-generate (without -fcoverage-mapping).

And I agree that IRPGO is better for your users, but the best workflow for your users is not necessarily the best workflow for every clang user. I totally agree with you that testing the triple to decide the meaning of the option is wrong to do upstream. The best default is not a platform choice, it’s a workflow decision.

I’ve discussed the change in behavior quiet extensively, and I after having changed my mind a couple times, I would argue in favor of keeping the current behavior for the existing flags. I think adding a new switch for IRPGO is a better option. The argument that weighted most on my opinion is the proposed interaction with -fcoverage-mapping, and it is not at all platform specific. With the proposed new behavior, turning coverage on and off in your build system will generate a binary with different performance characteristics and this feels really wrong.

Bob already mentioned in the other thread that -fprofile-instr-generate -fcoverage-mapping was sufficiently different from -fprofile-instr-generate that -fprofile-instr-generate -fcoverage-mapping was not an acceptable workaround that could be used for enabling FEPGO during a transitionary period, so I’m not convinced that your argument here makes sense.

I’m not sure what you’re referring to here, and I have a hard time parsing the sentence. I suppose “was not an acceptable” should read “was an acceptable”? I would be surprised that Bob ever agreed to completely transition away from FEPGO. I didn’t even understand that getting rid of FEPGO was on table as you seem to imply bellow.

No, it is written as intended. The backstory is in http://reviews.llvm.org/D15829 (and the corresponding email thread). The paragraph starting with "The coverage mapping adds considerable cost.”.

Thanks for the pointer, it turns out this part of the thread never made it to my mailbox. What Bob is saying is that we need an option to turn on FEPGO without coverage, and this is basically exactly what I think we should do: add an explicit option to choose the instrumentation type, so that we can use FEPGO even without coverage in the new IRPGO world.

I also share David’s opinion that this is not going to be an issue in practice. I think it makes sense for PGO and coverage to have different overheads. Coverage inherently has to trace all locations at source level, while PGO has more freedom.

I’m sorry if I wasn’t clear, but I’m not talking about instrumentation overhead, I’m talking about the performance of the binary generated using the profiles. If we go the route of making the meaning of -fprofile-instr-generate depend on whether -fcoverage-mapping gets passed, then we change the kind of instrumentation and thus the input to the optimizations behind the user’s back. I wouldn’t be surprised that using profiles generated by FEPGO and IRPGO give you a final executable with measurably different performance characteristics.

I think the point is that given the effort being put into IRPGO, the IRPGO version will always be a faster final executable.

2 things:

  • Do you have strong evidence of this being true in all cases? I’m not looking forward to when I’ll transition Apple’s clang to IRPGO, because I know that we’ll have to spend time characterizing the performance of the compiler. It might be faster in all but a couple benchmarks, we still will need to investigate what happened there to see if we can fix it. I’m not saying it’s bad (I’m actually also looking forward to evaluate it, because it might give better results for us), but it’s definitely not a benign change that I would do without double checking the results. And as such, I also wouldn’t want the compiler to silently do this change behind my back.
  • Even if it is 100% true, then users will see a degradation in performance if they add -fcoverage-mapping to their CFLAGS. I think this is just bad user experience. The user should be able to decide if he wants to be able to compromise some performance to be able to do PGO+coverage at the same time.

Why provide a “worse” PGO option?

Worse from your point of view. The resilience to compiler changes that FEPGO currently offers is a real feature even if it doesn’t have a place in Sony’s workflow.

If you’re tracking your performance, this can be really painful. Recently we wasted days investigating performance regressions that were due to buggy profiles. I strongly believe having an option seemingly unrelated to PGO change this behavior is wrong and can cause actual pain for our end users.

After a deprecation period we can force -fprofile-instr-generate and -fcoverage-mapping to be mutually exclusive if necessary. Does this solve your problem?

We don’t want a “deprecation period”. Unless IRPGO evolves in something that can do precise coverage and PGO in the same run, FEPGO can’t be deprecated (as in slated for removal).

Actually, I think it makes a lot of sense in some respects for -fprofile-instr-generate and -fprofile-instr-generate -fcoverage-mapping to be IRPGO and FEPGO/coverage. The difference from a user’s perspective is basically “is the instrumentation inserted by the compiler constrained to have source-level coverage, or does the compiler not have this restriction”. Although as I’ve said, I’m not a fan of supporting FEPGO in the long-term due to maintenance issues.

Also note that things like the context-sensitivity obtained through pre-inlining (see Rong’s original RFC) is simply not obtainable within a source-level instrumentation paradigm (even if we did something like the counter fusion discussed in “[llvm-dev] RFC: Pass to prune redundant profiling instrumentation” to reduce the overhead to that of IRPGO with pre-inlining). Thus FEPGO a.k.a. “coverage-level PGO” would nonetheless be at an inherent disadvantage.

Also, David’s point about redundant work on FEPGO is a good one. We don’t want to continue maintaining two different PGO’s.

Are you implying that LLVM should drop FEPGO? It’s a totally sensible thing to do to use your tests as training data for your profile generation. It’s also a very valid thing to do to use your tests to do coverage. Xcode does both of these things. I would see it a a big regression to not support doing both at the same time (this would mean doubling compile+testing time for users of both).

As David pointed out, training runs for PGO and coverage have different goals. I’m very skeptical of any testing that tries to do both at the same time, but this will continue to work (albeit without benefitting from any of the effort being put into IRPGO).

For a high-end game, sure. Small apps are a different beast and it’s a reasonable thing to use your set of tests as training data. Again, all the workflows are not equivalent.

As the instrumentation needs to stay there for coverage anyway, I expect FEPGO to stay there and maintained (we care a lot about coverage). I’m not saying that all the work going into IRPGO should be duplicated in FEPGO, but what’s there and working should keep working.

For my users the reduced overhead of IRPGO is an important feature, and making it the default is important for that reason. Since most of the effort going into PGO is focused on IRPGO, this will lead to users using FEPGO ending up as a second-tier PGO, which Vedant said he specifically wanted to avoid. The only option to avoid this is for users to not be using FEPGO.

I’d love to have everybody be able to enjoy the lower overhead, but if the answer to this is that they need to compile+train twice to get coverage and PGO, then the trade-off is wrong. I suppose you can agree that every user doesn’t have the same requirements as game developers. At least users should have the choice.

Also, FEPGO has a lot of nice characteristics like resilience to IRGEN changes. If you have archived profiles, then when you switch compilers your performance shouldn’t degrade with FEPGO (modulo optimization bugs), while it’s much more likely to degrade with IRPGO.

Note that this use case continues to work. I.e. we continue to apply existing frontend profiles correctly. including frontend profiles generated with -fcoverage-mapping, so that collecting coverage and PGO at the same time (although not advisable) still works. The only use case that breaks is merging existing indexed profiles, which is why we are specifically waiting for an answer on whether this is a requirement for you guys at Apple, which will determine what kind of deprecation period etc. will be needed before we can default it.

It overall looks like a much better option for people who do not need the lower instrumentation overhead.

This is not just about lower instrumentation overhead. Things like the recently added static VP node allocation (which will e.g. make indirect callsite promotion for LTO’d kernels work) are other things are being missed out on.

I would actually make the IRPGO mode completely incompatible with the -fcoverage-mapping flag.

I’m not sure what you mean by this. Nobody is proposing anything that would make -fcoverage-mapping do anything related to IRPGO.

What I mean is that -f should error out when passed at the same time as -fcoverage-mapping.

I think you’re coming into this with the mindset that FEPGO will still be a possibility (outside of a build that is used for coverage mapping). I’m not convinced that we actually need to continue exposing that except as a weird thing in conjunction with coverage (and possibly for a deprecation period if users want to merge indexed profiles).

FEPGO needs to be a possibility. And the instrumentation code needs to be there anyway for the coverage. Here’s a proposal:

  • Add -fpgo-instr=[llvm|clang] to choose which kind of PGO instrumentation you want
  • make -fpgo-instr=llvm incompatible with with -fcoverage-mapping
  • deprecate -fprofile-instr-generate, and remove it after a couple releases

(I really don’t care about the names)

I think it is a better way forward to have -fprofile-instr-generate keep it’s current meaning and to have downstream toolchain providers have an internal patch to make it an alias to the one they prefer. Just because changing the meaning of options is a bad thing. This also means that if the consensus is that -fprofile-instr-generate should really change its meaning to mean IRPGO, I’m open to having this internal patch on our side.

I also think a new option is cleaner (again I don’t care about the name), but if people feel that -fprofile-instr-generate=[llvm|clang] is easier, then this works for me too.

Fred

>
> Jake and I have been integrating IRPGO on PS4, and we've identified 3
remaining work items.
>
> Sean, thanks for the write up. It matches very well with what we
think as well.

+ 1

> - Driver changes
>
> We'd like to make IRPGO the default on PS4. We also think that it
would be beneficial to make IRPGO the default PGO on all platforms
(coverage would continue to use FE instr as it does currently, of course).
In previous conversations (e.g. ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation) it has
come up that Apple have requirements that would prevent them from moving to
IRPGO as the default PGO, at least without a deprecation period of one or
two releases.

Sean pointed out the problematic scenario in D15829 (in plan "C"):

All existing user workflows continue to work, except for workflows that
attempt to llvm-profdata merge some old frontend profile data (e.g. they
have checked-in to version control and represents some special workload)
with the profile data from new binaries.

We can address this issue by (1) making sure llvm-profdata emits a
helpful warning when merging an FE-based profile with an IR-based one, and
(2) keeping an option to use FE instrumentation for PGO. Having (2) helps
people who can't (or don't want) to switch to IR PGO.

> I'd like to get consensus on a path forward.
> As a point of discussion, how about we make IRPGO the default on all
platforms except Apple platforms.

I'd really rather not introduce this inconsistency. I'm worried that it
might lead to Darwin becoming a second-tier platform for PGO.

Fred (CC'd) is following up with some of our internal users to check if
we can change the default behavior of -fprofile-instr-generate. He should
be able to chime in on this soon.

Sorry it took me so long.

Hi Fred,

My understanding is that you were specifically investigating whether
Apple needed compatibility for merging indexed profiles. Is that
compatibility needed? The only compelling argument I have heard to continue
to expose FEPGO is that Apple may have a compatibility requirement for
merging indexed profiles from previous compiler versions.

Sorry no, my comment had nothing to do with merging profiles. I
understand that this will break, and it might very well be an issue for us,
but I think there is a more fundamental issue with the proposed plan. As
you bring it up though, this is a user visible breakage that shouldn’t be
disregarded completely.

Merging with existing indexed profiles is the only user-visible breakage
AFAIK (this was discussed at length in ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation and
the corresponding email thread). Please provide concrete examples where
things would break.

I feel like we are talking past each other. It is a fact that “merging
existing indexed profiles” will break. You consider it to be a reasonable
breakage, I’m just saying that we shouldn't ignore this completely. It is a
known breakage so it should be considered when making the decision about
the new flags’ behavior.
We advise users to commit their profiles to VCS, so this could actually be
a real issue for them (Xcode won’t try to merge old/new profiles by itself
though, but users could do this). For those kind of changes where the
alternative is not a full drop-in replacement, I think deprecating the
option and replacing it with a new one is a better methodology.

Even if this is a requirement, then I still intend to make IRPGO the
default and only PGO going forward, at least on PS4. I think that doing the
same for all platforms in the upstream compiler probably makes sense as
well, since an internal Apple vendor compatibility requirement should not
penalize all users of the open source project.

Again, I’m not expressing an Apple requirement, just trying to discuss
the specifics of the proposed implementation. My goal is not to hinder
anything, and I want our platforms to be able to use IRPGO reliably if
users see the need for it.

What I'm saying is that besides reduced training overhead (and the
inability to merge with older indexed profiles, which AFAIK is the only
actual potential requirement that would need a deprecation period for
FEPGO), IRPGO is basically "just a better PGO", so adding a frontend one
(except as something purely during a deprecation period) is
pointless. "just a better PGO" is what IRPGO is for my users. I don't want
to have to have them deal with (and I don't want to support) FEPGO.

Well we want to support FEPGO because it fits Xcode's workflow better
(mainly it allows to do PGO + coverage at the same time).

Anything that will cause the existing flag to continue to produce FEPGO on
PS4 is not something that I'm really okay with. The reduced overhead of
IRPGO is really important on PS4 (i.e. the difference between the
instrumented game being playable or not). I really don't want to have to
test the triple to determine the meaning of `-fprofile-instr-generate`
(without `-fcoverage-mapping`).

And I agree that IRPGO is better for your users, but the best workflow for
your users is not necessarily the best workflow for every clang user. I
totally agree with you that testing the triple to decide the meaning of the
option is wrong to do upstream. The best default is not a platform choice,
it’s a workflow decision.

I’ve discussed the change in behavior quiet extensively, and I after

having changed my mind a couple times, I would argue in favor of keeping
the current behavior for the existing flags. I think adding a new switch
for IRPGO is a better option. The argument that weighted most on my opinion
is the proposed interaction with -fcoverage-mapping, and it is not at all
platform specific. With the proposed new behavior, turning coverage on and
off in your build system will generate a binary with different performance
characteristics and this feels really wrong.

Bob already mentioned in the other thread that `-fprofile-instr-generate
-fcoverage-mapping` was sufficiently different from
`-fprofile-instr-generate` that `-fprofile-instr-generate
-fcoverage-mapping` was not an acceptable workaround that could be used for
enabling FEPGO during a transitionary period, so I'm not convinced that
your argument here makes sense.

I’m not sure what you’re referring to here, and I have a hard time
parsing the sentence. I suppose “was not an acceptable” should read “was an
acceptable”? I would be surprised that Bob ever agreed to completely
transition away from FEPGO. I didn’t even understand that getting rid of
FEPGO was on table as you seem to imply bellow.

No, it is written as intended. The backstory is in
⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation (and the corresponding email thread). The
paragraph starting with "The coverage mapping adds considerable cost.”.

Thanks for the pointer, it turns out this part of the thread never made it
to my mailbox. What Bob is saying is that we need an option to turn on
FEPGO without coverage, and this is basically exactly what I think we
should do: add an explicit option to choose the instrumentation type, so
that we can use FEPGO even without coverage in the new IRPGO world.

I also share David's opinion that this is not going to be an issue in
practice. I think it makes sense for PGO and coverage to have different
overheads. Coverage inherently has to trace all locations at source level,
while PGO has more freedom.

I’m sorry if I wasn’t clear, but I’m not talking about instrumentation
overhead, I’m talking about the performance of the binary generated using
the profiles. If we go the route of making the meaning of
-fprofile-instr-generate depend on whether -fcoverage-mapping gets passed,
then we change the kind of instrumentation and thus the input to the
optimizations behind the user’s back. I wouldn’t be surprised that using
profiles generated by FEPGO and IRPGO give you a final executable with
measurably different performance characteristics.

I think the point is that given the effort being put into IRPGO, the IRPGO
version will always be a faster final executable.

2 things:
- Do you have strong evidence of this being true in all cases? I’m not
looking forward to when I’ll transition Apple’s clang to IRPGO, because I
know that we’ll have to spend time characterizing the performance of the
compiler. It might be faster in all but a couple benchmarks, we still will
need to investigate what happened there to see if we can fix it. I’m not
saying it’s bad (I’m actually also looking forward to evaluate it, because
it might give better results for us), but it’s definitely not a benign
change that I would do without double checking the results. And as such, I
also wouldn’t want the compiler to silently do this change behind my back.
- Even if it is 100% true, then users will see a degradation in
performance if they add -fcoverage-mapping to their CFLAGS. I think this is
just bad user experience. The user should be able to decide if he wants to
be able to compromise some performance to be able to do PGO+coverage at the
same time.

Why provide a "worse" PGO option?

Worse from your point of view. The resilience to compiler changes that
FEPGO currently offers is a real feature even if it doesn’t have a place in
Sony’s workflow.

If you’re tracking your performance, this can be really painful. Recently
we wasted days investigating performance regressions that were due to buggy
profiles. I strongly believe having an option seemingly unrelated to PGO
change this behavior is wrong and can cause actual pain for our end users.

After a deprecation period we can force `-fprofile-instr-generate` and
`-fcoverage-mapping` to be mutually exclusive if necessary. Does this solve
your problem?

We don’t want a “deprecation period”. Unless IRPGO evolves in something
that can do precise coverage and PGO in the same run, FEPGO can’t be
deprecated (as in slated for removal).

Actually, I think it makes a lot of sense in some respects for
`-fprofile-instr-generate` and `-fprofile-instr-generate
-fcoverage-mapping` to be IRPGO and FEPGO/coverage. The difference from a
user's perspective is basically "is the instrumentation inserted by the
compiler constrained to have source-level coverage, or does the compiler
not have this restriction". Although as I've said, I'm not a fan of
supporting FEPGO in the long-term due to maintenance issues.

Also note that things like the context-sensitivity obtained through
pre-inlining (see Rong's original RFC) is simply not obtainable within a
source-level instrumentation paradigm (even if we did something like the
counter fusion discussed in "[llvm-dev] RFC: Pass to prune redundant
profiling instrumentation" to reduce the overhead to that of IRPGO with
pre-inlining). Thus FEPGO a.k.a. "coverage-level PGO" would nonetheless be
at an inherent disadvantage.

Also, David's point about redundant work on FEPGO is a good one. We don't
want to continue maintaining two different PGO’s.

Are you implying that LLVM should drop FEPGO? It’s a totally sensible
thing to do to use your tests as training data for your profile generation.
It’s also a very valid thing to do to use your tests to do coverage. Xcode
does both of these things. I would see it a a big regression to not support
doing both at the same time (this would mean doubling compile+testing time
for users of both).

As David pointed out, training runs for PGO and coverage have different
goals. I'm very skeptical of any testing that tries to do both at the same
time, but this will continue to work (albeit without benefitting from any
of the effort being put into IRPGO).

For a high-end game, sure. Small apps are a different beast and it’s a
reasonable thing to use your set of tests as training data. Again, all the
workflows are not equivalent.

As the instrumentation needs to stay there for coverage anyway, I expect

FEPGO to stay there and maintained (we care a lot about coverage). I’m not
saying that all the work going into IRPGO should be duplicated in FEPGO,
but what’s there and working should keep working.

For my users the reduced overhead of IRPGO is an important feature, and
making it the default is important for that reason. Since most of the
effort going into PGO is focused on IRPGO, this will lead to users using
FEPGO ending up as a second-tier PGO, which Vedant said he specifically
wanted to avoid. The only option to avoid this is for users to not be using
FEPGO.

I’d love to have everybody be able to enjoy the lower overhead, but if the
answer to this is that they need to compile+train twice to get coverage and
PGO, then the trade-off is wrong. I suppose you can agree that every user
doesn’t have the same requirements as game developers. At least users
should have the choice.

Also, FEPGO has a lot of nice characteristics like resilience to IRGEN

changes. If you have archived profiles, then when you switch compilers your
performance shouldn’t degrade with FEPGO (modulo optimization bugs), while
it’s much more likely to degrade with IRPGO.

Note that this use case continues to work. I.e. we continue to apply
existing frontend profiles correctly. including frontend profiles generated
with -fcoverage-mapping, so that collecting coverage and PGO at the same
time (although not advisable) still works. The only use case that breaks is
merging existing indexed profiles, which is why we are specifically waiting
for an answer on whether this is a requirement for you guys at Apple, which
will determine what kind of deprecation period etc. will be needed before
we can default it.

It overall looks like a much better option for people who do not need the
lower instrumentation overhead.

This is not just about lower instrumentation overhead. Things like the
recently added static VP node allocation (which will e.g. make indirect
callsite promotion for LTO'd kernels work) are other things are being
missed out on.

I would actually make the IRPGO mode completely incompatible with the

-fcoverage-mapping flag.

I'm not sure what you mean by this. Nobody is proposing anything that
would make -fcoverage-mapping do anything related to IRPGO.

What I mean is that -f<whatever enables IRPGO> should error out when
passed at the same time as -fcoverage-mapping.

I think you're coming into this with the mindset that FEPGO will still be
a possibility (outside of a build that is used for coverage mapping). I'm
not convinced that we actually need to continue exposing that except as a
weird thing in conjunction with coverage (and possibly for a deprecation
period if users want to merge indexed profiles).

FEPGO needs to be a possibility. And the instrumentation code needs to be
there anyway for the coverage. Here’s a proposal:
- Add -fpgo-instr=[llvm|clang] to choose which kind of PGO
instrumentation you want
- make -fpgo-instr=llvm incompatible with with -fcoverage-mapping
- deprecate -fprofile-instr-generate, and remove it after a couple
releases

(I really don’t care about the names)

I think it is a better way forward to have -fprofile-instr-generate keep
it’s current meaning and to have downstream toolchain providers have an
internal patch to make it an alias to the one they prefer. Just because
changing the meaning of options is a bad thing.

Sure, I can deal with that. And actually this discussion has suggested some
good user-understandable names for the two different PGO's. I think that
"stable PGO" or "coverage-based PGO" or "source-level PGO" is a really good
name for FEPGO. "frontend" and "IR" aren't really meaningful to users.

Based on what I have seen, FEPGO has the following reasons to keep it
around:
- it currently can be used together with coverage
- it has certain compatibility guarantees about profiles across compiler
versions

This also means that if the consensus is that -fprofile-instr-generate
should really change its meaning to mean IRPGO, I’m open to having this
internal patch on our side.

Yeah, it sounds like someone is going to have to keep a "private patch" to
change the default. At that point doing a switch on the triple in upstream
seems preferable though :confused:

So I propose the following (which is equivalent to what you proposed, but
starting to put specific option names):
1. Add -fprofile-instr-generate=stable and -fprofile-instr-generate=unstable
a) Indexed profiles for -fprofile-instr-generate=stable are guaranteed to
be compatible across compiler releases (which will include existing
releases). Additionally, -fprofile-instr-generate=stable can be used in
combination with -fcoverage-mapping.
b) Indexed profiles for -fprofile-instr-generate=unstable are not
guaranteed to be compatible across compiler releases. It is an error to use
this in conjunction with -fcoverage-mapping
c) -fprofile-instr-generate defaults to one of the two in a way that meets
vendor needs, via private patches or an upstream switch on the triple or
whatever. It can eventually be deprecated or whatever.

-fprofile-instr-generate=stable would basically correspond to FEPGO / FE
coverage instrumentation. -fprofile-instr-generate=unstable would basically
correspond to IRPGO.

What do you think?

-- Sean Silva

I really don’t like the idea of having differentiation in the default based on triple upstream.
I agree with Fred that changing the default is not good for users. If a vendor really wants to change the default just because they think it is better for their limited set of users, this can be a private patch in a proprietary SDK (if we were on the other side in this case, I suspect we would instruct Xcode to change the flag it passes to the compiler and not change the default of the compiler).

This also means that if the consensus is that -fprofile-instr-generate should really change its meaning to mean IRPGO, I’m open to having this internal patch on our side.

Yeah, it sounds like someone is going to have to keep a “private patch” to change the default. At that point doing a switch on the triple in upstream seems preferable though :confused:

I don’t see why a private patch would be needed.

So I propose the following (which is equivalent to what you proposed, but starting to put specific option names):

  1. Add -fprofile-instr-generate=stable and -fprofile-instr-generate=unstable

That sounds like the right approach. I don’t have a strong opinion about the =stable/unstable names.

a) Indexed profiles for -fprofile-instr-generate=stable are guaranteed to be compatible across compiler releases (which will include existing releases). Additionally, -fprofile-instr-generate=stable can be used in combination with -fcoverage-mapping.
b) Indexed profiles for -fprofile-instr-generate=unstable are not guaranteed to be compatible across compiler releases. It is an error to use this in conjunction with -fcoverage-mapping
c) -fprofile-instr-generate defaults to one of the two in a way that meets vendor needs, via private patches or an upstream switch on the triple or whatever. It can eventually be deprecated or whatever.

Typically we handle things like this by staging the change over a release or two. If we add those two new options (=stable, =unstable), we can have one release where the default matches the old behavior, and then change the default for the following release. Someone relying on the current behavior would then be able to change their builds to use the =stable option so that they won’t be broken when the default switches.

This also means that if the consensus is that -fprofile-instr-generate
should really change its meaning to mean IRPGO, I’m open to having this
internal patch on our side.

Yeah, it sounds like someone is going to have to keep a "private patch" to
change the default. At that point doing a switch on the triple in upstream
seems preferable though :confused:

I don’t see why a private patch would be needed.

On PS4 IRPGO is the default (and actually only; release-to-release
stability of profiles is not an issue, the benefit of IRPGO is too great,
and the support cost of multiple PGO's is undesirable given the overhead
issues with FEPGO. Additionally, we don't plan to invest development effort
in FEPGO as a peak performance configuration).

So I propose the following (which is equivalent to what you proposed, but
starting to put specific option names):
1. Add -fprofile-instr-generate=stable and
-fprofile-instr-generate=unstable

That sounds like the right approach.

Great. I'll post a patch in the next couple days.

I don’t have a strong opinion about the =stable/unstable names.

a) Indexed profiles for -fprofile-instr-generate=stable are guaranteed to
be compatible across compiler releases (which will include existing
releases). Additionally, -fprofile-instr-generate=stable can be used in
combination with -fcoverage-mapping.
b) Indexed profiles for -fprofile-instr-generate=unstable are not
guaranteed to be compatible across compiler releases. It is an error to use
this in conjunction with -fcoverage-mapping
c) -fprofile-instr-generate defaults to one of the two in a way that meets
vendor needs, via private patches or an upstream switch on the triple or
whatever. It can eventually be deprecated or whatever.

Typically we handle things like this by staging the change over a release
or two. If we add those two new options (=stable, =unstable), we can have
one release where the default matches the old behavior, and then change the
default for the following release. Someone relying on the current behavior
would then be able to change their builds to use the =stable option so that
they won’t be broken when the default switches.

Good to know what processes you have in place.

-- Sean Silva

>
> Jake and I have been integrating IRPGO on PS4, and we've identified
3 remaining work items.
>
> Sean, thanks for the write up. It matches very well with what we
think as well.

+ 1

> - Driver changes
>
> We'd like to make IRPGO the default on PS4. We also think that it
would be beneficial to make IRPGO the default PGO on all platforms
(coverage would continue to use FE instr as it does currently, of course).
In previous conversations (e.g. ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation) it
has come up that Apple have requirements that would prevent them from
moving to IRPGO as the default PGO, at least without a deprecation period
of one or two releases.

Sean pointed out the problematic scenario in D15829 (in plan "C"):

All existing user workflows continue to work, except for workflows
that attempt to llvm-profdata merge some old frontend profile data (e.g.
they have checked-in to version control and represents some special
workload) with the profile data from new binaries.

We can address this issue by (1) making sure llvm-profdata emits a
helpful warning when merging an FE-based profile with an IR-based one, and
(2) keeping an option to use FE instrumentation for PGO. Having (2) helps
people who can't (or don't want) to switch to IR PGO.

> I'd like to get consensus on a path forward.
> As a point of discussion, how about we make IRPGO the default on all
platforms except Apple platforms.

I'd really rather not introduce this inconsistency. I'm worried that
it might lead to Darwin becoming a second-tier platform for PGO.

Fred (CC'd) is following up with some of our internal users to check
if we can change the default behavior of -fprofile-instr-generate. He
should be able to chime in on this soon.

Sorry it took me so long.

Hi Fred,

My understanding is that you were specifically investigating whether
Apple needed compatibility for merging indexed profiles. Is that
compatibility needed? The only compelling argument I have heard to continue
to expose FEPGO is that Apple may have a compatibility requirement for
merging indexed profiles from previous compiler versions.

Sorry no, my comment had nothing to do with merging profiles. I
understand that this will break, and it might very well be an issue for us,
but I think there is a more fundamental issue with the proposed plan. As
you bring it up though, this is a user visible breakage that shouldn’t be
disregarded completely.

Merging with existing indexed profiles is the only user-visible breakage
AFAIK (this was discussed at length in ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation and
the corresponding email thread). Please provide concrete examples where
things would break.

I feel like we are talking past each other. It is a fact that “merging
existing indexed profiles” will break. You consider it to be a reasonable
breakage, I’m just saying that we shouldn't ignore this completely. It is a
known breakage so it should be considered when making the decision about
the new flags’ behavior.
We advise users to commit their profiles to VCS, so this could actually
be a real issue for them (Xcode won’t try to merge old/new profiles by
itself though, but users could do this). For those kind of changes where
the alternative is not a full drop-in replacement, I think deprecating the
option and replacing it with a new one is a better methodology.

Even if this is a requirement, then I still intend to make IRPGO the
default and only PGO going forward, at least on PS4. I think that doing the
same for all platforms in the upstream compiler probably makes sense as
well, since an internal Apple vendor compatibility requirement should not
penalize all users of the open source project.

Again, I’m not expressing an Apple requirement, just trying to discuss
the specifics of the proposed implementation. My goal is not to hinder
anything, and I want our platforms to be able to use IRPGO reliably if
users see the need for it.

What I'm saying is that besides reduced training overhead (and the
inability to merge with older indexed profiles, which AFAIK is the only
actual potential requirement that would need a deprecation period for
FEPGO), IRPGO is basically "just a better PGO", so adding a frontend one
(except as something purely during a deprecation period) is
pointless. "just a better PGO" is what IRPGO is for my users. I don't want
to have to have them deal with (and I don't want to support) FEPGO.

Well we want to support FEPGO because it fits Xcode's workflow better
(mainly it allows to do PGO + coverage at the same time).

Anything that will cause the existing flag to continue to produce FEPGO
on PS4 is not something that I'm really okay with. The reduced overhead of
IRPGO is really important on PS4 (i.e. the difference between the
instrumented game being playable or not). I really don't want to have to
test the triple to determine the meaning of `-fprofile-instr-generate`
(without `-fcoverage-mapping`).

And I agree that IRPGO is better for your users, but the best workflow
for your users is not necessarily the best workflow for every clang user. I
totally agree with you that testing the triple to decide the meaning of the
option is wrong to do upstream. The best default is not a platform choice,
it’s a workflow decision.

I’ve discussed the change in behavior quiet extensively, and I after

having changed my mind a couple times, I would argue in favor of keeping
the current behavior for the existing flags. I think adding a new switch
for IRPGO is a better option. The argument that weighted most on my opinion
is the proposed interaction with -fcoverage-mapping, and it is not at all
platform specific. With the proposed new behavior, turning coverage on and
off in your build system will generate a binary with different performance
characteristics and this feels really wrong.

Bob already mentioned in the other thread that `-fprofile-instr-generate
-fcoverage-mapping` was sufficiently different from
`-fprofile-instr-generate` that `-fprofile-instr-generate
-fcoverage-mapping` was not an acceptable workaround that could be used for
enabling FEPGO during a transitionary period, so I'm not convinced that
your argument here makes sense.

I’m not sure what you’re referring to here, and I have a hard time
parsing the sentence. I suppose “was not an acceptable” should read “was an
acceptable”? I would be surprised that Bob ever agreed to completely
transition away from FEPGO. I didn’t even understand that getting rid of
FEPGO was on table as you seem to imply bellow.

No, it is written as intended. The backstory is in
⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation (and the corresponding email thread). The
paragraph starting with "The coverage mapping adds considerable cost.”.

Thanks for the pointer, it turns out this part of the thread never made
it to my mailbox. What Bob is saying is that we need an option to turn on
FEPGO without coverage, and this is basically exactly what I think we
should do: add an explicit option to choose the instrumentation type, so
that we can use FEPGO even without coverage in the new IRPGO world.

I also share David's opinion that this is not going to be an issue in
practice. I think it makes sense for PGO and coverage to have different
overheads. Coverage inherently has to trace all locations at source level,
while PGO has more freedom.

I’m sorry if I wasn’t clear, but I’m not talking about instrumentation
overhead, I’m talking about the performance of the binary generated using
the profiles. If we go the route of making the meaning of
-fprofile-instr-generate depend on whether -fcoverage-mapping gets passed,
then we change the kind of instrumentation and thus the input to the
optimizations behind the user’s back. I wouldn’t be surprised that using
profiles generated by FEPGO and IRPGO give you a final executable with
measurably different performance characteristics.

I think the point is that given the effort being put into IRPGO, the
IRPGO version will always be a faster final executable.

2 things:
- Do you have strong evidence of this being true in all cases? I’m not
looking forward to when I’ll transition Apple’s clang to IRPGO, because I
know that we’ll have to spend time characterizing the performance of the
compiler. It might be faster in all but a couple benchmarks, we still will
need to investigate what happened there to see if we can fix it. I’m not
saying it’s bad (I’m actually also looking forward to evaluate it, because
it might give better results for us), but it’s definitely not a benign
change that I would do without double checking the results. And as such, I
also wouldn’t want the compiler to silently do this change behind my back.
- Even if it is 100% true, then users will see a degradation in
performance if they add -fcoverage-mapping to their CFLAGS. I think this is
just bad user experience. The user should be able to decide if he wants to
be able to compromise some performance to be able to do PGO+coverage at the
same time.

Why provide a "worse" PGO option?

Worse from your point of view. The resilience to compiler changes that
FEPGO currently offers is a real feature even if it doesn’t have a place in
Sony’s workflow.

If you’re tracking your performance, this can be really painful.
Recently we wasted days investigating performance regressions that were due
to buggy profiles. I strongly believe having an option seemingly unrelated
to PGO change this behavior is wrong and can cause actual pain for our end
users.

After a deprecation period we can force `-fprofile-instr-generate` and
`-fcoverage-mapping` to be mutually exclusive if necessary. Does this solve
your problem?

We don’t want a “deprecation period”. Unless IRPGO evolves in something
that can do precise coverage and PGO in the same run, FEPGO can’t be
deprecated (as in slated for removal).

Actually, I think it makes a lot of sense in some respects for
`-fprofile-instr-generate` and `-fprofile-instr-generate
-fcoverage-mapping` to be IRPGO and FEPGO/coverage. The difference from a
user's perspective is basically "is the instrumentation inserted by the
compiler constrained to have source-level coverage, or does the compiler
not have this restriction". Although as I've said, I'm not a fan of
supporting FEPGO in the long-term due to maintenance issues.

Also note that things like the context-sensitivity obtained through
pre-inlining (see Rong's original RFC) is simply not obtainable within a
source-level instrumentation paradigm (even if we did something like the
counter fusion discussed in "[llvm-dev] RFC: Pass to prune redundant
profiling instrumentation" to reduce the overhead to that of IRPGO with
pre-inlining). Thus FEPGO a.k.a. "coverage-level PGO" would nonetheless be
at an inherent disadvantage.

Also, David's point about redundant work on FEPGO is a good one. We
don't want to continue maintaining two different PGO’s.

Are you implying that LLVM should drop FEPGO? It’s a totally sensible
thing to do to use your tests as training data for your profile generation.
It’s also a very valid thing to do to use your tests to do coverage. Xcode
does both of these things. I would see it a a big regression to not support
doing both at the same time (this would mean doubling compile+testing time
for users of both).

As David pointed out, training runs for PGO and coverage have different
goals. I'm very skeptical of any testing that tries to do both at the same
time, but this will continue to work (albeit without benefitting from any
of the effort being put into IRPGO).

For a high-end game, sure. Small apps are a different beast and it’s a
reasonable thing to use your set of tests as training data. Again, all the
workflows are not equivalent.

As the instrumentation needs to stay there for coverage anyway, I expect

FEPGO to stay there and maintained (we care a lot about coverage). I’m not
saying that all the work going into IRPGO should be duplicated in FEPGO,
but what’s there and working should keep working.

For my users the reduced overhead of IRPGO is an important feature, and
making it the default is important for that reason. Since most of the
effort going into PGO is focused on IRPGO, this will lead to users using
FEPGO ending up as a second-tier PGO, which Vedant said he specifically
wanted to avoid. The only option to avoid this is for users to not be using
FEPGO.

I’d love to have everybody be able to enjoy the lower overhead, but if
the answer to this is that they need to compile+train twice to get coverage
and PGO, then the trade-off is wrong. I suppose you can agree that every
user doesn’t have the same requirements as game developers. At least users
should have the choice.

Also, FEPGO has a lot of nice characteristics like resilience to IRGEN

changes. If you have archived profiles, then when you switch compilers your
performance shouldn’t degrade with FEPGO (modulo optimization bugs), while
it’s much more likely to degrade with IRPGO.

Note that this use case continues to work. I.e. we continue to apply
existing frontend profiles correctly. including frontend profiles generated
with -fcoverage-mapping, so that collecting coverage and PGO at the same
time (although not advisable) still works. The only use case that breaks is
merging existing indexed profiles, which is why we are specifically waiting
for an answer on whether this is a requirement for you guys at Apple, which
will determine what kind of deprecation period etc. will be needed before
we can default it.

It overall looks like a much better option for people who do not need
the lower instrumentation overhead.

This is not just about lower instrumentation overhead. Things like the
recently added static VP node allocation (which will e.g. make indirect
callsite promotion for LTO'd kernels work) are other things are being
missed out on.

I would actually make the IRPGO mode completely incompatible with the

-fcoverage-mapping flag.

I'm not sure what you mean by this. Nobody is proposing anything that
would make -fcoverage-mapping do anything related to IRPGO.

What I mean is that -f<whatever enables IRPGO> should error out when
passed at the same time as -fcoverage-mapping.

I think you're coming into this with the mindset that FEPGO will still be
a possibility (outside of a build that is used for coverage mapping). I'm
not convinced that we actually need to continue exposing that except as a
weird thing in conjunction with coverage (and possibly for a deprecation
period if users want to merge indexed profiles).

FEPGO needs to be a possibility. And the instrumentation code needs to be
there anyway for the coverage. Here’s a proposal:
- Add -fpgo-instr=[llvm|clang] to choose which kind of PGO
instrumentation you want
- make -fpgo-instr=llvm incompatible with with -fcoverage-mapping
- deprecate -fprofile-instr-generate, and remove it after a couple
releases

(I really don’t care about the names)

I think it is a better way forward to have -fprofile-instr-generate keep
it’s current meaning and to have downstream toolchain providers have an
internal patch to make it an alias to the one they prefer. Just because
changing the meaning of options is a bad thing.

Sure, I can deal with that. And actually this discussion has suggested
some good user-understandable names for the two different PGO's. I think
that "stable PGO" or "coverage-based PGO" or "source-level PGO" is a really
good name for FEPGO. "frontend" and "IR" aren't really meaningful to users.

Based on what I have seen, FEPGO has the following reasons to keep it
around:
- it currently can be used together with coverage
- it has certain compatibility guarantees about profiles across compiler
versions

This also means that if the consensus is that -fprofile-instr-generate
should really change its meaning to mean IRPGO, I’m open to having this
internal patch on our side.

Yeah, it sounds like someone is going to have to keep a "private patch" to
change the default. At that point doing a switch on the triple in upstream
seems preferable though :confused:

So I propose the following (which is equivalent to what you proposed, but
starting to put specific option names):
1. Add -fprofile-instr-generate=stable and
-fprofile-instr-generate=unstable
a) Indexed profiles for -fprofile-instr-generate=stable are guaranteed to
be compatible across compiler releases (which will include existing
releases). Additionally, -fprofile-instr-generate=stable can be used in
combination with -fcoverage-mapping.
b) Indexed profiles for -fprofile-instr-generate=unstable are not
guaranteed to be compatible across compiler releases. It is an error to use
this in conjunction with -fcoverage-mapping
c) -fprofile-instr-generate defaults to one of the two in a way that meets
vendor needs, via private patches or an upstream switch on the triple or
whatever. It can eventually be deprecated or whatever.

-fprofile-instr-generate=stable would basically correspond to FEPGO / FE
coverage instrumentation. -fprofile-instr-generate=unstable would basically
correspond to IRPGO.

What do you think?

Sounds fine to me, though I am not a fan of using unstable in the option. I
think a more meaningful way (that capture the essence of the difference) is
the following naming:
1) FEPGO: -fprofile-instr-generate=source or
-fprofile-instr-generate=region
2) IR: -fprofile-instr-generate=cfg or -fprofile-instr-generate=graph

Also since -fprofile-instr-generate= form is already used to specify raw
profile path, we may need a different driver option. Alternatives include
1) -fprofile-instrument=<...> -- this maps directly to the cc1 option we
have today
or
2) -fpgo-instr=<> -- suggested by Fred or
3) -fpgo-instr-method=<...>

Having the driver level option is the first good step forward. In the near
future, when performance of IRPGO is further tuned (e.g, better integration
with inliner, more runtime perf win with -flto=thin, -flto etc), and the
interests of the IRPGO is better aligned, we can revisit the default.

thanks,

David

Well, it’s up to you, but you could just as well tell everyone doing PS4 development to use the new -fprofile-instr-generate=unstable (or whatever it ends up being called). If “FEPGO” doesn’t work for you, then presumably no one is using the existing -fprofile-instr-generate option for PS4….

The “we” in my reply was referring to the LLVM project. It’s not about what processes either Apple or I might use. We, the LLVM community, have a responsibility to provide a good migration path for our users when we decide to change the behavior of an existing feature.

Sounds fine to me, though I am not a fan of using unstable in the option. I think a more meaningful way (that capture the essence of the difference) is the following naming:

  1. FEPGO: -fprofile-instr-generate=source or -fprofile-instr-generate=region
  2. IR: -fprofile-instr-generate=cfg or -fprofile-instr-generate=graph

FWIW, I prefer source/cfg to stable/unstable.

Fred

>
> Jake and I have been integrating IRPGO on PS4, and we've identified
3 remaining work items.
>
> Sean, thanks for the write up. It matches very well with what we
think as well.

+ 1

> - Driver changes
>
> We'd like to make IRPGO the default on PS4. We also think that it
would be beneficial to make IRPGO the default PGO on all platforms
(coverage would continue to use FE instr as it does currently, of course).
In previous conversations (e.g. ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation) it
has come up that Apple have requirements that would prevent them from
moving to IRPGO as the default PGO, at least without a deprecation period
of one or two releases.

Sean pointed out the problematic scenario in D15829 (in plan "C"):

All existing user workflows continue to work, except for workflows
that attempt to llvm-profdata merge some old frontend profile data (e.g.
they have checked-in to version control and represents some special
workload) with the profile data from new binaries.

We can address this issue by (1) making sure llvm-profdata emits a
helpful warning when merging an FE-based profile with an IR-based one, and
(2) keeping an option to use FE instrumentation for PGO. Having (2) helps
people who can't (or don't want) to switch to IR PGO.

> I'd like to get consensus on a path forward.
> As a point of discussion, how about we make IRPGO the default on
all platforms except Apple platforms.

I'd really rather not introduce this inconsistency. I'm worried that
it might lead to Darwin becoming a second-tier platform for PGO.

Fred (CC'd) is following up with some of our internal users to check
if we can change the default behavior of -fprofile-instr-generate. He
should be able to chime in on this soon.

Sorry it took me so long.

Hi Fred,

My understanding is that you were specifically investigating whether
Apple needed compatibility for merging indexed profiles. Is that
compatibility needed? The only compelling argument I have heard to continue
to expose FEPGO is that Apple may have a compatibility requirement for
merging indexed profiles from previous compiler versions.

Sorry no, my comment had nothing to do with merging profiles. I
understand that this will break, and it might very well be an issue for us,
but I think there is a more fundamental issue with the proposed plan. As
you bring it up though, this is a user visible breakage that shouldn’t be
disregarded completely.

Merging with existing indexed profiles is the only user-visible breakage
AFAIK (this was discussed at length in ⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation and
the corresponding email thread). Please provide concrete examples where
things would break.

I feel like we are talking past each other. It is a fact that “merging
existing indexed profiles” will break. You consider it to be a reasonable
breakage, I’m just saying that we shouldn't ignore this completely. It is a
known breakage so it should be considered when making the decision about
the new flags’ behavior.
We advise users to commit their profiles to VCS, so this could actually
be a real issue for them (Xcode won’t try to merge old/new profiles by
itself though, but users could do this). For those kind of changes where
the alternative is not a full drop-in replacement, I think deprecating the
option and replacing it with a new one is a better methodology.

Even if this is a requirement, then I still intend to make IRPGO the
default and only PGO going forward, at least on PS4. I think that doing the
same for all platforms in the upstream compiler probably makes sense as
well, since an internal Apple vendor compatibility requirement should not
penalize all users of the open source project.

Again, I’m not expressing an Apple requirement, just trying to discuss
the specifics of the proposed implementation. My goal is not to hinder
anything, and I want our platforms to be able to use IRPGO reliably if
users see the need for it.

What I'm saying is that besides reduced training overhead (and the
inability to merge with older indexed profiles, which AFAIK is the only
actual potential requirement that would need a deprecation period for
FEPGO), IRPGO is basically "just a better PGO", so adding a frontend one
(except as something purely during a deprecation period) is
pointless. "just a better PGO" is what IRPGO is for my users. I don't want
to have to have them deal with (and I don't want to support) FEPGO.

Well we want to support FEPGO because it fits Xcode's workflow better
(mainly it allows to do PGO + coverage at the same time).

Anything that will cause the existing flag to continue to produce FEPGO
on PS4 is not something that I'm really okay with. The reduced overhead of
IRPGO is really important on PS4 (i.e. the difference between the
instrumented game being playable or not). I really don't want to have to
test the triple to determine the meaning of `-fprofile-instr-generate`
(without `-fcoverage-mapping`).

And I agree that IRPGO is better for your users, but the best workflow
for your users is not necessarily the best workflow for every clang user. I
totally agree with you that testing the triple to decide the meaning of the
option is wrong to do upstream. The best default is not a platform choice,
it’s a workflow decision.

I’ve discussed the change in behavior quiet extensively, and I after

having changed my mind a couple times, I would argue in favor of keeping
the current behavior for the existing flags. I think adding a new switch
for IRPGO is a better option. The argument that weighted most on my opinion
is the proposed interaction with -fcoverage-mapping, and it is not at all
platform specific. With the proposed new behavior, turning coverage on and
off in your build system will generate a binary with different performance
characteristics and this feels really wrong.

Bob already mentioned in the other thread that
`-fprofile-instr-generate -fcoverage-mapping` was sufficiently different
from `-fprofile-instr-generate` that `-fprofile-instr-generate
-fcoverage-mapping` was not an acceptable workaround that could be used for
enabling FEPGO during a transitionary period, so I'm not convinced that
your argument here makes sense.

I’m not sure what you’re referring to here, and I have a hard time
parsing the sentence. I suppose “was not an acceptable” should read “was an
acceptable”? I would be surprised that Bob ever agreed to completely
transition away from FEPGO. I didn’t even understand that getting rid of
FEPGO was on table as you seem to imply bellow.

No, it is written as intended. The backstory is in
⚙ D15829 [PGO] Clang Option that enables IR level PGO instrumentation (and the corresponding email thread).
The paragraph starting with "The coverage mapping adds considerable cost.”.

Thanks for the pointer, it turns out this part of the thread never made
it to my mailbox. What Bob is saying is that we need an option to turn on
FEPGO without coverage, and this is basically exactly what I think we
should do: add an explicit option to choose the instrumentation type, so
that we can use FEPGO even without coverage in the new IRPGO world.

I also share David's opinion that this is not going to be an issue in
practice. I think it makes sense for PGO and coverage to have different
overheads. Coverage inherently has to trace all locations at source level,
while PGO has more freedom.

I’m sorry if I wasn’t clear, but I’m not talking about instrumentation
overhead, I’m talking about the performance of the binary generated using
the profiles. If we go the route of making the meaning of
-fprofile-instr-generate depend on whether -fcoverage-mapping gets passed,
then we change the kind of instrumentation and thus the input to the
optimizations behind the user’s back. I wouldn’t be surprised that using
profiles generated by FEPGO and IRPGO give you a final executable with
measurably different performance characteristics.

I think the point is that given the effort being put into IRPGO, the
IRPGO version will always be a faster final executable.

2 things:
- Do you have strong evidence of this being true in all cases? I’m not
looking forward to when I’ll transition Apple’s clang to IRPGO, because I
know that we’ll have to spend time characterizing the performance of the
compiler. It might be faster in all but a couple benchmarks, we still will
need to investigate what happened there to see if we can fix it. I’m not
saying it’s bad (I’m actually also looking forward to evaluate it, because
it might give better results for us), but it’s definitely not a benign
change that I would do without double checking the results. And as such, I
also wouldn’t want the compiler to silently do this change behind my back.
- Even if it is 100% true, then users will see a degradation in
performance if they add -fcoverage-mapping to their CFLAGS. I think this is
just bad user experience. The user should be able to decide if he wants to
be able to compromise some performance to be able to do PGO+coverage at the
same time.

Why provide a "worse" PGO option?

Worse from your point of view. The resilience to compiler changes that
FEPGO currently offers is a real feature even if it doesn’t have a place in
Sony’s workflow.

If you’re tracking your performance, this can be really painful.
Recently we wasted days investigating performance regressions that were due
to buggy profiles. I strongly believe having an option seemingly unrelated
to PGO change this behavior is wrong and can cause actual pain for our end
users.

After a deprecation period we can force `-fprofile-instr-generate` and
`-fcoverage-mapping` to be mutually exclusive if necessary. Does this solve
your problem?

We don’t want a “deprecation period”. Unless IRPGO evolves in something
that can do precise coverage and PGO in the same run, FEPGO can’t be
deprecated (as in slated for removal).

Actually, I think it makes a lot of sense in some respects for
`-fprofile-instr-generate` and `-fprofile-instr-generate
-fcoverage-mapping` to be IRPGO and FEPGO/coverage. The difference from a
user's perspective is basically "is the instrumentation inserted by the
compiler constrained to have source-level coverage, or does the compiler
not have this restriction". Although as I've said, I'm not a fan of
supporting FEPGO in the long-term due to maintenance issues.

Also note that things like the context-sensitivity obtained through
pre-inlining (see Rong's original RFC) is simply not obtainable within a
source-level instrumentation paradigm (even if we did something like the
counter fusion discussed in "[llvm-dev] RFC: Pass to prune redundant
profiling instrumentation" to reduce the overhead to that of IRPGO with
pre-inlining). Thus FEPGO a.k.a. "coverage-level PGO" would nonetheless be
at an inherent disadvantage.

Also, David's point about redundant work on FEPGO is a good one. We
don't want to continue maintaining two different PGO’s.

Are you implying that LLVM should drop FEPGO? It’s a totally sensible
thing to do to use your tests as training data for your profile generation.
It’s also a very valid thing to do to use your tests to do coverage. Xcode
does both of these things. I would see it a a big regression to not support
doing both at the same time (this would mean doubling compile+testing time
for users of both).

As David pointed out, training runs for PGO and coverage have different
goals. I'm very skeptical of any testing that tries to do both at the same
time, but this will continue to work (albeit without benefitting from any
of the effort being put into IRPGO).

For a high-end game, sure. Small apps are a different beast and it’s a
reasonable thing to use your set of tests as training data. Again, all the
workflows are not equivalent.

As the instrumentation needs to stay there for coverage anyway, I expect

FEPGO to stay there and maintained (we care a lot about coverage). I’m not
saying that all the work going into IRPGO should be duplicated in FEPGO,
but what’s there and working should keep working.

For my users the reduced overhead of IRPGO is an important feature, and
making it the default is important for that reason. Since most of the
effort going into PGO is focused on IRPGO, this will lead to users using
FEPGO ending up as a second-tier PGO, which Vedant said he specifically
wanted to avoid. The only option to avoid this is for users to not be using
FEPGO.

I’d love to have everybody be able to enjoy the lower overhead, but if
the answer to this is that they need to compile+train twice to get coverage
and PGO, then the trade-off is wrong. I suppose you can agree that every
user doesn’t have the same requirements as game developers. At least users
should have the choice.

Also, FEPGO has a lot of nice characteristics like resilience to IRGEN

changes. If you have archived profiles, then when you switch compilers your
performance shouldn’t degrade with FEPGO (modulo optimization bugs), while
it’s much more likely to degrade with IRPGO.

Note that this use case continues to work. I.e. we continue to apply
existing frontend profiles correctly. including frontend profiles generated
with -fcoverage-mapping, so that collecting coverage and PGO at the same
time (although not advisable) still works. The only use case that breaks is
merging existing indexed profiles, which is why we are specifically waiting
for an answer on whether this is a requirement for you guys at Apple, which
will determine what kind of deprecation period etc. will be needed before
we can default it.

It overall looks like a much better option for people who do not need
the lower instrumentation overhead.

This is not just about lower instrumentation overhead. Things like the
recently added static VP node allocation (which will e.g. make indirect
callsite promotion for LTO'd kernels work) are other things are being
missed out on.

I would actually make the IRPGO mode completely incompatible with the

-fcoverage-mapping flag.

I'm not sure what you mean by this. Nobody is proposing anything that
would make -fcoverage-mapping do anything related to IRPGO.

What I mean is that -f<whatever enables IRPGO> should error out when
passed at the same time as -fcoverage-mapping.

I think you're coming into this with the mindset that FEPGO will still
be a possibility (outside of a build that is used for coverage mapping).
I'm not convinced that we actually need to continue exposing that except as
a weird thing in conjunction with coverage (and possibly for a deprecation
period if users want to merge indexed profiles).

FEPGO needs to be a possibility. And the instrumentation code needs to
be there anyway for the coverage. Here’s a proposal:
- Add -fpgo-instr=[llvm|clang] to choose which kind of PGO
instrumentation you want
- make -fpgo-instr=llvm incompatible with with -fcoverage-mapping
- deprecate -fprofile-instr-generate, and remove it after a couple
releases

(I really don’t care about the names)

I think it is a better way forward to have -fprofile-instr-generate keep
it’s current meaning and to have downstream toolchain providers have an
internal patch to make it an alias to the one they prefer. Just because
changing the meaning of options is a bad thing.

Sure, I can deal with that. And actually this discussion has suggested
some good user-understandable names for the two different PGO's. I think
that "stable PGO" or "coverage-based PGO" or "source-level PGO" is a really
good name for FEPGO. "frontend" and "IR" aren't really meaningful to users.

Based on what I have seen, FEPGO has the following reasons to keep it
around:
- it currently can be used together with coverage
- it has certain compatibility guarantees about profiles across compiler
versions

This also means that if the consensus is that -fprofile-instr-generate
should really change its meaning to mean IRPGO, I’m open to having this
internal patch on our side.

Yeah, it sounds like someone is going to have to keep a "private patch"
to change the default. At that point doing a switch on the triple in
upstream seems preferable though :confused:

So I propose the following (which is equivalent to what you proposed, but
starting to put specific option names):
1. Add -fprofile-instr-generate=stable and
-fprofile-instr-generate=unstable
a) Indexed profiles for -fprofile-instr-generate=stable are guaranteed to
be compatible across compiler releases (which will include existing
releases). Additionally, -fprofile-instr-generate=stable can be used in
combination with -fcoverage-mapping.
b) Indexed profiles for -fprofile-instr-generate=unstable are not
guaranteed to be compatible across compiler releases. It is an error to use
this in conjunction with -fcoverage-mapping
c) -fprofile-instr-generate defaults to one of the two in a way that
meets vendor needs, via private patches or an upstream switch on the triple
or whatever. It can eventually be deprecated or whatever.

-fprofile-instr-generate=stable would basically correspond to FEPGO / FE
coverage instrumentation. -fprofile-instr-generate=unstable would basically
correspond to IRPGO.

What do you think?

Sounds fine to me, though I am not a fan of using unstable in the option.
I think a more meaningful way (that capture the essence of the difference)
is the following naming:
1) FEPGO: -fprofile-instr-generate=source or
-fprofile-instr-generate=region
2) IR: -fprofile-instr-generate=cfg or -fprofile-instr-generate=graph

Also since -fprofile-instr-generate= form is already used to specify raw
profile path, we may need a different driver option.

Oops!

Alternatives include
1) -fprofile-instrument=<...> -- this maps directly to the cc1 option we
have today
or
2) -fpgo-instr=<> -- suggested by Fred or
3) -fpgo-instr-method=<...>

I will probably use one of these in the patch.

-- Sean Silva

Quick update. I’ve gotten derailed from posting a patch for this due to focusing on higher priority PGO inlining work. No ETA.

– Sean Silva

No problem. Inliner work is certainly higher priority. Rong can help with the option related work.

thanks,
David

No problem. Inliner work is certainly higher priority. Rong can help with
the option related work.

FYI I've started working on a patch for the driver changes and would be
happy to take care of it. I'll post the patch in the next week or so.

Jake

Random bikeshedding. I like fprofile-instrument because it merges a lot of similar ideas behind instrumenting - and oddly enough what I was suggesting in the x-ray thread before seeing this.

-eric

+1 to -fprofile-instrument=… (as someone working on the XRay stuff, I’d much rather have less flags, and consolidate a lot of these similar things into a more inclusive flag).

I would even make it shorter, and say -finstrument={profile-…, xray-…} so we can have multiple “namespaced” values for -finstrument=.

Just my A$0.02.

There is some misunderstanding about the intention of this flag. The purpose of the flag is not to turn on profile instrumentation (which already has -fprofile-instr-generate or -fprofile-generate for it), but to select which instrumentors to use for PGO (IR or FE). I prefer fewer flags too, but sharing flags for completely different purpose does not seem like the right thing to do.

David

There is some misunderstanding about the intention of this flag. The purpose of the flag is not to turn on profile instrumentation (which already has -fprofile-instr-generate or -fprofile-generate for it), but to select which instrumentors to use for PGO (IR or FE). I prefer fewer flags too, but sharing flags for completely different purpose does not seem like the right thing to do.

Ah, right. It does seem like I’m misunderstanding the intention for that flag.

FWIW, I think consolidating the flags can happen later on, when we have a better idea of how the different instrumentation pieces fit together. Right now IIUC the only other thing that might count as an alternate instrumentation implementation is XRay (and I’m not even sure they’re mutually exclusive either, i.e. both the profiling and XRay instrumentation should be able to live together in the same binary).

So if it’s just the two that will be interacting, then sharing flags doesn’t seem worth it. But if I’m missing another, then the case for consolidating the flags becomes stronger.

Cheers

There is some misunderstanding about the intention of this flag. The
purpose of the flag is not to turn on profile instrumentation (which
already has -fprofile-instr-generate or -fprofile-generate for it), but to
select which instrumentors to use for PGO (IR or FE). I prefer fewer flags
too, but sharing flags for completely different purpose does not seem like
the right thing to do.

Ah, right. It does seem like I'm misunderstanding the intention for that
flag.

FWIW, I think consolidating the flags can happen later on, when we have a
better idea of how the different instrumentation pieces fit together.

I hope so :slight_smile: Note that PGO related instrumentation is not entirely the
same as general profiling instrumentations. The former should also have a
corresponding profile-use component in the compiler. Current PGO
instrumentation has edge/block profiling and a value profiler component.
The FE based edge/block profiler is also shared with coverage testing.

Right now IIUC the only other thing that might count as an alternate
instrumentation implementation is XRay (and I'm not even sure they're
mutually exclusive either, i.e. both the profiling and XRay instrumentation
should be able to live together in the same binary).

Sanitizers (asan, tsan, ubsan, msan) are all program instrumenters. Asan
also supports a mode for coverage testing. Profile related, we also have
esan (efficiency sanitizer) under development. The sanitizer options of
course are unified in its own category.

David