>
> Jake and I have been integrating IRPGO on PS4, and we've identified 3
remaining work items.
>
> Sean, thanks for the write up. It matches very well with what we think
as well.
+1
> - Driver changes
>
> We'd like to make IRPGO the default on PS4. We also think that it
would be beneficial to make IRPGO the default PGO on all platforms
(coverage would continue to use FE instr as it does currently, of course).
In previous conversations (e.g. http://reviews.llvm.org/D15829) it has
come up that Apple have requirements that would prevent them from moving to
IRPGO as the default PGO, at least without a deprecation period of one or
two releases.
Sean pointed out the problematic scenario in D15829 (in plan "C"):
All existing user workflows continue to work, except for workflows that
attempt to llvm-profdata merge some old frontend profile data (e.g. they
have checked-in to version control and represents some special workload)
with the profile data from new binaries.
We can address this issue by (1) making sure llvm-profdata emits a
helpful warning when merging an FE-based profile with an IR-based one, and
(2) keeping an option to use FE instrumentation for PGO. Having (2) helps
people who can't (or don't want to) switch to IR PGO.
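To make point (1) concrete, here is a minimal sketch of a merge step that
warns when profile kinds are mixed. It is not llvm-profdata's actual code:
the IndexedProfile class, its "kind" field, and the choice to skip the
mismatched input are all assumptions made for illustration.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class IndexedProfile:
    """Hypothetical stand-in for an indexed profile (not the real format)."""
    kind: str                                   # "FE" or "IR" (assumed tag)
    counts: dict = field(default_factory=dict)  # function name -> counter


def merge_profiles(profiles):
    """Sum counters per function; warn about and skip mismatched kinds."""
    if not profiles:
        raise ValueError("no profiles to merge")
    merged = IndexedProfile(kind=profiles[0].kind)
    for profile in profiles:
        if profile.kind != merged.kind:
            # The "helpful warning": say what was skipped and why.
            warnings.warn(
                f"skipping {profile.kind}-based profile: it cannot be merged "
                f"with {merged.kind}-based profiles; regenerate it with the "
                f"same kind of instrumentation"
            )
            continue
        for name, count in profile.counts.items():
            merged.counts[name] = merged.counts.get(name, 0) + count
    return merged
```

In this sketch the first input decides the kind of the merged output, so
an old checked-in FE profile passed after new IR profiles is reported and
left out rather than silently mixed in.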
> I'd like to get consensus on a path forward.
> As a point of discussion, how about we make IRPGO the default on all
platforms except Apple platforms.
I'd really rather not introduce this inconsistency. I'm worried that it
might lead to Darwin becoming a second-tier platform for PGO.
Fred (CC'd) is following up with some of our internal users to check if
we can change the default behavior of -fprofile-instr-generate. He should
be able to chime in on this soon.
Sorry it took me so long.
Hi Fred,
My understanding is that you were specifically investigating whether Apple
needed compatibility for merging indexed profiles. Is that compatibility
needed? The only compelling argument I have heard to continue to expose
FEPGO is that Apple may have a compatibility requirement for merging
indexed profiles from previous compiler versions.
Sorry no, my comment had nothing to do with merging profiles. I understand
that this will break, and it might very well be an issue for us, but I
think there is a more fundamental issue with the proposed plan. As you
bring it up though, this is a user visible breakage that shouldn’t be
disregarded completely.
Merging with existing indexed profiles is the only user-visible breakage
AFAIK (this was discussed at length in http://reviews.llvm.org/D15829 and
the corresponding email thread). Please provide concrete examples where
things would break.
Even if this is a requirement, then I still intend to make IRPGO the
default and only PGO going forward, at least on PS4. I think that doing the
same for all platforms in the upstream compiler probably makes sense as
well, since an internal Apple vendor compatibility requirement should not
penalize all users of the open source project.
Again, I’m not expressing an Apple requirement, just trying to discuss the
specifics of the proposed implementation. My goal is not to hinder
anything, and I want our platforms to be able to use IRPGO reliably if
users see the need for it.
What I'm saying is that besides reduced training overhead (and the
inability to merge with older indexed profiles, which AFAIK is the only
actual potential requirement that would need a deprecation period for
FEPGO), IRPGO is basically "just a better PGO", so adding a frontend one
(except as something purely during a deprecation period) is
pointless. "just a better PGO" is what IRPGO is for my users. I don't want
to have to have them deal with (and I don't want to support) FEPGO.
Anything that will cause the existing flag to continue to produce FEPGO on
PS4 is not something that I'm really okay with. The reduced overhead of
IRPGO is really important on PS4 (i.e. the difference between the
instrumented game being playable or not). I really don't want to have to
test the triple to determine the meaning of `-fprofile-instr-generate`
(without `-fcoverage-mapping`).
I’ve discussed the change in behavior quite extensively, and after
having changed my mind a couple of times, I would argue in favor of keeping
the current behavior for the existing flags. I think adding a new switch
for IRPGO is a better option. The argument that weighed most on my opinion
is the proposed interaction with -fcoverage-mapping, and it is not at all
platform specific. With the proposed new behavior, turning coverage on and
off in your build system will generate a binary with different performance
characteristics and this feels really wrong.
Bob already mentioned in the other thread that `-fprofile-instr-generate
-fcoverage-mapping` was sufficiently different from
`-fprofile-instr-generate` that `-fprofile-instr-generate
-fcoverage-mapping` was not an acceptable workaround that could be used for
enabling FEPGO during a transitionary period, so I'm not convinced that
your argument here makes sense.
I’m not sure what you’re referring to here, and I have a hard time parsing
the sentence. I suppose “was not an acceptable” should read “was an
acceptable”? I would be surprised that Bob ever agreed to completely
transition away from FEPGO. I didn’t even understand that getting rid of
FEPGO was on the table as you seem to imply below.
No, it is written as intended. The backstory is in
http://reviews.llvm.org/D15829 (and the corresponding email thread). The
paragraph starting with "The coverage mapping adds considerable cost.".
I also share David's opinion that this is not going to be an issue in
practice. I think it makes sense for PGO and coverage to have different
overheads. Coverage inherently has to trace all locations at source level,
while PGO has more freedom.
I’m sorry if I wasn’t clear, but I’m not talking about instrumentation
overhead, I’m talking about the performance of the binary generated using
the profiles. If we go the route of making the meaning of
-fprofile-instr-generate depend on whether -fcoverage-mapping gets passed,
then we change the kind of instrumentation and thus the input to the
optimizations behind the user’s back. I wouldn’t be surprised if using
profiles generated by FEPGO and IRPGO gave you final executables with
measurably different performance characteristics.
I think the point is that given the effort being put into IRPGO, the IRPGO
version will always be a faster final executable. Why provide a "worse" PGO
option?
If you’re tracking your performance, this can be really painful. Recently
we wasted days investigating performance regressions that were due to buggy
profiles. I strongly believe having an option seemingly unrelated to PGO
change this behavior is wrong and can cause actual pain for our end users.
After a deprecation period we can force `-fprofile-instr-generate` and
`-fcoverage-mapping` to be mutually exclusive if necessary. Does this solve
your problem?
Actually, I think it makes a lot of sense in some respects for
`-fprofile-instr-generate` and `-fprofile-instr-generate
-fcoverage-mapping` to be IRPGO and FEPGO/coverage. The difference from a
user's perspective is basically "is the instrumentation inserted by the
compiler constrained to have source-level coverage, or does the compiler
not have this restriction". Although as I've said, I'm not a fan of
supporting FEPGO in the long-term due to maintenance issues.
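The flag semantics under discussion can be sketched as a small selection
function. This is only an illustration of the proposal, not clang driver
code; instrumentation_kind and has_flag are hypothetical helper names, and
the "IR"/"FE" return values are labels chosen for this sketch.

```python
PROFILE_FLAG = "-fprofile-instr-generate"
COVERAGE_FLAG = "-fcoverage-mapping"


def has_flag(flags, name):
    """True if `name` appears bare or in its `name=value` form."""
    return any(f == name or f.startswith(name + "=") for f in flags)


def instrumentation_kind(flags):
    """Return "IR", "FE", or None under the proposed flag semantics.

    Proposed rule (an assumption of this sketch): coverage mapping needs
    source-level counters, so it forces FE instrumentation; otherwise the
    compiler is free to use IR-level instrumentation.
    """
    if not has_flag(flags, PROFILE_FLAG):
        return None
    if has_flag(flags, COVERAGE_FLAG):
        return "FE"
    return "IR"
```

Written this way, adding or removing -fcoverage-mapping changes the
result from "IR" to "FE", which is exactly the interaction being objected
to earlier in the thread.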
Also note that things like the context-sensitivity obtained through
pre-inlining (see Rong's original RFC) is simply not obtainable within a
source-level instrumentation paradigm (even if we did something like the
counter fusion discussed in "[llvm-dev] RFC: Pass to prune redundant
profiling instrumentation" to reduce the overhead to that of IRPGO with
pre-inlining). Thus FEPGO a.k.a. "coverage-level PGO" would nonetheless be
at an inherent disadvantage.
Also, David's point about redundant work on FEPGO is a good one. We don't
want to continue maintaining two different PGOs.
Are you implying that LLVM should drop FEPGO? It’s a totally sensible
thing to do to use your tests as training data for your profile generation.
It’s also a very valid thing to do to use your tests to do coverage. Xcode
does both of these things. I would see it as a big regression to not support
doing both at the same time (this would mean doubling compile+testing time
for users of both).
As David pointed out, training runs for PGO and coverage have different
goals. I'm very skeptical of any testing that tries to do both at the same
time, but this will continue to work (albeit without benefitting from any
of the effort being put into IRPGO).
As the instrumentation needs to stay there for coverage anyway, I expect
FEPGO to stay there and be maintained (we care a lot about coverage). I’m not
saying that all the work going into IRPGO should be duplicated in FEPGO,
but what’s there and working should keep working.
For my users the reduced overhead of IRPGO is an important feature, and
making it the default is important for that reason. Since most of the
effort going into PGO is focused on IRPGO, this will lead to users using
FEPGO ending up as a second-tier PGO, which Vedant said he specifically
wanted to avoid. The only option to avoid this is for users to not be using
FEPGO.
Also, FEPGO has a lot of nice characteristics like resilience to IRGEN
changes. If you have archived profiles, then when you switch compilers your
performance shouldn’t degrade with FEPGO (modulo optimization bugs), while
it’s much more likely to degrade with IRPGO.
Note that this use case continues to work. I.e. we continue to apply
existing frontend profiles correctly, including frontend profiles generated
with -fcoverage-mapping, so that collecting coverage and PGO at the same
time (although not advisable) still works. The only use case that breaks is
merging existing indexed profiles, which is why we are specifically waiting
for an answer on whether this is a requirement for you guys at Apple, which
will determine what kind of deprecation period etc. will be needed before
we can default it.
It overall looks like a much better option for people who do not need the
lower instrumentation overhead.
This is not just about lower instrumentation overhead. Things like the
recently added static VP node allocation (which will e.g. make indirect
callsite promotion for LTO'd kernels work) are other things that are being
missed out on.
I would actually make the IRPGO mode completely incompatible with the
-fcoverage-mapping flag.
I'm not sure what you mean by this. Nobody is proposing anything that
would make -fcoverage-mapping do anything related to IRPGO.
What I mean is that -f<whatever enables IRPGO> should error out when
passed at the same time as -fcoverage-mapping.
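As a rough sketch of that alternative, a driver-side check could reject
the combination outright. No flag name for IRPGO has been settled in this
thread, so the flag is passed in as a parameter; check_flags and
DriverError are hypothetical names, and the error wording is invented for
illustration.

```python
COVERAGE_FLAG = "-fcoverage-mapping"


class DriverError(Exception):
    """Hypothetical error type for rejected flag combinations."""


def check_flags(flags, irpgo_flag):
    """Raise DriverError if the IRPGO flag and coverage mapping are mixed.

    `irpgo_flag` is whatever flag ends up enabling IRPGO; it is left as a
    parameter because the thread has not chosen a name.
    """
    if irpgo_flag in flags and COVERAGE_FLAG in flags:
        raise DriverError(
            f"{irpgo_flag} cannot be combined with {COVERAGE_FLAG}"
        )
    return flags
```

With this rule a separate IRPGO switch never silently changes meaning when
coverage is toggled in the build system; the build fails instead.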
I think you're coming into this with the mindset that FEPGO will still be a
possibility (outside of a build that is used for coverage mapping). I'm not
convinced that we actually need to continue exposing that except as a weird
thing in conjunction with coverage (and possibly for a deprecation period
if users want to merge indexed profiles).
-- Sean Silva