I wanted to make some improvements to code coverage, which uses frontend profile instrumentation. Is anyone still using frontend PGO for optimization (not coverage), or has everyone moved to IR PGO for that?
Here are the existing modes as I understand them:
1. Frontend PGO: -fprofile-instr-generate / -fprofile-instr-use. Code in clang/lib/CodeGen/CodeGenPGO.cpp inserts PGO counter update intrinsics. This happens before optimization, so the instrumentation is very source directed.
2. Coverage: -fprofile-instr-generate -fcoverage-mapping. This is basically frontend PGO, plus some extra coverage mapping data to map from counters back to precise source locations.
3. IR PGO: -fprofile-generate. The LLVM IR pass llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp is responsible for inserting calls to the counter update intrinsics at a specific point in the optimization pipeline. IIUC, this is done after the "pre-inliner" pass and other simplification passes so that there are fewer counter updates and a bit more precision.
4. Context sensitive PGO: -fcs-profile-generate. This is basically the same as IR PGO, except it happens after regular inlining, so you can use it as a "second round" of PGO: use IR PGO first, get profile guided inlinings, profile again, and use that to influence code layout or further inlining.
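To make the four modes easier to compare, here is a minimal Python sketch that maps each one to the instrumentation flags listed above and builds (but does not run) a clang command line. The mode keys, the helper name, and the file names are made up for illustration. Passing the first-round profile with -fprofile-use in the context sensitive case is my reading of the "second round" description, not something spelled out in the list.

```python
# Instrumentation flags per mode, taken from the list above.
# The dictionary keys are illustrative names, not clang terminology.
INSTRUMENTATION_FLAGS = {
    "frontend-pgo": ["-fprofile-instr-generate"],
    "coverage": ["-fprofile-instr-generate", "-fcoverage-mapping"],
    "ir-pgo": ["-fprofile-generate"],
    "cs-ir-pgo": ["-fcs-profile-generate"],
}


def instrumented_build_command(mode, source, output, first_round_profile=None):
    """Return the argv list for an instrumented build in the given mode."""
    cmd = ["clang", *INSTRUMENTATION_FLAGS[mode]]
    if mode == "cs-ir-pgo":
        # Assumption: the second round is built on top of a first-round
        # IR PGO profile, which is supplied with -fprofile-use=.
        if first_round_profile is None:
            raise ValueError("cs-ir-pgo needs the first-round profile")
        cmd.append(f"-fprofile-use={first_round_profile}")
    cmd += [source, "-o", output]
    return cmd


if __name__ == "__main__":
    for mode in INSTRUMENTATION_FLAGS:
        profile = "round1.profdata" if mode == "cs-ir-pgo" else None
        print(mode, "->", " ".join(
            instrumented_build_command(mode, "main.c", "main", profile)))
```

The commands are only assembled as lists so the differences between modes are visible side by side; running them is left to the reader's build system.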
---
Is anyone using the first mode, frontend PGO, or has everyone migrated to IR PGO already? It seems to me that the main use case for frontend PGO is really just coverage, and we should consider deprecating frontend PGO. In Chrome, we accidentally started using frontend PGO instead of IR PGO, and found that IR PGO was better.
Yes, we use the frontend PGO in Apple clang in Xcode. I’m curious though, what kind of improvements did you see with IR PGO?
Basically, Chrome started using frontend PGO, and we found that binary patch updates were too large (https://crbug.com/1082307). I came across these Android docs mentioning IR PGO (https://source.android.com/devices/tech/perf/pgo) and suggested we try that. Switching to IR PGO fixed the binary patch size issues and got better scores on Speedometer (1.17x improvement over non-PGO with IR PGO vs. 1.11x improvement with frontend PGO), so we never looked back. We consulted the Google optimization team folks, and they recommended we use IR PGO for performance. That’s where they are focusing all of their efforts.
Essentially, I’d like to figure out which mode is likely to be the best supported going forward, and document that.
Is there any other reason you are using frontend PGO for performance? IR PGO has other advantages as well: 1) better performance in the training run; 2) better value profiling support; and 3) smaller raw profile data size.
Deprecating frontend-PGO (and making it for coverage testing only) would be desirable as Reid said.
Michael (just cc’d) and Alex L. have more context about the AppleClang release process and would be better suited to answer.
Historically, we’ve expected there to be some amount of source drift between the latest PGO/training build available from CI and the release tag. FE PGO is supposed to degrade gracefully when source drift occurs, and I believe we rely on that feature.
Both IR and frontend PGO use content-based hashing for profile lookup. Is there anything in FE PGO that makes it more tolerant of source drift?
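To illustrate what hash-based profile lookup means under source drift, here is a toy Python model. It is not LLVM's data structures or hashing scheme: the "shape" strings, function names, and helpers are all invented. The only point is that a function whose content hash still matches keeps its counters, while a changed or new function falls back to having no profile.

```python
import hashlib


def content_hash(shape):
    """Toy stand-in for a per-function content hash (hypothetical)."""
    return hashlib.sha256(shape.encode()).hexdigest()[:16]


def record_profile(functions, counts):
    """Store (hash, counters) per function, as seen by the training build."""
    return {name: (content_hash(shape), counts[name])
            for name, shape in functions.items()}


def lookup_profile(profile, functions):
    """Split the release build's functions into usable and dropped profiles."""
    usable, dropped = {}, []
    for name, shape in functions.items():
        entry = profile.get(name)
        if entry is not None and entry[0] == content_hash(shape):
            usable[name] = entry[1]
        else:
            # Hash mismatch or unknown function: no profile data is applied.
            dropped.append(name)
    return usable, dropped


if __name__ == "__main__":
    training = {"f": "if;loop", "g": "loop"}
    profile = record_profile(training, {"f": [10, 3], "g": [7]})
    # Source drift: g gained a branch and h is new; f is unchanged.
    release = {"f": "if;loop", "g": "loop;if", "h": "if"}
    usable, dropped = lookup_profile(profile, release)
    print("usable:", usable)
    print("dropped:", dropped)
```

In this model the two schemes would differ only in what feeds the hash, so how tolerant each one is in practice depends on how often real edits change that input. The sketch does not settle that question.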
This is an important issue for us, but I don’t know whether or how frontend PGO is more tolerant of that. I think we need to do some more analysis on our end to see what specifically we need out of frontend PGO that isn’t well served by IR PGO.
As for the possible deprecation of frontend PGO, does that imply that the -fprofile-instr-generate / use options will be removed, or will they still be supported but leverage IR PGO instead?
I assume it means making -fprofile-instr-generate an alias for -fprofile-generate, which does IR PGO instrumentation.
-fprofile-instr-use and -fprofile-use are pretty much the same as of today, as the compiler can tell whether the profile came from IR or FE instrumentation.
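For readers new to the workflow, here is a rough Python sketch of one IR PGO cycle: instrument, train, merge, optimize. It only assembles the command lines. The -fprofile-generate=, -fprofile-use=, and llvm-profdata merge -output= spellings are the documented ones; the -O2 level, directory, and file names are assumptions chosen for the example.

```python
def ir_pgo_cycle(source, binary, profile_dir="pgo-data",
                 merged="merged.profdata"):
    """Return the four command lines of one IR PGO cycle (not executed)."""
    instrumented = f"{binary}.instrumented"
    return [
        # 1. Instrumented build; raw profiles are written under profile_dir.
        ["clang", "-O2", f"-fprofile-generate={profile_dir}",
         source, "-o", instrumented],
        # 2. Training run on a representative workload (assumed to need no args).
        [f"./{instrumented}"],
        # 3. Merge the raw profiles into one indexed profile.
        ["llvm-profdata", "merge", f"-output={merged}", profile_dir],
        # 4. Optimized build that consumes the merged profile.
        ["clang", "-O2", f"-fprofile-use={merged}", source, "-o", binary],
    ]


if __name__ == "__main__":
    for step in ir_pgo_cycle("main.c", "main"):
        print(" ".join(step))
```

Per the comment above, step 4 would accept either kind of profile, so the choice between frontend and IR PGO is made at step 1.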
That makes sense to me, but we need to untangle the fact that -fprofile-instr-generate -fcoverage-mapping is currently used for coverage, so a simple alias isn’t quite correct.
I’ve always wanted a single, high-level coverage flag, and I always thought it should be spelled --coverage or -fcoverage, but that seems like it’s already taken by gcov instrumentation. =/ I guess we need to bikeshed a new spelling.
Right. -fcoverage-mapping does not do much by itself, so it should probably imply frontend instrumentation.
For migration purposes, if -fcoverage-mapping is used together with -fprofile-instr-generate (which becomes IR PGO), the latter will be dropped (or a warning will be given). The tricky case is when the user also uses that option to specify the profile path; then we have a problem.
Bumping this thread. Based on our initial investigation, I think we can switch to IR PGO instead of frontend PGO, so you’ll be able to proceed with deprecating frontend PGO. We would like to request some additional time to do a full investigation and prepare for the transition on our end, though; ideally we would need about 3 to 6 months. Would you be willing to revisit this once we’re ready?
Sure, there’s no rush to deprecate frontend PGO. In the meantime, would it be OK to update the open source docs to recommend IR PGO over frontend PGO, without making any statement about deprecation? This is mainly to get any new PGO users onto what we think is currently the most well-lit path.