Status of IR vs. frontend PGO (fprofile-generate vs fprofile-instr-generate)

rnk · May 19, 2021, 8:18pm

Hi folks,

I wanted to make some improvements to code coverage, which uses frontend profile instrumentation. Is anyone still using frontend PGO for optimization (not coverage), or has everyone moved to IR PGO for that?

Here are the existing modes as I understand them:

1. Frontend PGO: -fprofile-instr-generate / use. Code in clang/lib/CodeGen/CodeGenPGO.cpp inserts PGO counter update intrinsics. This happens before optimization. This is very source directed.

2. Coverage: -fprofile-instr-generate -fcoverage-mapping. This is basically frontend PGO, plus some extra coverage mapping data to map from counters back to precise source locations.

3. IR PGO: -fprofile-generate. The LLVM IR pass llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp is responsible for inserting calls to the counter update intrinsics at some specific point in the optimization pipeline. IIUC, this is done after the “pre-inliner” pass and other simplification passes so that there are fewer counter updates and a bit more precision.

4. Context sensitive PGO: -fcs-profile-generate. This is basically the same as IR PGO, except it happens after regular inlining, so you can use it as a “second round” of PGO: use IR PGO first, get profile guided inlinings, profile again, and use that to influence code layout or further inlining.
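
As a rough illustration of the four modes above, the following Python sketch only assembles the instrumented-build command lines and prints them; it does not invoke the compiler. The flag spellings come from the list above. The `-O2` level, the file names `app.c`, `app`, and `round1.profdata`, and the mode keys are placeholders chosen for the example.

```python
# Instrumentation flags per mode, following the list above.
# "round1.profdata" is a hypothetical first-round IR PGO profile: the
# context-sensitive mode is described as a second round on top of IR PGO,
# so its instrumented build is assumed here to consume that profile.
INSTRUMENT_FLAGS = {
    "frontend": ["-fprofile-instr-generate"],
    "coverage": ["-fprofile-instr-generate", "-fcoverage-mapping"],
    "ir": ["-fprofile-generate"],
    "cs-ir": ["-fprofile-use=round1.profdata", "-fcs-profile-generate"],
}


def instrument_command(mode, source="app.c", output="app"):
    """Return the clang argv for an instrumented build (nothing is executed)."""
    return ["clang", "-O2", *INSTRUMENT_FLAGS[mode], source, "-o", output]


for mode in INSTRUMENT_FLAGS:
    print(f"{mode:9}", " ".join(instrument_command(mode)))
```

Keeping the modes in one table makes it easy to see that coverage is frontend instrumentation plus one extra flag, while the two IR modes use different flags entirely.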

Alex_Lorenz · May 19, 2021, 8:31pm

Hi,

---

Is anyone using the first mode, frontend PGO, or has everyone migrated to IR PGO already? It seems to me that the main use case for frontend PGO is really just coverage, and we should consider deprecating frontend PGO. In Chrome, we accidentally started using frontend PGO instead of IR PGO, and found that IR PGO was better.

Yes, we use the frontend PGO in Apple clang in Xcode. I’m curious though, what kind of improvements did you see with IR PGO?

rnk · May 19, 2021, 8:55pm

Basically what happened is that Chrome started using frontend PGO, and we found that binary patch updates were too large (https://crbug.com/1082307). I came across the Android docs mentioning IR PGO (https://source.android.com/devices/tech/perf/pgo) and suggested we try that.

Switching to IR PGO fixed the binary patch size issues and got better scores on Speedometer (1.17x improvement over non-PGO with IR PGO vs. 1.11x improvement with frontend PGO), so we never looked back. We consulted the Google optimization team folks, and they recommended we use IR PGO for performance. That’s where they are focusing all of their efforts.

Essentially, I’d like to figure out which mode is likely to be the best supported going forward, and document that.

Xinliang_David_Li · May 19, 2021, 10:11pm

+Vedant Kumar

Any other reason you are using frontend PGO for performance? IR PGO has other advantages as well: 1) better performance in training run; 2) better value profiling support; and 3) smaller raw profile data size.

Deprecating frontend-PGO (and making it for coverage testing only) would be desirable as Reid said.

David

vedantk · May 19, 2021, 11:14pm

Michael (just cc’d) and Alex L. have more context about the AppleClang release process and would be better suited to answer.

Historically, we’ve expected there to be some amount of source drift between the latest PGO/training build available from CI and the release tag. FE PGO is supposed to degrade gracefully when source drift occurs, and I believe we rely on that feature.

vedant

Xinliang_David_Li · May 19, 2021, 11:24pm

Both IR and frontend PGO use content-based hashing for profile lookup. Is there anything in FE PGO that makes it more tolerant of source drift?

David

Alex_Lorenz · May 20, 2021, 2:48am

This is an important issue for us, but I don’t know if and how the frontend PGO is more tolerant to that. I think we need to do some more analysis on our end to see what specifically we need out of frontend PGO that isn’t well served by IR PGO.

As for the possible deprecation of frontend PGO, will that imply that the -fprofile-instr-generate / use options will get removed, or will they still be supported but leverage IR PGO instead?

Xinliang_David_Li · May 20, 2021, 3:21am

I assume it means making -fprofile-instr-generate an alias for -fprofile-generate, which does IR PGO instrumentation.

-fprofile-instr-use and -fprofile-use are pretty much the same as of today, since the compiler can tell whether the profile came from LLVM (IR) or FE instrumentation.
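
To make the generate/use pairing concrete, here is a minimal sketch of one PGO cycle as three command lines: instrumented build, profile merge, optimized build. It only builds and prints the argument lists. The file names (`app.c`, `app.instr`, `default.profraw`, `app.profdata`) are placeholders, and the actual raw profile name depends on the instrumentation mode and environment. The sketch uses -fprofile-use for both kinds of profile, relying on the statement above that the compiler detects the profile kind.

```python
import shlex


def pgo_cycle(generate_flag, source="app.c",
              raw="default.profraw", merged="app.profdata"):
    """Return the command lines of one PGO cycle (nothing is executed).

    The instrumented binary has to be run on a training workload between
    step 1 and step 2 so that a raw profile exists to merge.
    """
    return [
        # 1. Build with counters inserted.
        ["clang", "-O2", generate_flag, source, "-o", "app.instr"],
        # 2. Convert the raw profile into the indexed format the compiler reads.
        ["llvm-profdata", "merge", f"-output={merged}", raw],
        # 3. Rebuild using the merged profile.
        ["clang", "-O2", f"-fprofile-use={merged}", source, "-o", "app"],
    ]


for flag in ("-fprofile-instr-generate", "-fprofile-generate"):
    for step in pgo_cycle(flag):
        print(shlex.join(step))
    print()
```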

David

rnk · May 20, 2021, 4:47pm

That makes sense to me, but we need to untangle the fact that -fprofile-instr-generate -fcoverage-mapping is currently used for coverage, so a simple alias isn’t quite correct.

I’ve always wanted a single, high-level coverage flag, and I always thought it should be spelled --coverage or -fcoverage, but that seems like it’s already taken by gcov instrumentation. =/ I guess we need to bikeshed a new spelling.

Xinliang_David_Li · May 20, 2021, 4:54pm

Right. -fcoverage-mapping by itself does not do much, so it should probably imply frontend instrumentation.

For migration purposes, if -fcoverage-mapping is used together with -fprofile-instr-generate (which becomes IR PGO), the latter will be dropped (or a warning is given). The tricky part is if the user uses the option to specify the profile path, then we have a problem.
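
The migration rule proposed here can be sketched as a small flag-resolution function. This is purely a hypothetical model of the proposal in this thread, not a description of what clang does: the function name, the mode labels, and the warning texts are all made up for illustration.

```python
def resolve_instrumentation(flags):
    """Return (mode, warnings) under the hypothetical migration rule.

    Assumptions taken from the proposal above:
      * -fprofile-instr-generate becomes an alias for IR PGO;
      * -fcoverage-mapping implies frontend instrumentation;
      * when both are given, -fprofile-instr-generate is dropped with a warning;
      * a profile path on the dropped flag is the unresolved "tricky part".
    """
    warnings = []

    def named(name):
        # Match both "-fflag" and "-fflag=value".
        return [f for f in flags if f.split("=", 1)[0] == name]

    instr = named("-fprofile-instr-generate")
    ir = named("-fprofile-generate")

    if "-fcoverage-mapping" in flags:
        for flag in instr:
            if "=" in flag:
                warnings.append(f"{flag}: profile path given, handling is an open question")
            else:
                warnings.append(f"{flag}: dropped, implied by -fcoverage-mapping")
        return "frontend-coverage", warnings
    if instr or ir:
        return "ir-pgo", warnings
    return None, warnings


print(resolve_instrumentation(["-fprofile-instr-generate"]))
print(resolve_instrumentation(["-fprofile-instr-generate", "-fcoverage-mapping"]))
print(resolve_instrumentation(["-fprofile-instr-generate=out.profraw", "-fcoverage-mapping"]))
```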

David

Alex_Lorenz · June 14, 2021, 5:31pm

Bumping up this thread. Based on our initial investigation, I think we can switch to IR PGO instead of frontend PGO, so you’ll be able to proceed with deprecating frontend PGO. We would like to request some additional time to do a full investigation and prepare for the transition on our end, though. Ideally we would need about 3 to 6 months to ensure we are prepared. Would you be willing to revisit this once we’re ready?

Thanks,
Alex

rnk · June 14, 2021, 6:25pm

Sure, there’s no rush to deprecate frontend PGO. In the meantime, would it be OK to update the open source docs to recommend IR PGO over frontend PGO, without making any statement about deprecation? This is mainly to get any new PGO users onto what we think is currently the most well-lit path.

Xinliang_David_Li · June 14, 2021, 6:56pm

Sounds good to me.

David

Alex_Lorenz · June 15, 2021, 11:26pm

I think that’s fine, no objections from us.
