I collected profiles using the instrumented method and applied PGO to the entire program. Now I want to apply PGO only to specific functions. Can you help me understand how to do function-level PGO?
Can you elaborate on what you’re trying to do? It sounds like you only want to selectively apply a PGO profile to certain functions, is that right?
I don’t think we have great facilities for fine-grained management like that, but the llvm-profdata tool has some filtering mechanisms that could work for you. llvm-profdata - Profile data tool — LLVM 20.0.0git documentation describes the --function= option, which takes a regex. I haven’t used it, but typically I find filtering like that difficult to use on big projects.
There are also some file-level filtering facilities in the llvm-profdata tool (or maybe in clang? I can’t recall).
Also, you may have better luck under a different category, like LLVM Project, or IR & Optimization.
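If the merge-time filtering works the way those docs describe, usage would look roughly like this (the function names and paths here are hypothetical, and again I haven’t tried it myself, so check llvm-profdata merge --help on your version first):

    # Keep only counters for functions matching the regex; everything else
    # is dropped from the merged profile.
    llvm-profdata merge --function='^(hot_func_a|hot_func_b)$' \
        default*.profraw -o hot-only.profdata

    # Rebuild with the filtered profile applied.
    clang -O3 -fprofile-use=hot-only.profdata main.c -o app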
Thanks for your response.
First, I applied PGO to the entire program and measured performance. Now I have to identify the functions that have the greatest impact on performance (the hot functions in the program). I want to apply PGO to those hot functions while keeping the other functions intact, and then measure performance again. So, how can I apply PGO to just those functions?
Typically we suggest you collect a PGO profile with a set of representative workloads for your application as a training corpus (i.e., for generating the profile), and use that to optimize the entire application. The optimizer usually does a good (or at least reasonable) job of deciding how to use the profile data to optimize hot code without overly pessimizing things that aren’t hot.
As an example, you can get a fairly significant speedup for clang by using just the basic Hello World corpus. There’s some information about that specifically here: Advanced Build Configurations — LLVM 20.0.0git documentation. It’s not as valuable as using a carefully selected corpus, but it’s still better than not using PGO at all.
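For reference, the standard whole-program instrumented flow is roughly the following (the source file, output names, and training input are placeholders):

    # Build with IR instrumentation; raw profiles will be written to ./profiles.
    clang -O2 -fprofile-generate=./profiles main.c -o app-instrumented

    # Run the representative workload(s) to collect profiles.
    ./app-instrumented < training-input.txt

    # Merge the raw profiles into a single .profdata file.
    llvm-profdata merge ./profiles/*.profraw -o app.profdata

    # Rebuild the whole program with the profile applied.
    clang -O2 -fprofile-use=app.profdata main.c -o app-optimized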
But if you’re dead set on only using profiling data for functions you choose, then I think the --filter= method I linked to is one of the few good options you have.
Thanks for the suggestions.
I want to apply PGO to a few specific functions in a program. You suggested using the --filter= method. Can you give me an example of how to use --filter=, or send me the link?
Sorry, I think I meant --function=, which I’ve linked above.
Roughly, I think you either have to use some of the filtering mechanisms in clang, like -fprofile-list, to only collect profiles from some specific functions in the first place, or else, when you’re merging the profile with llvm-profdata, use --function= to remove anything from the profile that doesn’t match the regex.
I haven’t used either facility, so your mileage may vary. The --help and --help-hidden outputs are often useful for our CLI tools, like clang and llvm-profdata, so you may want to check those out and experiment to see if one of those options is useful in your context.
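To make the -fprofile-list route a bit more concrete, here’s a minimal sketch. It assumes -fprofile-list= accepts a special-case-list style file with fun: entries, as the clang user manual describes; the function names are made up, so double-check the exact syntax against your clang version:

    # Contents of hot-funcs.list (only the listed functions get instrumented):
    #   fun:compute_checksum
    #   fun:parse_record

    # Instrument only those functions, run the workload, then merge and reuse.
    clang -O2 -fprofile-generate -fprofile-list=hot-funcs.list main.c -o app-instrumented
    ./app-instrumented < training-input.txt
    llvm-profdata merge default*.profraw -o hot-only.profdata
    clang -O2 -fprofile-use=hot-only.profdata main.c -o app-optimized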
Thanks for your help.
I am thinking about this another way. Suppose I apply PGO to the entire program, and separately build the entire program with O3. Can I replace a particular function in the O3-optimized binary with its PGO-optimized version during the linking phase? I don’t understand how I can replace a function from one binary with another.
The PGO-optimized function would be placed in the O3-optimized binary.
Any help would be appreciated.
Short of using llvm-objcopy, I’m not aware of any way to do what you’re proposing. Even then, I wouldn’t be surprised if things broke in a spectacular fashion. I’m not an expert here though, so take that with a grain of salt. Object file manipulation seems like a long way to go just to avoid either re-merging the profile or rebuilding and recollecting the profile data.
What I’ve pointed at are 2 different ways to do what you want. One way uses clang to only instrument/collect profiles for the functions you care about. The other allows you to take the profiles you’ve already collected and filter them down to only contain profile data for those functions. That’s what you’ve asked for, unless I’ve misunderstood.
I still think that unless you have compelling reasons to do something off the beaten path, I’d recommend you just optimize the entire application with the profile you’ve collected, as this typically gives the best results. Since that’s the way most of us use profiling (unless they’re collecting samples from production), I’d expect you to have an easier time making that workflow work for you than what you’re proposing.
Thanks for your response.
My work is to see the impact of PGO at the function level, so I would like to apply PGO to specific functions of the program and then measure the performance.
So, -fprofile-list is the clang option that can be used to collect the profile for specific functions. The filtering method might not work for me, since it operates on regexes: if there is no particular pattern to the function names, selecting just a few functions would not be easy.