I'm trying to generate profile data for clang by building some of the packages
we ship in Fedora Linux. I'm trying to decide how many packages to build: is
there much advantage to building 1000 vs. something substantially smaller, like 100?
It is not the number of packages that matters, but the coverage of the hot paths. For instance, if you build a lot of packages but they are all written in C, the profile will miss many paths related to C++, such as template handling. The options used in the package builds are also important.
To curate the training data, one approach is to find the most time-consuming parts of your build and select those. The profile merge tool also supports merging with weights.
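For example, a minimal sketch of a weighted merge with llvm-profdata (the file
names here are made up; a larger weight makes that input count for more in the
merged profile):

```sh
# Hypothetical raw profiles collected from two training builds.
llvm-profdata merge \
    --weighted-input=3,cpp_app.profraw \
    --weighted-input=1,c_app.profraw \
    -o clang.profdata
```

The resulting clang.profdata is what the final optimized build consumes
(e.g. via -fprofile-use=).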
Honestly I'm not actually convinced there's that much difference between carefully selected and curated collections of PGO data, and building a few "Hello World" type simple programs.
When I was working on Clang at Apple, much of our instrumentation showed that process launch time was the most consistent place where we could optimize performance to get significant wins that were pretty universal.
When I added the in-tree multi-stage PGO that used LIT to run instrumented compiles, I found that just the one C++ hello-world program gave something crazy like a 6% performance improvement. I'd love to see us add a few more source files into that system so that we could tune it a bit, but I never had the time.
-Chris
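For anyone who wants to reproduce that in-tree flow, a rough sketch from memory
(the cache file path and target name may have changed over time, so treat
llvm/docs/AdvancedBuilds.rst as the authoritative recipe):

```sh
# Configure a multi-stage PGO build with the in-tree cache file, then build
# the instrumented stage and run the in-tree training to produce a profile.
cmake -G Ninja -C /path/to/llvm-project/clang/cmake/caches/PGO.cmake \
      /path/to/llvm-project/llvm
ninja stage2-instrumented-generate-profdata
```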
Yes, the initial enablement (even with a hello-world program) can give a
decent speedup.
After that, training on llvm-project itself, or on other dedicated
applications, has little marginal benefit.
So I'd just pick one medium-sized C application and one C++ application for
the training data.
If distributors think adding more training data is easy, going up to
10 applications still looks good to me.
100 or 1000 are definitely too many and aren't worth the hassle.
Yes, it really depends on where the point of diminishing returns is drawn. Some may think a 6% to 10% performance improvement with 2x coverage is not worth it (or has no perceivable impact on users), but others may think an additional 0.5% is worth the effort even with 5x more training, due to power or CPU savings :). This depends on the type of apps and the scale of the deployment of the optimized product.
Hans’s solution to this problem when we deployed PGO for Chrome was to pick a representative C++ file from the codebase, pre-process it, and use that as an input during PGO training. This fails to cover input reading and use cases like LTO, but it at least ensures that all the Sema, optimizer, and codegen codepaths are representatively exercised.
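A rough sketch of that approach (file names are invented, and the flags would
be whatever the project normally uses; the instrumented clang writes its raw
profile wherever LLVM_PROFILE_FILE points):

```sh
# Preprocess one representative C++ file once with the project's usual flags.
clang++ -E -std=c++17 representative.cpp -o training_input.ii

# Compile the preprocessed file with the instrumented clang; %p expands to
# the process ID so parallel training runs don't clobber each other.
LLVM_PROFILE_FILE=training-%p.profraw \
    /path/to/instrumented/bin/clang++ -c -O2 -std=c++17 training_input.ii -o /dev/null

# Convert the raw profiles into an indexed profile for the final build.
llvm-profdata merge training-*.profraw -o clang.profdata
```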
Thanks for the feedback everyone. I've decided to start by collecting
profile data from a C (glib2), a C++ (libabigail), and a Rust (rust-ripgrep)
application. I'll see how much improvement I can get from this.