[RFC] Adding target-specific overrides for Indirect Call Promotion

Hi,
We see improved performance on the PowerPC platform by increasing the aggressiveness of Indirect Call Promotion (ICP).
In particular, lowering the promotion threshold and increasing the maximum number of promotions helps.
The following command line options (default values shown) control some of the ICP parameters:
-icp-max-annotations=3
-icp-max-prom=3
-icp-remaining-percent-threshold=30
We would like to change their defaults to target-specific values.

I have a few questions:

  1. Is anyone else interested in having target-specific default values for the above options?
  2. Is anyone against making the defaults for the above options target dependent?
  3. If I were to make the default values target dependent (while still letting a user-specified option override the defaults), is the following the best and simplest way to do it?
  • Teach TargetTransformInfo (TTI) about the above 3 values (basically add 3 integer-returning query functions).
  • Make the PGOInstrumentationUse, PGOIndirectCallPromotion, and ModuleSummaryIndexAnalysis passes require the TargetIRAnalysis pass so that they can access the TTI instance and pass it to ICallPromotionAnalysis.
    The legacy PM passes would be changed symmetrically.
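
To make the proposal concrete, here is a minimal standalone sketch of the idea, not actual LLVM code. The class names, the getter names, and the resolveICPOption helper are all hypothetical; the real change would add query functions to TargetTransformInfo, and the "did the user pass the option" test would presumably use something like cl::opt's getNumOccurrences(). The PowerPC numbers are the ones proposed later in this thread.

```cpp
#include <optional>

// Hypothetical, simplified stand-in for a TTI-style interface.
// The base class returns today's defaults for the three ICP options.
class TargetInfoSketch {
public:
  virtual ~TargetInfoSketch() = default;
  virtual unsigned getICPMaxAnnotations() const { return 3; }             // -icp-max-annotations
  virtual unsigned getICPMaxProm() const { return 3; }                    // -icp-max-prom
  virtual unsigned getICPRemainingPercentThreshold() const { return 30; } // -icp-remaining-percent-threshold
};

// Hypothetical PowerPC override using the values proposed in this thread.
class PPCTargetInfoSketch : public TargetInfoSketch {
public:
  unsigned getICPMaxAnnotations() const override { return 7; }
  unsigned getICPMaxProm() const override { return 7; }
  unsigned getICPRemainingPercentThreshold() const override { return 10; }
};

// A value given explicitly on the command line wins; otherwise the
// target-provided default is used. std::optional models "option not given".
inline unsigned resolveICPOption(std::optional<unsigned> UserValue,
                                 unsigned TargetDefault) {
  return UserValue.value_or(TargetDefault);
}
```

With this shape, a pass holding a TTI-like object would call, for example, resolveICPOption(UserMaxProm, TTI.getICPMaxProm()) once and use the result in place of the current global default.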

Thank you.

Wael Yehia
Compiler Development
IBM Canada Lab
wyehia@ca.ibm.com

Hi, Wael,

Am I correct in assuming that the reason it makes sense for these to be target dependent is the dependence on the relative cost of checks vs. the indirect-call overhead? There’s also a difference from other inlining benefits, but I imagine that’s more weakly target dependent.

To what would you like to change the max-annotations value? Should this always match the max-prom value?

-Hal

Hi Hal,
Yes, we benchmarked the cost of checks vs indirect-call overhead, and ran SPEC and some internal suites with the chosen parameters.
We found that in the worst case (when inlining of the specialized calls is disabled) we see equal or better performance
when doing up to 7 specializations and when the specialization targets have at least 6-8% frequency
compared to a single indirect call.
I can share the benchmark if anyone is interested.
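
For readers less familiar with the trade-off, here is a source-level illustration of what a promoted call site looks like. It is only a sketch: the function names are made up, and ICP itself works on IR using profile data rather than on source code.

```cpp
// Hypothetical call targets; hotA and hotB stand for the frequently seen ones.
using Handler = int (*)(int);

int hotA(int X) { return X + 1; }
int hotB(int X) { return X * 2; }
int coldC(int X) { return X - 1; }

// Before promotion: a single indirect call.
int callIndirect(Handler F, int X) { return F(X); }

// After promoting two targets: each promotion adds one pointer comparison
// (the "check") guarding a direct call, with the indirect call as fallback.
int callPromoted(Handler F, int X) {
  if (F == hotA)
    return hotA(X); // direct call, candidate for inlining
  if (F == hotB)
    return hotB(X); // direct call, candidate for inlining
  return F(X);      // remaining targets still go through the pointer
}
```

With 7 promotions, a call to a target that was not promoted pays for 7 failed comparisons before reaching the fallback, which is why the relative cost of a check and of an indirect call matters per target.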

Given that the current algorithm in ICP applies the threshold on the frequency compared to the “remaining” values,
we set our threshold (-icp-remaining-percent-threshold) to 10 (so the minimum frequencies for 7 candidates will be: 10%, 9%, 8%, 7%, 7%, 6%, 5%).

And yes, we would need the max-annotations to match the max-prom values.
So the PowerPC values would be (might differ for subtargets):

-icp-max-annotations=7
-icp-max-prom=7
-icp-remaining-percent-threshold=10

Wael Yehia
Compiler Development
IBM Canada Lab
wyehia@ca.ibm.com


> Hi Hal,
> Yes, we benchmarked the cost of checks vs indirect-call overhead, and ran SPEC and some internal suites with the chosen parameters.
> We found that in the worst case (when inlining of the specialized calls is disabled) we see equal or better performance
> when doing up to 7 specializations and when the specialization targets have at least 6-8% frequency
> compared to a single indirect call.

Interesting. I recommend posting some TTI patches and we can go from there.

> I can share the benchmark if anyone is interested.

That sounds useful. Perhaps people can use it to run tests on other targets.

-Hal
