[ThinLTO] Importing based on PGO data

Hi,
I am working right now on importing based on PGO/FDO data. There is one issue that I found - when we calculate the list of imports, we can’t get the ProfileSummaryInfo, which is the best and I
think only valid way of checking if callsite/callee is hot (isHotCount()). There are 2 solutions that I come up with Teresa and Easwaran:

  1. Add PGO data to summary
  2. Replace CalleeInfo::ProfileCount with enum {None, Cold, Hot} computed during computing summary.

I like the 2. much more. It will reduce the summary size slightly and I don’t think we will need ProfileCount anywhere else.

The other thing I would like to mention is that I think we should start using the summary versioning and drop support of old version.
ThinLTO doesn’t have enough users right now and parsing many versions of summary will just add additional cost, that will start to grow.

Piotr

Hi,
I am working right now on importing based on PGO/FDO data. There is one
issue that I found - when we calculate the list of imports, we can't get the
ProfileSummaryInfo, which is the best and I
think only valid way of checking if callsite/callee is hot (isHotCount()).
There are 2 solutions that I come up with Teresa and Easwaran:

1. Add PGO data to summary
2. Replace CalleeInfo::ProfileCount with enum {None, Cold, Hot} computed
during computing summary.

Don't we already have edge profile count in the callgraph summary?
I think what is missing is the Profile SUmmary data itself -- that one
should be copied over to thinLTO summary so that the importing
analysis can use. However I we should not need to duplicate the
information in every module.

David

Hi,
I am working right now on importing based on PGO/FDO data. There is one
issue that I found - when we calculate the list of imports, we can't get
the ProfileSummaryInfo, which is the best and I
think only valid way of checking if callsite/callee is hot (isHotCount()).
There are 2 solutions that I come up with Teresa and Easwaran:

1. Add PGO data to summary

How much PGO data would need to be copied to summary?

2. Replace CalleeInfo::ProfileCount with enum {None, Cold, Hot} computed
during computing summary.

It depends on how binary the inlining decisions that (will) use this are -
do we want to give different thresholds to medium hot/cold vs very hot/cold?

I like the 2. much more. It will reduce the summary size slightly and I
don't think we will need ProfileCount anywhere else.

The other thing I would like to mention is that I think we should start
using the summary versioning and drop support of old version.
ThinLTO doesn't have enough users right now and parsing many versions of
summary will just add additional cost, that will start to grow.

It was released in 3.9 and in Xcode so it has external users and I believe
will need to support old versions going forward.

Teresa

> Hi,
> I am working right now on importing based on PGO/FDO data. There is one
> issue that I found - when we calculate the list of imports, we can't get
the
> ProfileSummaryInfo, which is the best and I
> think only valid way of checking if callsite/callee is hot
(isHotCount()).
> There are 2 solutions that I come up with Teresa and Easwaran:
>
> 1. Add PGO data to summary
> 2. Replace CalleeInfo::ProfileCount with enum {None, Cold, Hot} computed
> during computing summary.

Don't we already have edge profile count in the callgraph summary?
I think what is missing is the Profile SUmmary data itself -- that one
should be copied over to thinLTO summary so that the importing
analysis can use. However I we should not need to duplicate the
information in every module.

David

Yes we do have edge profile cout, but in order to compare it with global

couts we need Profile Summary as you said.
If we will follow 2) then we won't have to duplicate the data.

> Hi,
> I am working right now on importing based on PGO/FDO data. There is one
> issue that I found - when we calculate the list of imports, we can't get
> the
> ProfileSummaryInfo, which is the best and I
> think only valid way of checking if callsite/callee is hot
> (isHotCount()).
> There are 2 solutions that I come up with Teresa and Easwaran:
>
> 1. Add PGO data to summary
> 2. Replace CalleeInfo::ProfileCount with enum {None, Cold, Hot} computed
> during computing summary.

Don't we already have edge profile count in the callgraph summary?
I think what is missing is the Profile SUmmary data itself -- that one
should be copied over to thinLTO summary so that the importing
analysis can use. However I we should not need to duplicate the
information in every module.

David

Yes we do have edge profile cout, but in order to compare it with global
couts we need Profile Summary as you said.
If we will follow 2) then we won't have to duplicate the data.

Ok -- basically the profile summary is already consumed before
writing. If you go this route, I think you need more enum values for
fine tuning: for instance hot_10 --> 10 percentile hotness, ... hot_90
etc, and cold_99, cold_999 etc.

David

>
>
>>
>> > Hi,
>> > I am working right now on importing based on PGO/FDO data. There is
one
>> > issue that I found - when we calculate the list of imports, we can't
get
>> > the
>> > ProfileSummaryInfo, which is the best and I
>> > think only valid way of checking if callsite/callee is hot
>> > (isHotCount()).
>> > There are 2 solutions that I come up with Teresa and Easwaran:
>> >
>> > 1. Add PGO data to summary
>> > 2. Replace CalleeInfo::ProfileCount with enum {None, Cold, Hot}
computed
>> > during computing summary.
>>
>>
>> Don't we already have edge profile count in the callgraph summary?
>> I think what is missing is the Profile SUmmary data itself -- that one
>> should be copied over to thinLTO summary so that the importing
>> analysis can use. However I we should not need to duplicate the
>> information in every module.
>>
>> David
>>
> Yes we do have edge profile cout, but in order to compare it with global
> couts we need Profile Summary as you said.
> If we will follow 2) then we won't have to duplicate the data.

Ok -- basically the profile summary is already consumed before
writing. If you go this route, I think you need more enum values for
fine tuning: for instance hot_10 --> 10 percentile hotness, ... hot_90
etc, and cold_99, cold_999 etc.

Right, that relates to my question about how the inliner will eventually
use the same profile summary data, and whether it will be useful to
distinguish between various levels of hotness.

Teresa

Hi,
I am working right now on importing based on PGO/FDO data. There is one
issue that I found - when we calculate the list of imports, we can't get
the ProfileSummaryInfo, which is the best and I
think only valid way of checking if callsite/callee is hot
(isHotCount()). There are 2 solutions that I come up with Teresa and
Easwaran:

1. Add PGO data to summary

How much PGO data would need to be copied to summary?

AFAI got information form Easwaran - not much, about 3*15 integers.

2. Replace CalleeInfo::ProfileCount with enum {None, Cold, Hot} computed
during computing summary.

It depends on how binary the inlining decisions that (will) use this are -
do we want to give different thresholds to medium hot/cold vs very hot/cold?

We could also just later add MediumHot, MediumCold to the enum. But I am
not sure if we want to have that specific information. We can always mark
medium hot
as hot, and then maybe import more stuff, which should not be that harrmful.

I like the 2. much more. It will reduce the summary size slightly and I
don't think we will need ProfileCount anywhere else.

The other thing I would like to mention is that I think we should start
using the summary versioning and drop support of old version.
ThinLTO doesn't have enough users right now and parsing many versions of
summary will just add additional cost, that will start to grow.

It was released in 3.9 and in Xcode so it has external users and I believe
will need to support old versions going forward.

Is this really needed? I think it is important to be able to distinguish
between versions, so combining old summary with new one won't lead to some
weird behaviour and user will get an error. The user will maybe have to
recompile whole project, but on the other hand we won't spend
time on trying to figure out support for all the versions.

>
>
>>
>> > Hi,
>> > I am working right now on importing based on PGO/FDO data. There is
one
>> > issue that I found - when we calculate the list of imports, we can't
get
>> > the
>> > ProfileSummaryInfo, which is the best and I
>> > think only valid way of checking if callsite/callee is hot
>> > (isHotCount()).
>> > There are 2 solutions that I come up with Teresa and Easwaran:
>> >
>> > 1. Add PGO data to summary
>> > 2. Replace CalleeInfo::ProfileCount with enum {None, Cold, Hot}
computed
>> > during computing summary.
>>
>>
>> Don't we already have edge profile count in the callgraph summary?
>> I think what is missing is the Profile SUmmary data itself -- that one
>> should be copied over to thinLTO summary so that the importing
>> analysis can use. However I we should not need to duplicate the
>> information in every module.
>>
>> David
>>
> Yes we do have edge profile cout, but in order to compare it with global
> couts we need Profile Summary as you said.
> If we will follow 2) then we won't have to duplicate the data.

Ok -- basically the profile summary is already consumed before
writing. If you go this route, I think you need more enum values for
fine tuning: for instance hot_10 --> 10 percentile hotness, ... hot_90
etc, and cold_99, cold_999 etc.

Right, that relates to my question about how the inliner will eventually
use the same profile summary data, and whether it will be useful to
distinguish between various levels of hotness.

I don't know what is planned for the inliner. The Easwaran patch only uses
isHotCount right now, but I think that the data that David mentioned
should be better for future tunning ( and probably will take only a few
bits).

Piotr

IMO, multiple levels of hotness/coldness is not very useful. If that proves to be beneficial, tweaking the hotness cutoffs and the inline threshold for hot/cold sites should get to similar performance on average without the additional complexity.

2016-09-02 15:04 GMT-07:00 Xinliang David Li <davidxl@google.com>:

Hi,
I am working right now on importing based on PGO/FDO data. There is one
issue that I found - when we calculate the list of imports, we can’t get the
ProfileSummaryInfo, which is the best and I
think only valid way of checking if callsite/callee is hot (isHotCount()).
There are 2 solutions that I come up with Teresa and Easwaran:

  1. Add PGO data to summary
  2. Replace CalleeInfo::ProfileCount with enum {None, Cold, Hot} computed
    during computing summary.

Don’t we already have edge profile count in the callgraph summary?
I think what is missing is the Profile SUmmary data itself – that one
should be copied over to thinLTO summary so that the importing
analysis can use. However I we should not need to duplicate the
information in every module.

David

Yes we do have edge profile cout, but in order to compare it with global couts we need Profile Summary as you said.

Can you explain a bit more the issue here?

Thanks,

Mehid

IMO, multiple levels of hotness/coldness is not very useful. If that
proves to be beneficial, tweaking the hotness cutoffs and the inline
threshold for hot/cold sites should get to similar performance on average
without the additional complexity.

I think it can be useful -- even though the inliner's implementation may
choose to not use the information.

Suppose we have compile time/import size budget, having more fine grained
information helps the importer prioritize the callsites based on that
information -- similar to priority based inliner. Besides I don't see how
it adds more complexity.

David

The complexity I was referring to is in the inliner is having additional cutoffs and ranges. I agree it is not much, if there is benefit. Without any data, it is hard to argue one is better than other.

The profile summary is saved in the global metadata ASAIK. If we want to calculate if something is hot/cold while choosing functions for importing, we would either need to read whole Module (which we clearly don’t want to do)
or duplicate this information in the summary, so we could get it without reading Module.

When I say “explain a bit more” I meant:

  • what is the profile summary?
  • how do you compute the hot/cold information from it and from the count we attache to edges?

The profile summary is saved in the global metadata ASAIK. If we want to
calculate if something is hot/cold while choosing functions for importing,
we would either need to read whole Module (which we clearly don't want to
do)
or duplicate this information in the summary, so we could get it without
reading Module.

When I say “explain a bit more” I meant:

Sure, I didn't know which part is not clear:

- what is the profile summary?

Profile summary is additional metadata in LLVM calculated from FDO/aFDO
data.
It is basically ProfileSummary class, having vector of ProfileSummaryEntry.

- how do you compute the hot/cold information from it and from the count
we attache to edges?

Having the ProfileCount from CalleeInfo, we can know what is the hotness

of callsite (of the block).
- or at least the average hotness of basic blocks that calls give callee.
(We probably want to change it
to std::max of callsite hotness later)
Then having this information we can call ProfileSummaryInfo::isHotCount()
to find out if the callsite is hot,
which would probably mean that we want to inline callee to this basic block.
Right now Inliner only cares about the callee hotness, not callsite
hotness, but this will change when
we will have new pass manager (the Easwaran patch depends on it).

Is that enough information? Fell free to ask some more, or ping me directly
Mehdi.

Piotr

The profile summary is saved in the global metadata ASAIK. If we want to
calculate if something is hot/cold while choosing functions for importing,
we would either need to read whole Module (which we clearly don't want to
do)
or duplicate this information in the summary, so we could get it without
reading Module.

When I say “explain a bit more” I meant:

Sure, I didn't know which part is not clear:

- what is the profile summary?

Profile summary is additional metadata in LLVM calculated from FDO/aFDO
data.
It is basically ProfileSummary class, having vector of ProfileSummaryEntry.

- how do you compute the hot/cold information from it and from the count
we attache to edges?

Having the ProfileCount from CalleeInfo, we can know what is the hotness

of callsite (of the block).
- or at least the average hotness of basic blocks that calls give callee.
(We probably want to change it
to std::max of callsite hotness later)

Sorry, this problem will go away if we get rid of the ProfileCount from
CalleeInfo.