Split the hot and cold parts of a function into separate functions

Currently, GCC supports the function attribute “cold”, which can hint the compiler to split the annotated function’s callers into two separate parts: one hot and one cold.

One example is here: https://godbolt.org/z/j7sK4hd48

My question is: does Clang/LLVM have a similar capability?

Hi,

IIRC, clang/llvm has the HotColdSplit and partial inlining passes, which have similar functionality. However, these two passes are not enabled by default for various reasons.

Thanks,
Chuanqi

Are there any examples?
And does the -hot-cold-split option need profile data?

Currently, if we use --hot-cold-split, the compiler reports an unsupported option, so I can’t find a simple example on Compiler Explorer.
Here is the slides from the web: https://llvm.org/devmtg/2019-10/slides/Kumar-HotColdSplitting.pdf
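For experimentation, the IR-level pass can, if I recall correctly, be run directly through opt with the new pass manager (the pass name matches LLVM’s registration; the file names are illustrative):

```shell
# Sketch: run the hot/cold splitting IR pass via opt (assumes an LLVM install).
clang -O1 -S -emit-llvm example.c -o example.ll
opt -passes=hotcoldsplit -S example.ll -o example.split.ll
# Cold regions are outlined into separate functions marked "cold".
```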
BTW, may I ask what the motivation for the question is? Have we found a performance gap between Clang and GCC?

Thanks,
Chuanqi

Because a lot of the code in a function is error/failure-handling code, the hot path of a function is usually very thin.

By using hot-cold splitting, the hot parts can stay local and clustered together, which helps CPU i-cache hits.

I-cache misses hurt performance significantly.

I believe intra-function hot cold code splitting is in the scope of the Propeller project, which Sriram Tallam worked on. I’m not sure what the status of the feature is at this moment.

I believe that the hot cold split pass is an IR pass, which means that it outlines code at the IR level. This will prevent the register allocator from working across the boundary between hot and cold code, so I don’t believe it has as much performance potential as splitting the function during code generation. Looking at the example, I believe GCC is using this strategy; it is not calling outlined code.

I believe intra-function hot cold code splitting is in the scope of the Propeller project, which Sriram Tallam worked on. I’m not sure what the status of the feature is at this moment.

This is available in LLVM with option -fsplit-machine-functions with PGO and it uses PGO profiles to split a function’s cold basic blocks which can then be placed arbitrarily. It is tested on instrumented PGO where it shows gains of a couple of percent. With Sampled PGO, we are still working on tuning the split.

We have also added support for Propeller, which uses another round of profiling to precisely lay out basic blocks and split functions. While this is more effective than -fsplit-machine-functions, it requires another round of sampled profiling. Please see the documentation here to optimize binaries with Propeller: https://github.com/google/autofdo/blob/propeller/OptimizeClangO3WithPropeller.md

Both -fsplit-machine-functions and Propeller use the basic block sections feature to perform function splitting.
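Putting the flags from this thread together, an instrumented-PGO build with machine function splitting looks roughly like this (the file names and the profile directory are illustrative):

```shell
# Sketch: instrumented PGO + machine function splitting.
clang++ -O2 -fprofile-generate=prof app.cc -o app
./app                                   # run a representative workload
llvm-profdata merge -o app.profdata prof/*.profraw
clang++ -O2 -fprofile-use=app.profdata \
        -fsplit-machine-functions app.cc -o app.split
```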

I believe that the hot cold split pass is an IR pass, which means that it outlines code at the IR level. This will prevent the register allocator from working across the boundary between hot and cold code, so I don’t believe it has as much performance potential as splitting the function during code generation. Looking at the example, I believe GCC is using this strategy; it is not calling outlined code.

Yep, GCC also splits functions during code generation, just like -fsplit-machine-functions, and not early like hot-cold splitting. For performance, we have found this more effective.

Can we avoid depending on profiling?

Use a hint instead, like [[unlikely]]?

That would give programmers more control.



Could you please rephrase? I think you mean: could we split without profile information? We don’t support that right now, and we haven’t had much success with it performance-wise. If you are not using profile-guided builds, you are leaving a lot of performance on the table anyway.

On May 9, 2021, at 3:39 AM, Sriraman Tallam <tmsriram@google.com> wrote:


Could you please rephrase? I think you mean: could we split without profile information? We don’t support that right now, and we haven’t had much success with it performance-wise. If you are not using profile-guided builds, you are leaving a lot of performance on the table anyway.

Yes. Because profiling requires choosing a representative workload, which may be hard or infeasible for some software: it may have many workloads that all need to be supported well, and the profile must balance between them.