Hello, everyone!
We (at Sony) have recently observed some stack size increases that are
largely due to more aggressive inlining. During the investigation we
found that clang/llvm does not seem to consider stack size in its inlining
decisions. In the following contrived example
// big_stack.c
#ifndef SVAR_SIZE_K
#define SVAR_SIZE_K (2048)
#endif
extern void init(int *a);
static int foo(int x) {
int a[SVAR_SIZE_K * 1024];
init(a);
return a[x];
}
int bar(int x, int y) {
if (x < y) {
return foo(x);
} else {
return 0;
}
}
// ----------------
foo() is inlined into bar() no matter how large you make SVAR_K_SIZE, even
though the call too foo is not in straightline code and the compiler has no
idea (at least in the absence of any profiling data) whether foo() is
called often, rarely, ot not at all.
Gcc, on the other hand, supports a couple of flags that influence inlining behavior wrt
stack size:
--param large-stack-frame=<N>
--param large-stack-frame-growth=<P>
If you use the first option, a caller’s stack size cannot grow beyond N bytes, although
the documentation seems to imply that there is a little bit of leeway. 256 bytes seems to
be the default on Linux.
Using the second option limits a stack frame’s growth due to inlining to P percent
of the original stack size. So --param large-stack-frame-growth=2000 means the stack
can’t grow by more than 20 times. The default appears to be 1000, i.e. 10 times the
original stack size. If large-stack-frame is also given, it takes preference over
large-stack-frame-growth.
I’d like to first understand if I missed something in the code that already does what I’m looking for, and if not, solicit some opinions on whether we should
-
mimic gcc’s behavior by adding similar options and thresholds to potentially
suppress inlining of functions with large stack sizes. -
perhaps go further and introduce a more fine-grained approach, where stack size
and other factors such as call hotness are considered in the inlining decision.Some points to consider would be:
- The stack size of the inlined function directly contributes to the inlining
cost. - The relative call hotness contributes inversely to the inlining cost, i.e.
the more often a function is called, the less it matters whether its stack
is allocated as part of its caller’s stack, since its going to be called
often anyway. The extreme case is when the function is called in straightline
code. The stack size should then not matter for the inlining decision.
So roughly, C = max(S - T, 0)/H, with C being the inlining cost contribution,
S being the stack size, T a threshold below which we don’t consider the stack
size at all (could also be a percentage of the caller’s stack size), and H a hotness
measure. - The stack size of the inlined function directly contributes to the inlining
Thanks,
Wolfgang Pieb
Sony Interactive Entertainment (SIE)