Impact of stack size on inlining decisions

wolfy1961 · May 12, 2022, 11:09pm

Hello, everyone!

We (at Sony) have recently observed some stack size increases that are
largely due to more aggressive inlining. During the investigation we
found that clang/llvm does not seem to consider stack size in its inlining
decisions. In the following contrived example

// big_stack.c
#ifndef SVAR_SIZE_K                                         
#define SVAR_SIZE_K (2048)                                  
#endif                                                    
                                                          
extern void init(int *a);                                
static int foo(int x) {                                   
  int a[SVAR_SIZE_K * 1024];
  init(a);
  return a[x];                                            
}                                                         
                                                          
int bar(int x, int y) {                                   
  if (x < y) {                                            
    return foo(x);
  } else {                                                
    return 0;                                             
  }                                                       
}                                                         
// ----------------

foo() is inlined into bar() no matter how large you make SVAR_K_SIZE, even
though the call too foo is not in straightline code and the compiler has no
idea (at least in the absence of any profiling data) whether foo() is
called often, rarely, ot not at all.

Gcc, on the other hand, supports a couple of flags that influence inlining behavior wrt
stack size:

--param large-stack-frame=<N>
--param large-stack-frame-growth=<P>

If you use the first option, a caller’s stack size cannot grow beyond N bytes, although
the documentation seems to imply that there is a little bit of leeway. 256 bytes seems to
be the default on Linux.

Using the second option limits a stack frame’s growth due to inlining to P percent
of the original stack size. So --param large-stack-frame-growth=2000 means the stack
can’t grow by more than 20 times. The default appears to be 1000, i.e. 10 times the
original stack size. If large-stack-frame is also given, it takes preference over
large-stack-frame-growth.

I’d like to first understand if I missed something in the code that already does what I’m looking for, and if not, solicit some opinions on whether we should

mimic gcc’s behavior by adding similar options and thresholds to potentially
suppress inlining of functions with large stack sizes.
perhaps go further and introduce a more fine-grained approach, where stack size
and other factors such as call hotness are considered in the inlining decision.

Some points to consider would be:
- The stack size of the inlined function directly contributes to the inlining
  cost.
- The relative call hotness contributes inversely to the inlining cost, i.e.
  the more often a function is called, the less it matters whether its stack
  is allocated as part of its caller’s stack, since its going to be called
  often anyway. The extreme case is when the function is called in straightline
  code. The stack size should then not matter for the inlining decision.
So roughly, C = max(S - T, 0)/H, with C being the inlining cost contribution,
S being the stack size, T a threshold below which we don’t consider the stack
size at all (could also be a percentage of the caller’s stack size), and H a hotness
measure.

Thanks,
Wolfgang Pieb
Sony Interactive Entertainment (SIE)

aeubanks · May 12, 2022, 11:37pm

https://github.com/llvm/llvm-project/blob/e91a73de24d60954700d7ac0293c050ab2cbe90b/llvm/lib/Analysis/InlineCost.cpp#L1357

There’s a little bit of logic around inlining large allocas, although in this case it’s skipped because the alloca is alloca [2097152 x i32], rather than alloca i32, 2097152. We should make this apply to all static allocas.

aeubanks · May 13, 2022, 12:07am

Sorry, that is specifically to prevent promoting a potentially large dynamic alloca to a static alloca, which is a separate issue than simply preventing promotion of large static allocas.

The easiest thing to start experimenting with would be a constant penalty for allocas above a certain size. That would fit in with the existing hotness inline thresholds.

wolfy1961 · May 25, 2022, 12:46am

Thanks,

I’d like to propose to experimentally add to clang a switch -finline-max-stacksize=<N>, which prevents the inliner from inlining any function with a stack size larger than N bytes. This would effectively mimic gcc’s --param large-stack-frame=<N>. In the absence of the switch the compiler behaves as it does now. We could introduce a sensible default if others are in favor of it.

evodius96 · May 25, 2022, 10:16pm

I’d like to propose to experimentally add to clang a switch -finline-max-stacksize=<N> , which prevents the inliner from inlining any function with a stack size larger than N bytes

This may be your intention, but it’d be great if this also worked with link-time optimization (LTO) since that is where we see a lot more aggressive inlining.

wolfy1961 · May 26, 2022, 9:21pm

Right. The intent is certainly to translate the option to some sort of -mllvm option and pass it on to both clangcc1 and lld.

Topic		Replies	Views
LTO and Inlining and Stack Frame Size LLVM Dev List Archives	2	110	November 29, 2018
Clang equivalent to gcc large-function-growth? Using Clang	2	283	September 23, 2022
detailed comparison of generated code size for LLVM and other compilers LLVM Dev List Archives	18	87	December 15, 2009
[RFC] Function stack size section. LLVM Dev List Archives	7	141	September 1, 2017
Implementing "-Wstack-usage=" warning flag LLVM Dev List Archives	5	160	May 19, 2021

Impact of stack size on inlining decisions

Related topics