Optimizations are causing excessive stack usage

Hi. I’m pretty new to LLVM, though I have contributed a feature to Clang before.
I’m working on a compiler targeting RISC-V using LLVM’s C++ API. I’ve only generated a little bit of code so far, but I’m already noticing something troubling and I’m wondering if there is a way to fix it.

Here is some LLVM IR code that reproduces the problem:

@data.0 = internal constant [8 x i8] c"87571732"
@data.1 = internal constant [8 x i8] c"49479872"
@data.2 = internal constant [8 x i8] c"82621113"
@data.3 = internal constant [8 x i8] c"39191896"
@data.4 = internal constant [8 x i8] c"93639962"
@data.5 = internal constant [8 x i8] c"84687689"
@data.6 = internal constant [8 x i8] c"77419535"
@data.7 = internal constant [8 x i8] c"00879711"
@data.8 = internal constant [8 x i8] c"50292826"
@data.9 = internal constant [8 x i8] c"62360942"
@data.10 = internal constant [8 x i8] c"23346576"
@data.11 = internal constant [8 x i8] c"59808330"
@data.12 = internal constant [8 x i8] c"09273292"
@data.13 = internal constant [8 x i8] c"17842954"
@data.14 = internal constant [8 x i8] c"46003055"
@data.15 = internal constant [8 x i8] c"88105736"
@data.16 = internal constant [8 x i8] c"75455085"
@data.17 = internal constant [8 x i8] c"28008821"
@data.18 = internal constant [8 x i8] c"93648529"
@data.19 = internal constant [8 x i8] c"26574742"
@data.20 = internal constant [8 x i8] c"48549089"
@data.21 = internal constant [8 x i8] c"01902830"
@data.22 = internal constant [8 x i8] c"72977988"
@data.23 = internal constant [8 x i8] c"96981765"

declare dso_local void @foo(ptr)

define void @main() {
  br label %top
top:
  call void @foo(ptr @data.0)
  call void @foo(ptr @data.1)
  call void @foo(ptr @data.2)
  call void @foo(ptr @data.3)
  call void @foo(ptr @data.4)
  call void @foo(ptr @data.5)
  call void @foo(ptr @data.6)
  call void @foo(ptr @data.7)
  call void @foo(ptr @data.8)
  call void @foo(ptr @data.9)
  call void @foo(ptr @data.10)
  call void @foo(ptr @data.11)
  call void @foo(ptr @data.12)
  call void @foo(ptr @data.13)
  call void @foo(ptr @data.14)
  call void @foo(ptr @data.15)
  call void @foo(ptr @data.16)
  call void @foo(ptr @data.17)
  call void @foo(ptr @data.18)
  call void @foo(ptr @data.19)
  call void @foo(ptr @data.20)
  call void @foo(ptr @data.21)
  call void @foo(ptr @data.22)
  call void @foo(ptr @data.23)
  br label %top
}

I compile this with llc (from LLVM version 17.0.6) using this command:

llc -mtriple=riscv32 -O1 stack_abuse.ll -o -

In the output, we can see the line addi sp, sp, -112, which means the function reserves a 112-byte stack frame. Some optimization has decided that, because the pointers to the constant data never change, they should be computed once before the loop and reused. There are too many of them to keep in registers, so they get spilled to the stack. However, this particular loop does not need to be fast, RAM will be very limited in my system, and it only takes two instructions to compute a pointer anyway, so this is a bad trade-off. It gets worse as the function grows: I’ve seen it use more than 4096 bytes of stack space just to store these pointers.

With -O0, the problem goes away, but is there a way to avoid excessive use of the stack to store temporary values without disabling all optimizations?

Perhaps I can find a tool to see what optimization passes are being run, disable them one at a time until the problem goes away, and then I can see if the problematic pass has some configuration options?
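For anyone trying this kind of bisection, a small helper that reads the frame size out of the generated assembly makes it easy to tell whether a given set of options helped. This is only a minimal sketch: the name frame_sizes is made up for illustration, and it assumes the frame is allocated with a single addi sp, sp, -N, which will not hold for large frames.

```python
import re

# Matches the RISC-V prologue adjustment, e.g. "addi sp, sp, -112".
# Assumption: the whole frame is allocated by one addi. Larger frames are
# allocated in several steps, so this would under-report them.
_PROLOGUE = re.compile(
    r"^\s*addi\s+sp,\s*sp,\s*-(\d+)\s*(?:#.*)?$", re.MULTILINE
)


def frame_sizes(asm: str) -> list[int]:
    """Return each stack allocation (in bytes) made via 'addi sp, sp, -N'."""
    return [int(m.group(1)) for m in _PROLOGUE.finditer(asm)]
```

Feeding it the text that llc writes with -o - for each candidate set of options, and comparing the numbers, should show which option actually shrinks the frame.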

–David

You can pass -print-after-all to print the IR after each transform. Godbolt has a convenient opt pipeline viewer which wraps around this (Compiler Explorer link). It seems like early-machinelicm might be the responsible pass here? Here is a hacky combination of options which disables the transformation (Compiler Explorer link); you might be able to do better. (Note that you won’t need the -mllvm argument prefix when invoking llc directly.)
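The -print-after-all output gets long quickly, so it can help to filter it down to the passes that actually changed something. Below is a rough sketch of that idea. The function names are hypothetical, and the banner format it looks for ("*** IR Dump After <description> (<name>) ***") is an assumption about what llc prints, so check it against a real log first.

```python
import re

# Assumed banner line printed before each dump, for example:
#   # *** IR Dump After Some Pass Description (some-pass) ***:
_BANNER = re.compile(r"^.*\*\*\* IR Dump After (.+?) \*\*\*.*$", re.MULTILINE)


def split_dumps(log: str) -> list[tuple[str, str]]:
    """Split a -print-after-all log into (pass label, dump body) pairs."""
    matches = list(_BANNER.finditer(log))
    sections = []
    for i, m in enumerate(matches):
        # A dump body runs from the end of its banner to the next banner.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log)
        sections.append((m.group(1), log[m.end():end].strip()))
    return sections


def passes_that_changed(log: str) -> list[str]:
    """Return labels of passes whose dump differs from the preceding dump."""
    changed = []
    previous = None
    for label, body in split_dumps(log):
        if previous is not None and body != previous:
            changed.append(label)
        previous = body
    return changed
```

The dumps go to stderr, so something like llc -mtriple=riscv32 -O1 -print-after-all stack_abuse.ll -o /dev/null 2> log.txt should capture them in a file whose contents can be passed to passes_that_changed. This comparison is only meaningful when the log covers a single function, as in the reproducer above.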

Thank you so much for the fast and very useful response, including the workaround!

Now that I know the problem is coming from lib/CodeGen/MachineLICM.cpp, I’ve managed to narrow it down. There is a function called CanCauseHighRegPressure in that file and it is supposed to prevent the kind of situation I detailed above by reporting whether hoisting a particular instruction out of a loop will cause high register pressure.

However, that function never returns true while compiling my code, because the cost values it receives as arguments are always zero. This looks like a bug to me.

–David

Is this LLVM 17 or 18?

Oh yeah, I forgot to mention that. I initially saw the problem with LLVM 17.0.6 (from MSYS2), but then yesterday I compiled LLVM 18.1.0 (git commit 461274b) myself from source on Linux and the problem is still there.