Function too big with new pass manager

I’m currently debugging a performance issue in GraalVM that seems to be caused by a function being compiled to bitcode that’s way bigger and more complicated than I would expect.

The function I’m looking at is yaml_parser_fetch_more_tokens in scanner.c in the oracle/truffleruby repository on GitHub.
To reproduce the issue, you only need scanner.c, yaml.h and yaml_private.h from that directory. Compile with clang -S -emit-llvm -I. -O1 scanner.c, then search the result for define .*yaml_parser_fetch_more_tokens (the -O level doesn’t really matter, as long as it’s not -O0).

The resulting function has almost 3500 bitcode instructions. The first basic block contains 60 allocas, followed by a mix of 700 bitcasts and getelementptrs. Many of them are used much later, all over the rest of the function. As far as I can tell, the bitcode is correct, just a lot bigger than expected.
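For illustration, the entry block has roughly this shape (a hand-written sketch, not the actual compiler output; the value names and struct offsets are invented):

```llvm
define i32 @yaml_parser_fetch_more_tokens(ptr %parser) {
entry:
  ; dozens of allocas for the locals of the inlined callees
  %string.0 = alloca [16 x i8], align 16
  %string.1 = alloca [16 x i8], align 16
  ; ... ~60 allocas in total ...

  ; hundreds of address computations hoisted to the entry block,
  ; many of them only used deep inside the function body
  %tokens.head = getelementptr inbounds i8, ptr %parser, i64 488
  %tokens.tail = getelementptr inbounds i8, ptr %parser, i64 496
  ; ... ~700 getelementptr/bitcast instructions ...
  br label %loop
  ; ...
}
```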

The same function compiled with -flegacy-pass-manager -O1 looks totally fine, with just ~40 bitcode instructions. Even with -flegacy-pass-manager -O3 it’s only slightly bigger, at ~70 instructions, so it can’t just be the fact that the new pass manager does some inlining at -O1.

I’m not sure if I’m missing something, but keeping hundreds of SSA values alive across the whole function doesn’t seem right. Is this the intended behavior? If so, how is the code generator dealing with this? Is there maybe some other transformation still happening in the backend?

Or is this function hitting some corner case, and there is maybe some optimization running wild?

Running with -mllvm --print-changed=quiet -mllvm --filter-print-funcs=yaml_parser_fetch_more_tokens shows that it is indeed the inliner, even at -O1, which is blowing up.

Looks like it. Do you consider that a bug? Should I report it on github?

I would consider it a bug, because aggressive inlining at -O1 seems like serious overkill. But this has been a known difference between legacy and new pass managers for a while, I think. Filing an issue can’t hurt, though.

I think this is more than just aggressive inlining. I think this is too aggressive, even for -O3. It doesn’t really make sense to pull that many GEPs into the first block, because that just leads to spilling. These hundreds of values have to be kept somewhere. I haven’t actually measured it, but I’m pretty sure a repeated GEP would be cheaper than loading that value from the stack. Unless I’m missing some magic going on in the backend?
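At the C level, the trade-off is between recomputing an address at each use and hoisting it once into a long-lived value. A minimal sketch of the two patterns (the struct and field names are made up for illustration, not the real yaml_parser_t layout):

```c
#include <stddef.h>

/* Hypothetical parser state, loosely modeled on libyaml's token queue. */
struct parser {
    int tokens[16];
    size_t head;
    size_t tail;
};

/* Variant A: derive the element address on every use (a "repeated GEP").
 * Each address is a cheap base+offset computation, recomputed per iteration. */
int sum_recompute(struct parser *p) {
    int sum = 0;
    for (size_t i = p->head; i < p->tail; i++)
        sum += p->tokens[i];
    return sum;
}

/* Variant B: hoist the address once and keep it live across the loop.
 * With a handful of such values this is free; with hundreds of them
 * (as in the inlined function above), the hoisted values no longer fit
 * in registers and get spilled to the stack. */
int sum_hoisted(struct parser *p) {
    int *base = p->tokens;  /* computed once, live for the whole loop */
    int sum = 0;
    for (size_t i = p->head; i < p->tail; i++)
        sum += base[i];
    return sum;
}
```

Compiling both variants with clang -O1 -S -emit-llvm is a quick way to see how the hoisted form keeps the pointer alive across the loop body.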

Compare, for example, to the same function compiled with -O3 and the legacy pass manager. There it still inlines a bit and pulls a few GEPs into the first basic block, but only 8 of them, which easily fit in registers.

Even more reason to file an issue!

I filed Runaway inlining with new pass manager · Issue #57006 · llvm/llvm-project on GitHub.