Machine block placement time and complexity

Hi, I am having some interesting issues when running llc on a piece of auto-generated code, namely that the machine block placement pass is taking an excessive amount of time.

For one file with a rather large function (a threaded interpreter core with indirect br instructions) and a lot of small helper functions (say 3-4k of them) I get the following timings:

llc -O1

  1616.9128 ( 42.6%)   0.4521 ( 22.5%)  1617.3649 ( 42.6%)  1617.7932 ( 42.6%)  Branch Probability Basic Block Placement

llc -O2

  1626.9825 ( 42.1%)   0.3805 ( 13.4%)  1627.3631 ( 42.0%)  1627.5726 ( 42.0%)  Branch Probability Basic Block Placement

llc -O3

  3322.8112 ( 60.2%)   0.8212 ( 35.7%)  3323.6324 ( 60.2%)  3324.1152 ( 60.2%)  Branch Probability Basic Block Placement

That is almost an hour for compiling with -O3, but I have a larger interpreter with more helper functions, and that was clocking in at 10h in compile time, but I am not going to rerun that test right now :grimacing:.

So, what I am wondering if there is a description somewhere about how the pass works or if someone can explain it quickly, and maybe what the approaches to controlling the time it takes would be? This would help me fix our code generator.

Thanks for reporting this, could you post some procedures that we can reproduce this locally?

This pass usually applies on each function, and associated with how many branches you take in that function. Do you have written many if-then-else statements or for loops etc, in a single function definition?

I will see if I can produce a more abstracted test case for this. But yes, there are likely a lot of branches in this. We try to inline most interpreter instruction implementations, so we have say 3k instructions reachable from the indirect br, each of them ending with an indirect branch (via a common dispatch block with exactly one indirect br in the whole function).

The interpreter instructions do contain if/else (e.g. conditional execution via predicates, cache hit tests for interpreter memory accesses, etc) and for loops (loops are in general rare though).