Optimizing for size - licm

Hello everyone!

It seems that machine-licm almost always has a negative impact on code size.
I assume this is normal and often due to higher register usage (small example here: Compiler Explorer).
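
To make the register-pressure point concrete, here is a minimal sketch. It is a hypothetical example, not the Compiler Explorer snippet linked above, and `mix_words` and its constants are made up for illustration. Each iteration needs several wide constants. With the constants materialized inside the loop, they need registers only briefly. If MachineLICM hoists the materialization into the preheader, every constant stays live across the whole loop, which may cost extra callee-saved registers or spills. Whether that actually grows the code depends on the target and the compiler version.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hash-like loop: three 32-bit constants are used per iteration.
 * On many targets each one takes one or more instructions to materialize,
 * and each hoisted constant occupies a register for the whole loop. */
uint32_t mix_words(const uint32_t *data, size_t n) {
    uint32_t h = 0;
    for (size_t i = 0; i < n; ++i) {
        uint32_t x = data[i];
        h ^= x * 0x9E3779B1u;                 /* constant 1 */
        h += (x & 0x00FF00FFu) * 0xC2B2AE35u; /* constants 2 and 3 */
        h ^= 0x27D4EB2Fu;                     /* constant 4 */
    }
    return h;
}
```

Comparing the assembly with and without -mllvm -disable-machine-licm is one way to see whether the hoisting happens for a given target.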

My question is: is there any reason to run this optimization at the -Oz optimization level?

Disabling it leads to interesting text size reductions: -0.9% on SPEC2006, -0.6% on SPEC2017 and -0.12% on AOSP.

For context, at Linaro we recently experimented with reducing the code size of AOSP.
In doing so, we observed that code size can be significantly reduced by simply enabling/disabling/tweaking some optimization flags and thresholds.
Compared to -Oz, AOSP total text size can be reduced this way by ~6% (~3% due to inline-threshold alone).
Some of these flags also have a positive impact on benchmark code size. I plan to propose patches to enable some of them.

This is interesting. I work on the compiler toolchain used by Meta’s Android apps, and size is a prime concern for us. I’ll evaluate disabling MachineLICM for our builds and report the size results.

What inline-threshold did you find to be most effective for text size? I had experimented with that internally but found that the default -Oz level of 5 was actually optimal for us.

LICM can actually reduce register pressure in some cases: hoisting an instruction with two or more operands could mean those operands aren’t live inside the loop.
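
A minimal sketch of that case, written as source-level C with the hoisting done by hand (the function names are hypothetical, and a real compiler may already do this transformation before the machine level). In the first version both `a` and `b` must stay live across the loop. In the second only the product `k` does, so the loop needs one register fewer.

```c
#include <stddef.h>

/* Invariant computed inside the loop: a and b are both live in the loop. */
long sum_in_loop(const long *v, size_t n, long a, long b) {
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += v[i] + a * b; /* a * b does not depend on i */
    return s;
}

/* Invariant hoisted by hand: only k is live in the loop, a and b are dead. */
long sum_hoisted(const long *v, size_t n, long a, long b) {
    long k = a * b;
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += v[i] + k;
    return s;
}
```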

If disabling MachineLICM leads to substantial size savings, that could translate to performance improvements at other optimization levels; we might want to look at restraining it there as well, or at making remat (rematerialization) more powerful.

Sure; the hard part is making those changes general enough that you can push them upstream without breaking someone else’s workload.

On AOSP system binaries and shared libs, I have found that 75 gives the best code size results.
On the SPEC benchmarks, the default value of 5 seems to be the best.

Many AOSP source files need to be compiled with more aggressive inlining to get a smaller size, and are larger with Oz than with O2. I am still trying to explain this.

Yeah sure, I also observe cases where MachineLICM leads to size reduction. These seem rare on SPEC & AOSP though.

Yes, indeed, it may be interesting to look at the other optimization levels and performance figures.
I’ll try to do that and share the results here. Thanks

We measured a 0.25% text size reduction for the 64-bit Facebook for Android app with MachineLICM disabled (-mllvm -disable-machine-licm). We haven’t measured the corresponding performance impact (if any) yet. Our measurement was also somewhat imprecise: our libraries are built with a mix of optimization levels and linked together with LTO, and we disabled MachineLICM globally. In practice we’d only want to disable it for minsize functions, and we didn’t measure the effect of that specifically.
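
For readers less familiar with the term, "minsize functions" are functions carrying LLVM's minsize attribute. Clang adds it to functions compiled at -Oz, and it can also be requested per function. The sketch below assumes Clang. The macro and function names are hypothetical, and on compilers without the attribute the macro expands to nothing. It shows how a single binary can end up mixing minsize and non-minsize functions, which is why a global switch measures something different from a per-function one.

```c
#include <stddef.h>

/* Use Clang's minsize attribute when available; otherwise expand to nothing. */
#if defined(__has_attribute)
#  if __has_attribute(minsize)
#    define HYPOTHETICAL_MINSIZE __attribute__((minsize))
#  endif
#endif
#ifndef HYPOTHETICAL_MINSIZE
#  define HYPOTHETICAL_MINSIZE
#endif

/* Cold path: marked minsize, so size-only heuristics apply to it. */
HYPOTHETICAL_MINSIZE
size_t count_nonzero_cold(const unsigned char *p, size_t n) {
    size_t c = 0;
    for (size_t i = 0; i < n; ++i)
        c += (p[i] != 0);
    return c;
}

/* Hot path: no attribute, optimized according to the command-line level. */
size_t count_nonzero_hot(const unsigned char *p, size_t n) {
    size_t c = 0;
    for (size_t i = 0; i < n; ++i)
        c += (p[i] != 0);
    return c;
}
```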

Oh cool, interesting. If you’re looking for other flags to reduce code size at the Oz level without regard to performance, you could also try these flags, which gave good size results on AOSP & SPEC:
-disable-lsr=true, -enable-pre=false, -enable-load-pre=false, -enable-ipra=true, -enable-linkonceodr-outlining=true

OK, here are some performance figures with MachineLICM disabled. It took me some time to set up.

  CINT2006, train dataset, aarch64 machine

  -O3 vs -O3 -disable-machine-licm -disable-postra-machine-licm:
    +1.0% exec_time, -0.4% text_size

  -O2 vs -O2 -disable-machine-licm -disable-postra-machine-licm:
    +0.6% exec_time, -0.6% text_size

  -Os vs -Os -disable-machine-licm -disable-postra-machine-licm:
    +1.0% exec_time, -0.7% text_size

  -Oz vs -Oz -disable-machine-licm -disable-postra-machine-licm:
    +1.7% exec_time, -0.8% text_size

Disabling MachineLICM results in code size reduction and slowdown at all optimization levels.

Of course, this doesn’t mean that there is nothing to do to improve the optimization.
I saw nothing obvious in the few cases I analyzed (small creduce’d code, probably not representative).

So I’m still thinking about proposing a patch for disabling it at Oz level.