current llvm vs. current gcc code size

This is a bit random but perhaps interesting:

Attached is a histogram of code sizes for about 50,000 random C programs compiled by recent versions of llvm and gcc. x-axis is bytes in the text segment and y-axis is number of programs in each 100-byte bucket.

Code size for a program is taken to be the smallest code size across -O0, -O1, -O2, -O3, and -Os, for a given compiler.
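A minimal sketch of that measurement (my reconstruction, not the original script): compile the program at each optimization level and keep the smallest text-segment size reported by the Berkeley-format `size` utility. The function and variable names here are made up for illustration.

```python
import os
import subprocess
import tempfile

OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3", "-Os"]

def text_size_from_size_output(output: str) -> int:
    """Parse the text-segment size from Berkeley-format `size` output:

          text    data     bss     dec     hex filename
          1234      12       8    1254     4e6 a.out

    The first column of the second line is the text size in bytes.
    """
    lines = output.strip().splitlines()
    return int(lines[1].split()[0])

def min_text_size(c_file: str, cc: str = "gcc") -> int:
    """Smallest text size for c_file across the five optimization levels."""
    sizes = []
    with tempfile.TemporaryDirectory() as tmp:
        exe = os.path.join(tmp, "a.out")
        for opt in OPT_LEVELS:
            subprocess.run([cc, opt, "-o", exe, c_file], check=True)
            out = subprocess.run(["size", exe], check=True,
                                 capture_output=True, text=True).stdout
            sizes.append(text_size_from_size_output(out))
    return min(sizes)
```

Swapping `cc="gcc"` for `cc="clang"` gives the LLVM numbers, assuming both compilers are driven with the same flags otherwise.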

I don't know why each histogram has two peaks, or why llvm's left-hand peak is at a lower code size than gcc's left-hand peak. Perhaps the left-hand peaks indicate trivial random programs that compile completely away and then llvm has a more concise crt0.

Anyway the interesting effect here is the rightward shift of the main bulk of the binaries produced by llvm vs. gcc. This would seem to indicate that there are optimization opportunities being missed by llvm. Ideally, there would be some automatic way to identify these opportunities, perhaps using delta-debugging techniques. If this seems interesting to people I can look into it more...

This is targeting x86/Linux.



This is interesting. Nobody has tried very hard to get -Os to work well AFAIK, so it's not that surprising that it doesn't. :) Aside from optimization knobs, I don't think there's anything in x86 instruction selection to pick small code sequences, which can be important. I suspect -Os will become important later though.

One thing that might be going on is differing default settings. These are worth checking out:
Frame pointer elimination (small prolog/epilog)?
Relocation model (PIC, dynamic-no-pic, static?)
Strict aliasing?


Yeah. In my view, on x86 -Os should mainly boil down to keeping loop unrolling and function inlining under tight control.

Unrolling is pretty easy (just don't do it), but inlining is tricky: for some inputs, limited inlining can significantly reduce code size. Of course, if you go too far there's major bloat.

I find LLVM's inlining heuristics to be pretty ad hoc. There is an easy, more-or-less principled way to minimize code size using inlining, if you can see the whole program. For each function, inline all calls to it if

   (s-o)(c-1) < x

where s is the estimated size of the function, o is the size overhead of the calling convention, c is the number of static callsites, and x is a magic parameter that tries to account for the fact that often there are added benefits to inlining because it enables subsequent intraprocedural optimizations. x can be chosen empirically. Of course this model of code bloat can be extended in any number of ways but it's pretty good as-is.

To the extent that LLVM becomes a compiler of choice for embedded processors, -Os will be important in the long run.