[llvm-commits] turn on function merging

Daniel Berlin wrote:

I'd like to turn on the function merging pass by default (patch attached).
Function merging is already in the tree and I've been testing it locally for
a while. It works great and doesn't do undue harm to my build times.

Functions are merged when they are identical under a comparison that treats
all pointers as equivalent. So i8* and i32* are considered equivalent types,
so long as you never load through one: a load would produce an i8 in one
case and an i32 in the other, and those are different.
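A minimal sketch of that comparison rule (Python standing in for the real instruction comparator; the string type representation and the helper name are invented for illustration):

```python
# Hypothetical sketch: all pointer types compare equal to each other,
# while non-pointer (loaded value) types must match exactly.
def types_equivalent(a: str, b: str) -> bool:
    if a.endswith("*") and b.endswith("*"):
        return True   # i8* vs i32*: fine, until one of them is loaded
    return a == b     # loaded values: i8 != i32

assert types_equivalent("i8*", "i32*")    # pointers are interchangeable
assert not types_equivalent("i8", "i32")  # loaded values are not
```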

First of all, function merging is good because it produces smaller binaries.
Doing it in the IR (say, instead of in the linker) also saves us the compile
time of emitting duplicate functions. When we can't remove a function
outright, mergefuncs leaves a tail-call stub and updates all callers to a
canonical choice, improving the instruction cache hit rate.
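The stub rewrite can be illustrated like this (Python standing in for IR; the function names and bodies are invented for illustration):

```python
# Illustration only: when a duplicate function can't be deleted
# outright, it is rewritten as a forwarding stub to the canonical
# copy, so only one body remains hot in the instruction cache.

def canonical(x):
    return 2 * x + 1      # the one surviving body

def duplicate(x):
    return canonical(x)   # tail-call stub left behind by merging
```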

The bad side is that the process is inherently O(n^2).

Wait what?
Why?

<I took a look at merge functions>

1. It's not really O(n^2) unless everything hashes to the same value.
2. You can do even better: do a cheesy O(n) value numbering pass on
the function, and include the number of values generated. Then your
comparison trivially fails quickly when the counts aren't equal.
3. You can do better still, even without value numbering. There is no
need for a true pairwise comparison of every function against every
other function in the bucket; you can compare them all at once, using
a variant of the multi-way merge algorithm.
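Points 1 and 2 can be sketched as follows (a Python sketch for brevity; the function representation, `cheap_hash`, and `value_number_count` are invented stand-ins, not anything in the pass):

```python
# Bucket functions by a cheap hash, then use the count of distinct
# values from a trivial value-numbering pass as a quick reject before
# any expensive comparison. Functions are lists of (opcode, operands).

def cheap_hash(func):
    # hash the opcode sequence only, ignoring operands
    return hash(tuple(op for op, _ in func))

def value_number_count(func):
    # crude O(n) value numbering: number each distinct operation
    numbers = {}
    for op in func:
        numbers.setdefault(op, len(numbers))
    return len(numbers)

def candidate_pairs(functions):
    buckets = {}
    for f in functions:
        buckets.setdefault(cheap_hash(f), []).append(f)
    for bucket in buckets.values():
        for i, a in enumerate(bucket):
            for b in bucket[i + 1:]:
                # quick reject: different counts can't be identical
                if value_number_count(a) == value_number_count(b):
                    yield a, b
```

Only functions that land in the same bucket and survive the count check ever reach a full comparison.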

Multi-way partition algorithm for the bucket:

Start -
For all functions in the bucket:
   put the current IR operation into an array that is (number of
functions in bucket) elements long.
   Sort the array.

Everything that occurs only once in this array can be dropped
completely; it is unmergeable.
Split the array into multiple arrays where all elements in a given
array are equal (O(number of functions) time, since it requires a
single array walk).

Replace all elements in each array with the next IR operation from
each function.
Repeat from start for each IR operation.

When you are done, you have a set of arrays where all elements of a
given array can be merged.
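The partitioning loop above can be sketched like so (a Python sketch under the assumption that each function is a plain list of comparable operations, here just strings; the real pass would compare instructions):

```python
# Split one hash bucket into groups of operation-for-operation
# identical functions, comparing all functions at each position at
# once instead of pairwise.
def partition_bucket(funcs):
    groups = [list(funcs)]
    pos = 0
    result = []
    while groups:
        next_groups = []
        for group in groups:
            # functions that end here and agreed on every prior op
            # form a finished, mergeable group
            done = [f for f in group if pos >= len(f)]
            live = [f for f in group if pos < len(f)]
            if len(done) > 1:
                result.append(done)
            # sort by the current op so equal ops become adjacent,
            # then split the group wherever the op changes
            live.sort(key=lambda f: f[pos])
            i = 0
            while i < len(live):
                j = i
                while j < len(live) and live[j][pos] == live[i][pos]:
                    j += 1
                # anything occurring only once is unmergeable: drop it
                if j - i > 1:
                    next_groups.append(live[i:j])
                i = j
        groups = next_groups
        pos += 1
    return result
```

Each function in the bucket is examined once per operation, so the work is proportional to total bucket size times a sort per position, rather than to the number of function pairs.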

There is even some splittable-set data structure (basically the
opposite of union-find), whose name escapes me at the moment, that
makes this all really fast.

You can also, just like multi-way merge, be a little more complex
and, instead of sorting the array (O(n log n)):
put the current IR operation from each function into a heap (the only
property you need from the ordering is that equal elements end up
next to each other);
extract-min on the heap repeatedly into a new heap, switching to a
new heap whenever the value is not the same as the last one, so you
end up with a forest of heaps;
destroy all heaps with one element;
repeat just like above.
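One step of the heap variant might look like this (a Python sketch using the standard `heapq` module; the `partition_step` name and the id-to-operation mapping are invented for illustration):

```python
import heapq

# One partitioning step: given each function's current IR operation,
# group the functions whose operation is equal, dropping any operation
# that occurs only once (those functions are unmergeable).
def partition_step(ops_by_func):
    heap = [(op, fid) for fid, op in ops_by_func.items()]
    heapq.heapify(heap)
    groups = []
    while heap:
        # extract-min; equal operations come out adjacent
        op, fid = heapq.heappop(heap)
        if groups and groups[-1][0] == op:
            groups[-1][1].append(fid)   # same run ("heap") as before
        else:
            groups.append((op, [fid]))  # value changed: start a new run
    return [fids for op, fids in groups if len(fids) > 1]
```

Each surviving group would then be refilled with the next operation from each of its functions and the step repeated.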

Right now you are comparing functions without taking advantage of the
knowledge you gained from the last functions. You are basically doing
multi-way list partitioning by pairwise comparison, instead of
all-ways comparison. :)

Replying to llvm-dev so that everyone can see your excellent critique. :)

Thanks. I recall why I didn't consider doing multi-way comparison, and that's because I had rejected anything that would use up a little more memory. Which is thoroughly stupid in retrospect.

I'd still like to turn it on as soon as I've implemented Chris' review comments, but I think it's clear that removing O(n^2) from it is the next step. In the event that it hurts someone's compile time, we can turn it right back off.

Nick