MergeFunctions: reduce complexity to O(log(N))

Hi Nick,
About your partition refinement.

It looks like the patch proposed here allows us to apply your idea in a painless way (as a series of small changes, actually).

If I got it right, you would like to introduce the following partition refinement.

The initial partition P consists of a single function class X0 (the whole module, one bucket). Suppose S is a subset of X0, so S represents the functions with some trait that is absent in the rest of X0.

Then the refinement looks like this:

P`=refine(P, S):
   X`[0] = intersection(X[0], S)
   X`[1] = difference(X[0], S) // X0 without S

so P`={X`[0], X`[1]}

X`[0] and X`[1] are then classes of functions that differ in the trait defined by S.

Then (suppose we need to introduce a new trait, call it S`):

P``=refine(P`, S`), so each X`` could be defined as follows:
for each X`[i] in P`:
   X``[i*2] = intersection(X`[i], S`)
   X``[i*2+1] = difference(X`[i], S`)

and so on, until we run out of traits.

Finally, we get classes (the set of Xs) that consist of equivalent functions, so we can merge the functions within each class.
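The refine() step described above can be sketched in a few lines. This is a toy model (not the actual MergeFunctions code): "functions" are just ints, and membership in S is given by a trait predicate; empty halves are dropped from the resulting partition.

```cpp
#include <functional>
#include <vector>

// A class is a set of items; a partition is a list of classes.
using Class = std::vector<int>;
using Partition = std::vector<Class>;

// refine(P, S): split each class X into intersection(X, S) and
// difference(X, S), keeping only the non-empty halves.
Partition refine(const Partition &P, const std::function<bool(int)> &InS) {
  Partition Result;
  for (const Class &X : P) {
    Class Inter, Diff;
    for (int F : X)
      (InS(F) ? Inter : Diff).push_back(F);
    if (!Inter.empty()) Result.push_back(Inter);
    if (!Diff.empty()) Result.push_back(Diff);
  }
  return Result;
}
```

Applying refine() repeatedly with different traits drives the partition toward classes of equivalent functions.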

Am I right?

The patch proposed here can then be seen as exactly such a partition refinement:

At each refinement stage, you need to introduce S somehow, i.e. an algorithm that answers the question: "what sort of difference do we want to check in this bucket?"

The existing implementation already checks these traits, but pair-wise (attributes, calling convention, number of arguments, number of basic blocks, etc.). How do we use the same traits in partition refinement?

Consider the following example. Suppose we want to split some bucket X:

0. Take some trait from *pair-wise* comparison. For example: number of arguments.
1. Take some function from bucket X, let it be F1. Let F1 have N1 arguments.
2. Then S is the set of functions whose number of arguments is *less* than N1.
3. Do the refinement by S. As a result we get X`[0] and X`[1].
4. If both X`[0] and X`[1] are non-empty, repeat #1-#3 for each of them.
5. If either X`[0] or X`[1] is empty, then all functions in X are the same with respect to trait S. We need another trait, so go to #0, and repeat #0-#5 for all X` buckets we have.
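The steps above can be sketched as a recursive split. This is a hypothetical model, not the real pass: each "function" is reduced to a vector of trait values (e.g. {number of arguments, number of basic blocks, ...}), and to keep the sketch terminating when the pivot happens to be the minimum, the split is done three ways (less / equal / greater) instead of two.

```cpp
#include <cstddef>
#include <vector>

using Traits = std::vector<int>; // one trait value per comparison kind

void splitBucket(std::vector<Traits> Bucket, size_t TraitIdx,
                 std::vector<std::vector<Traits>> &Groups) {
  if (Bucket.empty())
    return;
  if (TraitIdx == Bucket.front().size()) {
    Groups.push_back(Bucket); // all traits checked: an equivalence class
    return;
  }
  int Pivot = Bucket.front()[TraitIdx]; // step 1: pick some function F1
  std::vector<Traits> Less, Equal, Greater;
  for (const Traits &F : Bucket) {      // steps 2-3: refine by S
    if (F[TraitIdx] < Pivot)      Less.push_back(F);
    else if (F[TraitIdx] > Pivot) Greater.push_back(F);
    else                          Equal.push_back(F);
  }
  if (Less.empty() && Greater.empty()) {
    splitBucket(Equal, TraitIdx + 1, Groups); // step 5: next trait
    return;
  }
  splitBucket(Less, TraitIdx, Groups);    // step 4: repeat per sub-bucket
  splitBucket(Equal, TraitIdx, Groups);
  splitBucket(Greater, TraitIdx, Groups);
}
```

The recursion tree this builds is exactly the binary (here ternary) tree of buckets mentioned below.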

Finally we get the groups of equivalent functions, and these groups are in fact organized in a binary tree.
Most of this is implemented via std::map. I presented it as #0-#5 just to put it into the context of your idea. The only problem was to introduce a way to define the order relation (i.e. how to define S), but that problem has been solved.
On output we get groups of equal functions.

Correct me if I'm wrong, but it looks like every advantage given by partition refinement is present here.

As I said in the beginning, this is a way to commit the things you proposed with a minimum of changes. We will use the old comparison routines; the only thing we need to change is the return type (from bool to -1/0/1).
In this way, partition refinement can be committed as a series of small changes: one comparison routine per commit.
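The return-type change can be illustrated with a minimal sketch. cmpNumbers matches the helper quoted later in this thread; isEquivalentNumber is a hypothetical stand-in showing how an old bool routine can be kept as a thin wrapper during the transition.

```cpp
#include <cstdint>

// New-style three-way comparison: -1 (less), 0 (equal), 1 (greater).
static int cmpNumbers(uint64_t L, uint64_t R) {
  if (L < R) return -1;
  if (L > R) return 1;
  return 0;
}

// Old-style boolean routine expressed via the new one, so callers can be
// migrated one commit at a time.
static bool isEquivalentNumber(uint64_t L, uint64_t R) {
  return cmpNumbers(L, R) == 0;
}
```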

Every change in MergeFunctions from now on can be checked by a dedicated builder. It runs the test-suite with the MergeFunctions pass forced on:
http://lab.llvm.org:8011/builders/clang-mergefunc-x86_64-freeBSD9.2
Currently the MergeFunctions pass is stable, the builder is green, and it passes all test-suite programs.

-Stepan.

Stepan Dyatkovskiy wrote:

Hello Sean and Tobias,

Sean,
Thank you. Could you describe Nick's ideas in few words or give me links
to your discussion, so I could adapt my ideas to it.

Sorry I haven't had time to write it up. The essential idea is that we use
partition-refinement to determine the differences between a group of
functions instead of using pair-wise comparison.

I can't remember exactly, but you also had one neat idea about using call
graph information to narrow things down. Was it to use SCC size as an
attribute in the partition refinement?

-- Sean Silva

Yes, you can use the call graph as a factor in determining whether two
functions are the same, cheaply. The obvious case is that if one function
places a call and the other does not, they can't be equivalent. The reverse
post-order traversal performed by the CallGraphSCCPassManager means that
all callees are visited before callers. One of the things mergefunc needs
to consider is that two functions A and B which call two other functions (C
and D) might be different, or might be identical only-if C and D turn out
to be identical. This means we need to defer merging. With the call graph,
if we merge all callees before visiting callers, we don't need to defer
anything. If the callees are different between two functions, we've already
tried and failed to merge the callees ...

... except for the case where the caller and callee are in the same
strongly connected component, in which case we do the same
deferred-decision trick as we have in current MergeFunc.

I implemented all this, but never committed it to llvm because we didn't
have a CallGraph kicking around at the time MergeFunc ran, and computing
the call graph ultimately made it slower.

One other thing before I forget, it's important to merge mergefunc with
constantmerge. Currently we have a situation where we have constant tables
of function pointers (aka. virtual tables) and we have functions which load
these tables and grab functions out of them. It may turn out that all the
virtual functions across two template instantiations are the same, and we
want to merge the vtables and the functions. We can't do that if we don't
try to merge both constants and functions together.

Nick

Hi all,

The previous patch has been split into a series of small changes.
At each stage (after each patch) the MergeFunctions pass is compilable and stable.

Please find the patches attached for review.

To apply all patches at once, use the "apply-patches.sh" script as follows:
0. Place "apply-patches.sh" in the same directory as the patches.
1. cd <llvm-sources-dir>
2. "bash <path-to-patches-dir>/apply-patches.sh"

-Stepan

Stepan Dyatkovskiy wrote:

0001-types-comparison.patch (4.78 KB)

0002-check-for-lossless-bitcast-adaptation.patch (3.71 KB)

0003-constants-comparison.patch (6.33 KB)

0004-constant-comparison-additional-checks.patch (2.02 KB)

0005-values-enumeration-routine.patch (5.17 KB)

0006-attributes-comparison.patch (2.45 KB)

0007-operations-comparison.patch (9.43 KB)

0008-GEP-comparison.patch (3.48 KB)

0009-BB-comparison.patch (3.73 KB)

0010-top-level-compare-method.patch (4.06 KB)

0011-sanity-check.patch (3.89 KB)

0012-removed-unused-methods.patch (2.02 KB)

0013-FnSet-replaced-with-FnTree.patch (6.51 KB)

0014-updated-comments.patch (3.55 KB)

0015-removed-dense-map-helpers.patch (4.42 KB)

apply-patches.sh (199 Bytes)

ping
Stepan Dyatkovskiy wrote:

ping
Stepan Dyatkovskiy wrote:

Hi all!

There is a new implementation of partition refinement that groups equivalent functions together.

The idea is as follows:
1. We encode each function into a sequence of numbers, with the restriction that equivalent number sequences always mean equivalent functions.
2. For each function we split the whole sequence into groups of numbers, e.g. each group encodes a single instruction.
3. For each group we calculate a hash. Ideally it should be a good hash, so different groups always get different hash values.
    The standard LLVM hashing routine is used, but I also added a sanity check that will inform us in case of bad hash quality.
4. We represent each function as a sequence of hash numbers:

F0: hashF0-0, hashF0-1, hashF0-2, ...
F1: hashF1-0, hashF1-1, ...

5. Then we create a difference tree. Suppose we have F0, F1 and F2 (diagram from one of the previous posts).
For example, imagine hashF0-0 == hashF1-0 == hashF2-0 and hashF0-1 == hashF1-1:

Then we can build the following tree (requires a fixed-width font):

              [hashF0-0]
               /      \
      [hashF0-1]    [hashF2-1]
       /      \           \
[hashF0-2] [hashF1-2]  [hashF2-2]
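The difference tree from step 5 behaves like a trie keyed by hash values. Below is a toy sketch (not the actual DiffTree class from the patch): each function's hash sequence is inserted along a path, shared prefixes share edges, and functions whose sequences are identical end up in the same leaf group.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

struct DiffTreeNode {
  std::map<uint64_t, std::unique_ptr<DiffTreeNode>> Children;
  std::vector<int> Functions; // ids of functions whose sequence ends here
};

void insertFunction(DiffTreeNode &Root, int FnId,
                    const std::vector<uint64_t> &Hashes) {
  DiffTreeNode *Node = &Root;
  for (uint64_t H : Hashes) {
    std::unique_ptr<DiffTreeNode> &Child = Node->Children[H];
    if (!Child)
      Child = std::make_unique<DiffTreeNode>(); // new branch on first miss
    Node = Child.get();
  }
  Node->Functions.push_back(FnId); // identical sequences share this leaf
}
```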

The patch introduces two main classes:
EncoderContext - the top-level class that encodes a function in the required way.
DiffTree - the tree described in #5.

There are a few more details in the header comments; you can also find some comments in the EncoderContext and DiffTree classes.
I'm working on full-featured documentation for this approach and hope to present it soon.

The new classes are included in the MergeFunctions pass, so everybody can see how it works right now.
This approach works as fast as the previous one (order relation).
As an additional bonus, it allows implementing the "similar merging" proposed by Tobias von Koch.

Please find the patch attached.

Any questions are welcome!

P.S.: It could still contain some typos, though I tried to clean everything up.

-Stepan

Stepan Dyatkovskiy wrote:

diff-graph-2014-02-21.patch (63 KB)

/// Compare two Types, treating all pointer types as equal.

- bool isEquivalentType(Type *Ty1, Type *Ty2) const;
+ int cmpType(Type *Ty1, Type *Ty2) const;

Hi Nick,

I tried to rework the changes as you requested. One of the patches (0004, with extra assertions) has been removed.

> + bool isEquivalentType(Type *Ty1, Type *Ty2) const {
> + return cmpType(Ty1, Ty2) == 0;
> + }
>
> Why do we still need isEquivalentType? Can we nuke this?
Yup. After applying all the patches, isEquivalentType will be totally replaced with cmpType. All the isEqXXXX methods and friends will be removed in 0011 (old 0012); no traces left.
The old function wasn't removed in 0001 just to keep the patches free of extra noise like:

- something that uses isEquivalentType
+ something that uses cmpType

The point is that the "something" that uses isEquivalentType will also be replaced by one of the next patches in this series.

>
> +static int cmpNumbers(uint64_t L, uint64_t R) {
> + if (L < R) return -1;
> + if (L > R) return 1;
> + return 0;
> +}
>
> At a high level, you don't actually need a <=> operator to use a sort. A
> strict ordering ( < operator) is sufficient. (Note that for mergefunc, a
> strict weak ordering is not sufficient, it must be a total ordering.)
That could be done for int FunctionComparator::compare(): we could replace it with bool FunctionComparator::less(). However, all the other cmp methods need at least 2 bits of information in their result:
1. Whether things are equal.
2. Whether left is less than right.

As for FunctionComparator::compare(), converting it into less() would increase the time of the sanity check (patch #0010).
The sanity check is just a sweet bonus; it checks that the ordering is implemented properly (verifies the order-relation properties).
Turning compare() into less() means that we would have to run the comparison twice: L.less(R) and R.less(L). But maybe the sanity check is not a thing to be published at all.
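The kind of check the sanity-check patch might perform can be sketched generically (the name checkTotalOrdering is hypothetical, not from the patch): given a three-way comparator, verify over a sample set that cmp(A,B) and cmp(B,A) are consistent and that the induced "less" relation is transitive.

```cpp
#include <vector>

// Returns true iff Compare behaves like a consistent three-way total
// ordering on the given sample set.
template <typename T, typename Cmp>
bool checkTotalOrdering(const std::vector<T> &Items, Cmp Compare) {
  for (const T &A : Items)
    for (const T &B : Items) {
      if (Compare(A, B) != -Compare(B, A))
        return false; // cmp(A,B) and cmp(B,A) must be negations
      for (const T &C : Items)
        if (Compare(A, B) < 0 && Compare(B, C) < 0 && Compare(A, C) >= 0)
          return false; // transitivity: A < B and B < C imply A < C
    }
  return true;
}
```

Note this needs the full -1/0/1 result: with only a bool less(), each property check would require running the comparison in both directions.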

>
> Consider hoisting this inside the FunctionComparator class? That class
> should have a bunch of implementations of comparisons between various
> different things, which can pass down to other methods in the same class.
In the new patch series attached to this post, I have moved all the static methods into FunctionComparator.

> + // Replacement for type::canLosslesslyBitCastTo, that
> + // establish order relation on this kind of properties.
> + int checkForLosslessBitcast(const Type *L, const Type *R);
>
> Type:: not type:: . Please make this comment more descriptive.
Done.
[new comment]
Replacement for Type::canLosslesslyBitCastTo, which establishes an order
relation on this kind of property.
Returns 0 if the L and R types could be converted to each other without
reinterpretation of bits.
Otherwise the method returns -1 or 1, defining a total ordering between
types with respect to the lossless-bitcastability trait.
E.g.: if L is less than R (result is -1), then every type that could be
losslessly bitcasted to L is less than R.
[/new comment]

>
> + /// Replacement for C1 == getBitCast(C2, C1Ty)
> + /// Its more controllable, and supposed to be simpler and more
> predictionable.
> + /// As very important advantage: it allows to introduce order relation on
> + /// constants set, and thus use it as trait in refinement routines.
>
> "Its" --> "It's". "predictionable" --> "predictable". And how is it more
> predictable? I think this comment would be better if it described the
> function instead of making comparisons between it and other functions.
> Something like, "Compare constants under a system where pointer to X and
> pointer to Y are considered equal" or whatever is actually true here.
Done.
[new comment]
Replacement for C1 == getBitCast(C2, C1Ty).
Parses the constants' contents, assuming that the types are losslessly
bitcastable between each other. So it actually ignores the types and only
compares the bits of L and R.
Returns 0 if L and R have equivalent content.
Returns -1 or 1 if the values are different; maintaining a total ordering
requires two values that indicate non-equivalence (L less than R, L
greater than R).
[/new comment]

>
> +enum ConstantType {
> I'm not sure that this buys you much. All the "isa" tests can be broken
> down into a switch on getValueID() with the one exception of isNullValue().
Done.

> + assert(
> + C1->getType()->canLosslesslyBitCastTo(C2->getType()) &&
> + "Pass is healthless. checkForLosslessBitcast should be twin of "
> + "canLosslesslyBitCastTo method, except the case when the last one "
> + "returns false, the first one should return -1 or 1");
...
> I think we can skip the asserts here. They aren't detecting a specific
> bug, they're checking whether the new code does a certain task relative
> to the old code. Drop the old code, your new code is the new sheriff in
> town.
The asserts have been removed.

>
> + DenseMap<const Value*, int> sn_map1, sn_map2;
>
> What is sn short for ("seen")? Why are there two of these?
Serial number :slight_smile:
>
> + std::pair<DenseMap<const Value *, int>::iterator, bool>
> + LeftSN = sn_map1.insert(std::make_pair(V1, sn_map1.size())),
> + RightSN = sn_map2.insert(std::make_pair(V2, sn_map2.size()));
>
> So I think I get it, this is easy to reason about. You number each value
> going down both functions. But I think you can eliminate one of these
> maps because you know that if the left and right were ever different
> then we terminate the comparison immediately. You can at least share one
> map with both V1 and V2, but I think you can reduce the number of map
> insertions.
Not sure. Consider that on the left you have:
%A = alloca i32
%B = alloca i32
store i32 1, i32* %A

And on the right:
%A = alloca i32
%B = alloca i32
store i32 1, i32* %B

When we meet the 'store' instruction, we need to check in which order %A was allocated on the left and in which order %B was allocated on the right.
So we have to assign a serial number to each value in a function. Then, when comparing local values from different functions, we can just check their serial numbers.
Though maybe I missed something; maybe there is another way to determine "who was first".
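A toy sketch of the serial-number (sn_map) scheme under discussion, with std::map and strings standing in for DenseMap and Value*: each function keeps its own map from local values to the order in which they were first seen, and two values compare equal iff their serial numbers match.

```cpp
#include <map>
#include <string>

struct ValueEnumerator {
  std::map<std::string, int> SNMap; // sn_map: value -> serial number

  int getSerialNumber(const std::string &V) {
    // insert() assigns the next serial number only on first occurrence;
    // later lookups return the number already stored.
    return SNMap.insert({V, (int)SNMap.size()}).first->second;
  }
};

// Returns 0 iff L (in the left function) and R (in the right function)
// occupy the same position in their respective definition orders.
int cmpValues(ValueEnumerator &LeftFn, const std::string &L,
              ValueEnumerator &RightFn, const std::string &R) {
  int LSN = LeftFn.getSerialNumber(L);
  int RSN = RightFn.getSerialNumber(R);
  return LSN < RSN ? -1 : (LSN > RSN ? 1 : 0);
}
```

In the %A/%B example above, the two stores compare as different because %A has serial number 0 on the left while %B has serial number 1 on the right.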

> High-level question, are attributes ordered? Your comparison is ordered.
> So if I have function F with string attributes "sse2", "gc=hemisphere"
> and another function G with string attributes "gc=hemisphere", "sse2"
> then they would be considered different. I don't think that's right.
I didn't check. But if they are not ordered, we could order them lexicographically.

>
> + int cmpOperation(const Instruction *I1, const Instruction *I2) const;
> +
> bool isEquivalentOperation(const Instruction *I1,
> - const Instruction *I2) const;
> + const Instruction *I2) const {
> + return cmpOperation(I1, I2) == 0;
> + }
>
> Hopefully this patch series ultimately eliminates calls of
> isEquivalentOperation?
Of course. Just like isEquivalentType.

> By the way, this is an interesting one. Should "add x, y" and "add nsw
> x, y" be equal or not?
Yeah, those are interesting questions. We could even treat 'mul a, 2' and 'shl a, 1' as equal in some cases. I think it's a matter of time and further optimization.

Please find the first reworked patches attached.

-Stepan

Nick Lewycky wrote:

0001-types-comparison.patch (4.85 KB)

0002-check-for-lossless-bitcast-adaptation.patch (4.07 KB)

0003-constants-comparison.patch (6.24 KB)

0004-values-enumeration-routine.patch (5.23 KB)

0005-attributes-comparison.patch (3.01 KB)

0006-operations-comparison.patch (9.42 KB)

0007-GEP-comparison.patch (3.48 KB)

0008-BB-comparison.patch (3.73 KB)

0009-top-level-compare-method.patch (4.01 KB)

0010-sanity-check.patch (3.89 KB)

0011-removed-unused-methods.patch (2.04 KB)

0012-FnSet-replaced-with-FnTree.patch (6.52 KB)

0013-updated-comments.patch (3.29 KB)

0014-removed-DenseMap-helpsers.patch (4.48 KB)

apply-patches.sh (197 Bytes)

ping
Stepan Dyatkovskiy wrote:

I am extremely concerned that you may end up with A < B and B < C but A >
C. There are cases where you use cmpTypes to compare two types, then others
where you compare them via cmpNumbers. The only way to ensure that the sort
function is self-consistent is to be entirely consistent with how we
traverse the function.

0001:
+ int cmpType(Type *Ty1, Type *Ty2) const;

Hi Nick,
I have committed 0001 as r203788.
I'm working on fixes for 0002 - 0014.

> After reading through this patch series, I feel like I'm missing
> something important. Where's the sort function? It looks like we're
> still comparing all functions to all other functions.
When you insert functions into std::set or one of its analogs, it does all the job for you: internally it builds a binary tree using the "less" comparison, so each insert/lookup operation takes O(log(N)) time.
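The std::set point can be sketched as follows. This is a hypothetical model, not the pass itself: "Fn" stands in for a real function summary, and the strict ordering is derived from a three-way compare, so inserting N functions costs O(N log N) comparisons overall instead of pair-wise O(N^2).

```cpp
#include <set>
#include <string>

struct Fn {
  std::string Body; // hypothetical stand-in for comparable function data
};

// Strict ordering derived from a -1/0/1 comparison routine.
struct FnLess {
  bool operator()(const Fn &L, const Fn &R) const {
    return L.Body.compare(R.Body) < 0;
  }
};

// Returns true if an equivalent function was already in the tree, i.e.
// we found a merge candidate; otherwise the function is inserted.
bool insertOrMerge(std::set<Fn, FnLess> &FnTree, const Fn &F) {
  return !FnTree.insert(F).second; // O(log N) lookup + insert
}
```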

-Stepan.

Nick Lewycky wrote:

Hi Nick,

Please find the fixed patches in attachment.
Series starts from "constants comparison".

Below is a small report of what has been fixed, plus answers to your questions.

cmpTypes:
> Please add a comment for this method. Include the meaning of the returned value. ("man strcmp" for inspiration.)
Fixed and committed, so you can look in trunk; maybe I forgot to do something (hope not :slight_smile: )

checkForLosslessBitcast, cmpConstants:
> Why isn't this using cmpType?
Fixed.

> Please put the else on the same line as the closing brace.
Fixed.

>> + else if (const VectorType *thisPTy = dyn_cast<VectorType>(L))
> Missing initial capital
Sorry, fixed. Actually these typos had been cloned from Type::canLosslesslyBitCastTo.

>> + return cmpNumbers((uint64_t)L, (uint64_t)R);
>>
> Please assert on unknown constant.
That's possible. But what if we really do get an unknown constant? New constant types can appear regardless of the MergeFunctions implementation, so we would get a crash if that happens, and we lose nothing by comparing them just as pointers. So do we really need the assertion? Currently I kept it as it was. If your answer is "yes, we need it", it would be easy to add:

   case Value::FunctionVal:
   case Value::GlobalVariableVal:
- case Value::GlobalAliasVal:
- default: // Unknown constant
- return cmpNumbers((uint64_t)L, (uint64_t)R);
+ case Value::GlobalAliasVal:
+ return cmpNumbers((uint64_t)L, (uint64_t)R);
+ default:
+ llvm_unreachable("Unknown constant.");

About variable names (Var1/Var2 vs. VarL/VarR):
I think it's better to use the L/R concept. It's easier to understand what to return (-1 or 1) when you see L/R rather than Var1/Var2.
Var1/Var2 have been kept in the functions that are to be removed by the end of the patch series.
The F1 and F2 names were also kept until the "top level compare method" patch.

> Alternatively, any reason not to have cmpPointers(const void *L, const void *R)?
I could do that. I just wanted to show that pointers are compared as numbers in some situations; actually we do it in cases where we have absolutely no idea of a smarter comparison.
And back to cmpPointers: it's rather about what intuition tells us. If pointers are equal as numbers, could they be different somehow? Could they be non-castable to numbers on some platforms? The latter would be big trouble for the DenseMapInfo implementation.
If there is any (even very small) possibility of such cases, then yes, I vote for cmpPointers. The last word is up to you anyway.

Attributes (0005):
> Attributes already have operator< and operator==. Please reuse them.
Fixed. I used a simple "if":

if (LA < RA)
   return -1;
if (RA < LA)
   return 1;

Though it's possible to use:

if (int Res = (LA < RA) ? -1 : (RA < LA) ? 1 : 0)
   return Res;

Which one is more preferable?

cmpGEP (0007):
> + int cmpGEP(const GEPOperator *GEP1, const GEPOperator *GEP2);
> + int cmpGEP(const GetElementPtrInst *GEP1,
> + const GetElementPtrInst *GEP2) {
>
> Const members?
We can't make them const, since they compare values (cmpValues, aka enumerate): if we see a Value for the first time, we have to add it to sn_mapL/R.

> I think you should put this under if (DL) { /* declare Offset1, Offset2, etc. */ }
Fixed.

About 0008:
> Did you mean "cmpOperation"?
Yes. We could keep it in one place. I have fixed this case and removed the TODO.

> "compare" still returns "bool". I'm going to assume this was meant to go in 0009.
Yes.

About 0011:
> std::set is frowned upon, see http://llvm.org/docs/ProgrammersManual.html#set . Do you actually rely on the stable iterator guarantee? Or would another set-like container be a
> better fit?
Huh... I have looked for alternatives. Unfortunately SmallSet is not suitable, since we need to look up existing items and a SmallSet::find method is not present; maybe I have missed something.
What we actually need is a binary tree implementation. Do we have a better analogue?
I could implement a new one; I think it should be just one more patch afterwards.

>No. An identifier with underscore then capital letter is reserved for the implementation. You can just call them "F" and "DL", then the ctor initializer list can be "F(F), DL(DL)" and that will work fine.
Fixed.

0013:
> Sorry, I'm not sure what this sentence means? The thing your example is talking about is showing is that two functions in an SCC may be equivalent, and detecting them requires "blinding" (ignoring) certain details of the function when you do the comparison. We blind ourselves to the pointee types, but not to callees of functions in the same SCC.

I meant generalising the cross-reference case:
in the old implementation, while comparing functions F and G, we treat the case when F uses G as equal to the case when G uses F (at the same place). It could happen that F uses G1, while G1 uses G2, while G2 uses F; it's still the same situation, but currently we would have to invent a way to detect such cases.

Thanks for the reviews!
-Stepan

Stepan Dyatkovskiy wrote:

0001-constants-comparison.patch (8.54 KB)

0002-values-comparison-replacement-for-enumerate.patch (5.06 KB)

0003-attributes-comparison.patch (2.05 KB)

0004-operations-comparison.patch (9.78 KB)

0005-GEPs-comparison.patch (3.56 KB)

0006-top-level-compare-method.patch (9.55 KB)

0007-sanity-check.patch (3.89 KB)

0008-removed-unused-methods.patch (2.08 KB)

0009-FnSet-replaced-with-FnTree.patch (6.58 KB)

0010-updated-comments.patch (3.24 KB)

0011-removed-DenseMap-helpers.patch (4.48 KB)

apply-patches.sh (355 Bytes)

ping
Stepan Dyatkovskiy wrote: