------------------------------
*From: *"Daniel Berlin" <dberlin@dberlin.org>
*To: *"Quentin Colombet" <qcolombet@apple.com>
*Cc: *"Hal Finkel" <hfinkel@anl.gov>, "llvm-dev" <llvm-dev@lists.llvm.org>
*Sent: *Friday, August 26, 2016 11:06:56 PM
*Subject: *Re: [llvm-dev] [RFC] Interprocedural MIR-level outlining pass
FWIW: I'm with quentin. Without a good value numbering analysis, this is a
hard problem.
How exactly does value numbering help here? This problem seems to be about
finding structurally-similar parts of the computations of different values,
not identical values.
Note, first, the optimization we are talking about is not outlining, at
least, as far as i've ever heard it.
function outlining/etc is about removing cold regions of functions to
separate functions to make the hot portions smaller, more inlinable, etc.
What is being talked about here is a variant of identical code folding,
e.g., http://research.google.com/pubs/pub36912.html and it's variants.
Where identical varies (and is based on semantic equivalence). Variants
where you extract portions partially to merge functions are pretty much the
same optimization 
Here, value numbering helps because it gives you value expressions.
It tells you:
This operation is "add V1, V2".
It does not matter what V1 and V2 are, they just "are abstract things". We
know if these abstract things are equivalent or not.
More relevantly, these expressions form a value number dag.
That is, they represent the computations being computed only in terms of
operators, abstract variables, and other parts of the VNDAG
The *actual values* of those computations, numbers, etc, do not matter (IE
the abstract value numbers have some set of things *in the program* that
are in the set of things equivalent to that value number).
It is "are these semantically the same operations in the same order",
regardless of the value of those operations.
See, for example, the DAGS generated by:
https://www.researchgate.net/publication/221323142_An_Efficient_SSA-Based_Algorithm_for_Complete_Global_Value_Numbering
Again, this is not something current GVN does, but NewGVN (and other GVN's
do) can give you.
The question of what portions of a function you can merge are variants of
common subgraph and graph isomorphism problems on a VNDAG.
(because in the end, it doesn't matter where the computations are, it
matters what they abstractly compute, and what they depend on)
It seems much closer to the analysis necessary for SLP vectorization than
value numbering, as such.
I think we might be able to use value numbering to help with subset of SLP
vectorization, but only because we can define some kind of equivalence
relation on consecutive memory accesses (and similar). What you need for
this outlining seems more akin to the general SLP-vectorization case. This
might just be the maximal common subgraph problem.
If i have an add in a loop, and it's VN equivalent to an add outside a loop
in some other function, i can still merge the functions 
It's how the value is generated and used that matters, not the structure of
the program.
So you can't just do it on a regular CFG or anything like that if you want
good results.