From: "Daniel Berlin" <dberlin@dberlin.org>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Joerg Sonnenberger" <joerg@britannica.bec.de>, "llvm-dev"
<llvm-dev@lists.llvm.org>
Sent: Thursday, January 14, 2016 1:05:56 PM
Subject: Re: [llvm-dev] High memory use and LVI/Correlated Value
Propagation
> > From: "Daniel Berlin via llvm-dev" <llvm-dev@lists.llvm.org>
> > To: "Joerg Sonnenberger" <joerg@britannica.bec.de>, "llvm-dev"
> > <llvm-dev@lists.llvm.org>
> > Sent: Thursday, January 14, 2016 10:38:10 AM
> > Subject: Re: [llvm-dev] High memory use and LVI/Correlated Value
> > Propagation
> >
> > > > On Wed, Jan 13, 2016 at 4:23 PM, Joerg Sonnenberger via llvm-dev
> > > > <llvm-dev@lists.llvm.org> wrote:
> > > >
> > > > > > I don't think that arbitrarily limiting the complexity of the
> > > > > > search is the right approach. There are numerous ways the LVI
> > > > > > infrastructure could be made more memory efficient. Fixing the
> > > > > > existing code to be memory efficient is the right approach. Only
> > > > > > once there's no more low-hanging fruit should we even consider
> > > > > > clamping the search.
> > > > >
> > > > > Memory efficiency is only half of the problem. E.g., groonga's
> > > > > expr.c needs 4 minutes to build on my laptop, a 2.7GHz i7. That
> > > > > doesn't sound reasonable for -O2. Unlike the GVN issue, the cases
> > > > > I have run into do finish after a(n unreasonable) while, so at
> > > > > least it is not trivially superlinear.
> > > >
> > > > Okay, so rather than artificially limit stuff, we should see if we
> > > > can fix the efficiency of the algorithms.
> > > >
> > > > CVP solves an O(N * lattice height) problem. It sounds like it is
> > > > being more than that for you.
> >
> > > I assume you mean something like #BB * #variables?
> >
> > No.
> >
> > I mean the theoretical complexity of performing the optimization CVP
> > performs (regardless of how it is currently implemented).
> > What CVP does is doable in time linear in the number of variables *
> > lattice height. It has some memory cost to do this.
> > Assume for a second that the lattice height is fixed (i.e., we do not
> > have completely symbolic ranges).
> > It is possible, in linear time, to build an SSA-like form for the
> > ranges it is computing.
> > It is then possible, in linear time, to propagate those ranges to all
> > affected variables.
> > This is O(N) + O(N).
> > Doing so may change the ranges (they only get *better*, however).
> > So you may need to do this until they fall all the way down the
> > lattice, which means you may repeat it up to lattice-height times.
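The iterate-to-fixpoint scheme described above can be sketched as follows. This is a minimal toy, not LVI's or CVP's actual implementation; the tiny "program" encoding and all names are invented for illustration. Each pass over the variables is linear, and the number of passes is bounded by the lattice height.

```python
from functools import reduce

FULL = (0, 255)  # "overdefined": the full range for a toy 8-bit value

def meet(a, b):
    """Merge two ranges into the smallest single interval covering both."""
    return (min(a[0], b[0]), max(a[1], b[1]))

# Toy SSA program: each variable is a constant, an add, or a phi merge.
program = {
    "a": ("const", 10),
    "b": ("const", 20),
    "c": ("add", "a", "b"),   # c = a + b
    "d": ("phi", "a", "c"),   # d = phi(a, c)
}

def propagate(program):
    ranges = {v: None for v in program}  # None = "undefined" (lattice top)
    changed, passes = True, 0
    while changed:                       # repeat until a fixed point
        changed, passes = False, passes + 1
        for v, defn in program.items():  # one linear pass over variables
            if defn[0] == "const":
                new = (defn[1], defn[1])
            elif defn[0] == "add":
                x, y = ranges[defn[1]], ranges[defn[2]]
                # Conservatively go to the full range on unknown operands.
                new = FULL if x is None or y is None else (x[0] + y[0], x[1] + y[1])
            else:  # phi: meet over the operands that already have ranges
                ops = [ranges[o] for o in defn[1:] if ranges[o] is not None]
                new = FULL if not ops else reduce(meet, ops)
            if new != ranges[v]:
                ranges[v], changed = new, True
    return ranges, passes

ranges, passes = propagate(program)
print(ranges, passes)  # converges in 2 passes: 1 to stabilize, 1 to verify
```

Because ranges only move down the lattice (they only get better), the loop terminates after at most lattice-height productive passes.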
> What is the height of the lattice here?
The current LVI is a bit hard to analyze; I'm not sure it's really as
fixed-height as one would think.
> I understand that it goes from top -> defined -> overdefined, and
> that's three, but defined is not itself one state. Because it is
> tracking integer ranges, those ranges could undergo 2^#bits
> refinements in the worst case (thus making the lattice height quite
> large), no?
If it tracks ranges on a per-bit basis, the max number of refinements
for a given range is O(bits * 3), assuming things do not go the wrong
way on the lattice.
I.e., each bit is either underdefined, or 1, or 0, or overdefined.
The lattice is:

    underdefined
       /    \
      0      1
       \    /
    overdefined
As a separate matter, we *should* also do this (i.e. have a fixed-point known-bits analysis), but we don't.
The only way to end up with 2**n refinements is if things can go the
other direction on the lattice.
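The monotonicity argument above can be sketched concretely. In this toy per-bit lattice (names invented for illustration, not LVI's code), every transition moves strictly down the diagram, so each bit changes state at most twice and an n-bit value undergoes O(n) refinements, never 2**n:

```python
UNDERDEFINED, OVERDEFINED = "underdefined", "overdefined"

def meet_bit(state, fact):
    """Combine a bit's lattice state with a new fact (0 or 1)."""
    if state == UNDERDEFINED:
        return fact          # first fact: move down to a concrete 0 or 1
    if state == fact:
        return state         # consistent fact: no lattice movement
    return OVERDEFINED       # conflicting fact: fall to the bottom

def refine(bits, facts):
    """Apply a stream of (bit position, value) facts; count state changes."""
    changes = 0
    for pos, val in facts:
        new = meet_bit(bits[pos], val)
        if new != bits[pos]:
            bits[pos], changes = new, changes + 1
    return changes

bits = [UNDERDEFINED] * 4
# Even with many repeated or conflicting facts, each bit moves at most twice.
facts = [(0, 1), (0, 1), (1, 0), (1, 1), (2, 0), (2, 0), (0, 1)]
changes = refine(bits, facts)
print(bits, changes)
```

Once a bit reaches overdefined it can never change again, which is exactly why refinements cannot go "the other direction" and blow up.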
Note: this is the standard fixpoint algorithm (and what LVI is
emulating, AFAIK). There are range-constraint-based solvers that are
significantly more complicated, but that's not what LVI does.
Right. LVI handles only single constant-bounded ranges for each variable independently.
(See https://www.cs.berkeley.edu/~daw/papers/range-tacas04.pdf, and the
paper I cite below, which does a variant of that on LLVM in pretty much
linear time.)
Interesting; thanks for the references! For both, the analysis collapses
SCCs in the control-flow graph and, while the SCC analysis is quadratic,
SCCs are generally small, so they don't observe a problem in practice.
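The SCC-collapsing step mentioned above can be illustrated with Tarjan's algorithm on a toy control-flow graph (the graph and names are invented for this sketch; the cited papers' actual machinery is more involved):

```python
def tarjan_sccs(graph):
    """Return the strongly connected components of a digraph given as an
    adjacency-list dict, using Tarjan's single-pass algorithm."""
    index, lowlink, on_stack = {}, {}, set()
    stack, sccs = [], []
    counter = 0

    def strongconnect(v):
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                strongconnect(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:   # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs

# Toy CFG with one loop: entry -> header <-> body, header -> exit.
graph = {"entry": ["header"], "header": ["body", "exit"],
         "body": ["header"], "exit": []}
sccs = tarjan_sccs(graph)
print(sccs)
```

The loop {header, body} collapses into a single node of the condensation, which is acyclic; range analysis can then run over the condensation in topological order, and the quadratic per-SCC work stays cheap as long as SCCs stay small.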
-Hal