Adding Extended-SSA to LLVM

(And just as a short follow-up, since I got asked privately):

The reason MemorySSA doesn’t have these issues as an IR is threefold:

  1. There is no pre-existing IR for memory, really.
  2. Most users use it as a version number for the memory state of loads/stores, which works because of #3.
  3. It’s a 1:1 mapping of values to values.

I.e., one load is one MemoryUse, one store is one MemoryDef.
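
To make the 1:1 mapping concrete, here is a minimal sketch of what it looks
like to a client. It assumes an already-built MemorySSA and uses the header
path and API names from current in-tree MemorySSA, which may not match
exactly what existed at the time; printMemoryVersions is just an
illustrative name.

#include "llvm/Analysis/MemorySSA.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

// MemorySSA hands back at most one access per instruction, keyed by the
// instruction itself (a Value), never by a particular Use of it.
static void printMemoryVersions(Function &F, MemorySSA &MSSA) {
  for (BasicBlock &BB : F)
    for (Instruction &I : BB)
      // Loads get a MemoryUse, stores (and other clobbers) a MemoryDef;
      // instructions that don't touch memory get nothing.
      if (MemoryUseOrDef *MA = MSSA.getMemoryAccess(&I))
        errs() << I << "  -->  " << *MA << "\n";
}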

PredicateInfo’s underlying info is a 1:1 mapping of Uses to Values, which is … problematic for the reasons I mentioned.

That’s what the transform does: it renames things so that the 1:1 mapping of uses to values becomes a 1:1 mapping of values to values, so that everything just works.
:slight_smile:

Now with even the right llvm-dev :slight_smile:

Hey folks,

After a long amount of discussion both offline and on, I put a
pass/intrinsic to add extended SSA up at D29316: Add predicateinfo
intrinsic, analysis pass, and basic NewGVN support.

Sean asked me to share it more broadly on llvm-dev, so here you go :slight_smile:

For those not familiar with extended-SSA, it's described in the paper
"ABCD: Eliminating Array Bounds Checks on Demand".

There is a very large amount of explanation in the summary of the
review that I don't want to paste here, but the short version is:

1. We make copies of variables where they are used in comparisons that
lead to assumes or branches and it's possible to determine something about
their value from the branch (a sketch of how a consumer might exploit these
copies appears after this list).

I.e.:
define i32 @test1(i32 %x) {
    %cmp = icmp eq i32 %x, 50
    br i1 %cmp, label %true, label %false
true:
    ret i32 %x
false:
    ret i32 1
}

becomes

define i32 @test1(i32 %x) {
  %cmp = icmp eq i32 %x, 50
  br i1 %cmp, label %true, label %false

true: ; preds = %0
; Has predicate info
; branch predicate info { TrueEdge: 1 Comparison: %cmp = icmp eq i32 %x, 50 }
  %x.0 = call i32 @llvm.predicateinfo.i32(i32 %x)
  ret i32 %x.0

false: ; preds = %0
  ret i32 1
}

All uses that are dominated by the predicate info are renamed to use it.

2. We do so very quickly (it takes about 600ms to do 2 million blocks
with comparisons; by comparison, most passes simply crash or take forever
on the same file, and GVN, as an example, takes 500 seconds), and we only
insert copies when and where the operands are used.

3. The intrinsics are marked so they do not affect optimization (and
currently, the passes that use them all destroy them). They can also detect
when they've been moved, to make sure they are still valid if need be.

4. They could be updated pretty easily if we wanted to.

With one real downside:

5. We modify the IR in an analysis to do so :frowning:
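
To make point 1 concrete from a consumer's side, here is a rough sketch of
how a pass could exploit one of these copies. PredRecord and its fields are
invented stand-ins for whatever the analysis in D29316 actually exposes;
only the core IR calls (CmpInst, dyn_cast, getOperand) are real API.

#include "llvm/IR/Constant.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/Support/Casting.h"
using namespace llvm;

// Hypothetical record describing one inserted copy.
struct PredRecord {
  CmpInst *Comparison; // e.g. %cmp = icmp eq i32 %x, 50
  bool TrueEdge;       // did we reach the copy through the true successor?
};

// If the copy was taken on the true edge of an equality compare against a
// constant (as in the example above, where the constant is operand 1),
// every use of the copy can be treated as that constant.
Value *simplifyCopy(Value *Copy, const PredRecord &PR) {
  if (PR.TrueEdge && PR.Comparison->getPredicate() == CmpInst::ICMP_EQ)
    if (auto *C = dyn_cast<Constant>(PR.Comparison->getOperand(1)))
      return C; // %x.0 in the example is known to be 50
  return Copy;
}

That is essentially what GVN's propagateEquality gets you, keyed off the
renamed copy instead of walking dominated uses.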

One question occurred to me reading this that is probably just my
ignorance, but maybe you can help =]

Any particular reason to not model this as essentially a shadow of the
IR?

Done already :slight_smile: Check out the newgvn-predicateinfo-uses branch on GitHub
:P.

The short version of why there's no point is that it's complex, and you
can't update it *anyway* at that point :stuck_out_tongue:

I'm imagining creating a map from operand -> predinfo, where predinfo
essentially models these inserted intrinsics, and rather than RAUW-ing
dominated uses, you insert a mapping for dominated uses.

Ah, if you could easily map from operand to predinfo, this might even
work okayish (you now have O(number of uses in the entire program) lookups
you've added to NewGVN, to check whether they have predicateinfo, but okay).

But to start: you can't do this easily. The operands you need to attach to
are Uses.

That's because, given

if (a == 0)
ret a
else
ret a

each a has different info.
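
A minimal sketch of that point, reusing the invented PredRecord from the
earlier sketch: if the table is keyed by Value*, both returns collide on
the single key %a, so the key has to be the Use itself.

#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Use.h"
using namespace llvm;

// Same invented record as before; not the real analysis API.
struct PredRecord {
  CmpInst *Comparison;
  bool TrueEdge;
};

// Keyed by Value*, both "ret a" operands above map to %a and can only carry
// one fact. Keyed by Use*, each return's operand gets its own entry:
// a == 0 on the true side, a != 0 on the false side.
using PerUseInfo = DenseMap<const Use *, PredRecord>;

const PredRecord *lookupInfo(const PerUseInfo &Info, const Use &U) {
  auto It = Info.find(&U);
  return It == Info.end() ? nullptr : &It->second;
}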

The predicateinfo insertion accomplishes this by physical renaming, so
the values have different names, and every use of a given predicateinfo has
the same value.

But in a virtual form, it looks like this:

if (a == 0)
1 = PredicateDef(branch, blah blah blah)
PredicateUse(1), but really attached to *this use* of a, not to ret
ret a
else
2 = PredicateDef(branch, blah blah blah)
PredicateUse(2), but really attached to *this use* of a, not to ret
ret a

(Even if you really wanted to, you have to attach to uses, because each
operand of an instruction could be a use of a different predicateinfo.)

You can, in fact, attach to uses.
You can even do something like the above (i.e., have a PredicateUse which
stores a regular Use), so you know what's up, and have a real def/use form.

It goes downhill from there, though, precisely because the info is
attached to Uses :frowning:

First, we have two types of things that would use this pass.

Lazy things, and propagating things.

Lazy things, which return a value for a thing all at once in a single
call (LVI if it had no caches), might be convertible to use this.

You have the problem that getting a use from an instruction operand is
not currently trivial (we'd have to have a getOperandUse that mirrors
getOperand, or something), but that's not too bad.
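
A minimal sketch of what a lazy client's query would have to look like. It
assumes an accessor like getOperandUse (whether that already exists or has
to be added, as suggested above), and queryPredicateInfoFor is a
hypothetical stand-in for the per-Use lookup.

#include "llvm/IR/Instruction.h"
using namespace llvm;

struct PredRecord; // the invented record from the earlier sketches

// Hypothetical query into whatever per-Use table the analysis keeps.
const PredRecord *queryPredicateInfoFor(const Use &U);

// The caller has to say *which* appearance of the operand it is asking
// about, so it needs the Use, not just the Value the operand refers to.
const PredRecord *queryOperand(Instruction &I, unsigned OpNo) {
  Use &U = I.getOperandUse(OpNo); // vs. I.getOperand(OpNo), which loses that
  return queryPredicateInfoFor(U);
}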

But all of ours really cache stuff, so they are really lazy propagating
things.

And propagating things (CVP, SCCP, GVN, etc.), which are the rest of
them, are hard.

They want to store tables of info by Value.

But we need to store per-use info.

If you modify them to store per-use info for all uses, you take a 5-15x
memory and time hit, to store info about each use, where most have the same
info.

Any elimination you do also has to look up per-use info, not per-value
info like we do now, which means the elimination becomes (Defs/Uses) times
slower too :frowning:

This pretty much puts it out of the range of sanity.
But let's say you are smarter than the average bear. You want to build a
hybrid scheme, where you only store special use info for the things that
are special.
You realize you can make your virtual form, above, out of Value*s. Then you
have the name you need, because the PredicateDef is a name you can use like
a normal Instruction def.

But you still have to convert back and forth between Uses and Values
everywhere, and plumb Use& through anything that does lookups, so you can
get a Use& and turn it into the right thing above.
Then, to avoid doing lookups on every use in the program, replacement of
the original operands requires storing the original uses in the
PredicateUse class, so you know which uses in the original IR correspond to
which uses in the predicate virtual IR.

This is a *lot* of plumbing, and because Uses have an operator Value*,
the likelihood that you create a lot of bugs is high, because if you use a
Use where you meant use.getUser(), it still compiles and silently does the
wrong thing :frowning:
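
A minimal illustration of that bug class; the implicit conversion from Use
to the used Value* is what makes the typo silent. recordFactAbout is just a
stand-in for some table insert keyed by Value*.

#include "llvm/IR/Use.h"
#include "llvm/IR/User.h"
using namespace llvm;

// Imagine this inserts into a table keyed by Value*.
void recordFactAbout(Value *V) { (void)V; }

void example(Use &U) {
  // Intended: record a fact about the *user* of this operand.
  recordFactAbout(U.getUser());

  // Typo that still compiles: the Use converts implicitly to the *used*
  // Value (the operand itself), so the fact quietly lands on the wrong
  // thing instead of the build failing.
  recordFactAbout(U);
}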

You are creative though, and get this all working.

That gets you value numbering (it's a lot harder in a lot of other
cases), without even needing an O(uses in the program) hash table.

Now you have to try to eliminate stuff.

In everything but NewGVN, you have to rewrite the eliminators. They all
replace as they go, which means either double the replace cost (since you
have to replace the original uses and, for predicates, the original uses
plus the uses stored in the PredicateUses), or even more fun as you try to
update utilities to understand looking up uses :frowning:

In NewGVN, we eliminate in O(uses that have the same value) time, by
sorting congruence class instruction defs and uses by global and local
dominator numbers, and using a stack.
So we now need to put the predicate defs, predicate uses, and regular
defs and uses all in the right dominator order (and replace the right defs
with predicate defs so the lookup happens right), to know which replaces
which.
But you can't use dominates() to do this, because it doesn't understand
these, etc.

So you have to create a local instruction numbering and figure out where
in it each thing goes.
You can do that, at some non-trivial cost over regular instruction
numbering.
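
A minimal sketch of the kind of ordering key that implies; this is the
general shape, not NewGVN's actual data structures. The DFS numbers come
from the dominator tree (DominatorTree::updateDFSNumbers()), and LocalNum
is a hand-maintained position within the block that leaves slots for the
virtual predicate defs and uses.

#include <tuple>

// One sortable position per real def/use *and* per virtual
// PredicateDef/PredicateUse, so everything can be compared in dominator
// order even though the virtual entries aren't real instructions.
struct DomPosition {
  unsigned DFSIn;    // dominator-tree DFS entry number of the block
  unsigned DFSOut;   // dominator-tree DFS exit number of the block
  unsigned LocalNum; // position within the block, counting virtual entries
};

inline bool operator<(const DomPosition &A, const DomPosition &B) {
  return std::tie(A.DFSIn, A.DFSOut, A.LocalNum) <
         std::tie(B.DFSIn, B.DFSOut, B.LocalNum);
}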

And these are just the simple parts, mind you.
After all this, you have something that takes pretty much double the
memory, is a lot harder to understand, *and* still gets invalidated easily
:slight_smile:

But it's immutable!

There's kind of no upside, and lots of downsides, both in building it
and in trying to use it :)

This would cause it to be a pure analysis again. It would make it
relatively fragile of course -- it would be easily invalidated. But given
how fast it is to compute, for several of the use cases it seems like it
would be sufficient (NewGVN seems like it could tolerate computing this
each time, etc.).

Anyways, I'm sure you've looked at something like this, and I'm curious
what goes wrong with it.

The main user at the moment is NewGVN, which uses it to do the
equivalent of GVN's propagateEquality. I'd be happy to make it a utility
called from NewGVN instead of an analysis. A number of folks, online and
off, asked me to make it usable for others, so I did that :slight_smile: Some folks want
to use it to clean up existing passes (EarlyCSE, etc.), some want to use it
in new passes.

So, this definitely sounds useful, but given that it mutates the IR
itself, I think it should just be a utility routine that passes call at
their start to put the IR into the form they need. It'll be a bit tricky if
you want analysis passes to observe this form -- you may not be able to use
the caching infrastructure of the new PM at all for such analyses, and may
need to express them as standalone utilities as well.

Yeah, I'm just going to make it a utility.

I don't have much content to add to the discussion, but since I originally
asked you to bring the discussion to llvm-dev: making it a utility SGTM.

-- Sean Silva