[PATCH] D26127: [MemorySSA] Repair AccessList invariants after insertion of new MemoryUseOrDef.

To give this a bit of context, this patch stems from issues that I've
encountered while porting MemCpyOpt to MSSA.

Okay. I'm not sure I would try to port instead of just rewriting. The whole
goal of MemorySSA is to enable us to write memory optimizations in non-N^2
ways.

If you just replace one kind of querying with the other, it is not likely
to give you that result.
It may be faster (or slower).

MemCpyOpt currently iterates in no particular order.
It should be possible to do it all in a single iteration, either bottom up
or top down.

Many of the transforms in this pass create entirely new memsets, memcpys,
and memmoves (among others). After each new insertion, I expect to also
have to update MSSA, since 1) MemorySSAWrapperPass is declared preserved
to the pass manager, and 2) subsequent MCO iterations, which depend on
MSSA, need to be able to see, analyze, and possibly further transform
these newly added meminsts.

Can you give an example of a case where, processing in top-down order,
going from defs to uses, it's not possible to do this in a single iteration?

I understand why, given the random ordering it uses now and the lack of
use-def/def-use chains, it cannot do so.
I cannot, off the top of my head, think of a case where it would not be
possible to do in a single iteration.

Here's an example: MemCpyOptPass::tryMergingIntoMemset will convert the
sequence of stores to %P into a memset:

  define void @foo(i64* nocapture %P, i64* %Q) {
  entry:
    %0 = bitcast i64* %P to i16*
    %arrayidx = getelementptr inbounds i16, i16* %0, i64 1
    %1 = bitcast i16* %arrayidx to i32*
    %arrayidx1 = getelementptr inbounds i16, i16* %0, i64 3
  ; 1 = MemoryDef(liveOnEntry)
    store i16 0, i16* %0, align 2
  ; 2 = MemoryDef(1)
    store i32 0, i32* %1, align 4
  ; 3 = MemoryDef(2)
    store i16 0, i16* %arrayidx1, align 2
  ; 4 = MemoryDef(3)
    store i64 0, i64* %Q
    ret void
  }

Specifically, a memset is first inserted right before the store to %Q:

  define void @foo(i64* nocapture %P, i64* %Q) {
  entry:
    %0 = bitcast i64* %P to i16*
    %arrayidx = getelementptr inbounds i16, i16* %0, i64 1
    %1 = bitcast i16* %arrayidx to i32*
    %arrayidx1 = getelementptr inbounds i16, i16* %0, i64 3
  ; 1 = MemoryDef(liveOnEntry)
    store i16 0, i16* %0, align 2
  ; 2 = MemoryDef(1)
    store i32 0, i32* %1, align 4
  ; 3 = MemoryDef(2)
    store i16 0, i16* %arrayidx1, align 2

    call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 8, i32 2, i1 false) ; new

  ; 4 = MemoryDef(3)
    store i64 0, i64* %Q
    ret void
  }

Next, a new MemoryAccess is created by passing to createMemoryAccessBefore
the memset instruction, `3 = MemoryDef(2)` (the new access's defining
access), and `4 = MemoryDef(3)` (the new access's insertion point).
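
In rough pseudocode (identifier names here are illustrative, and the
signature is approximate rather than copied from the tree), that call is:

  // Sketch only.
  NewAccess = MSSA->createMemoryAccessBefore(
      MemSetInst,  // the newly created memset
      Def3,        // 3 = MemoryDef(2): the chosen defining access
      Def4)        // 4 = MemoryDef(3): the insertion point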

This won't work, of course.

However, this memset is being created to eliminate stores, no?
What stores are you eliminating?

If it's the stores at 1, 2, and 3, then for sure, the correct defining
access is *not* "3 = MemoryDef".

It's going to be "liveOnEntry" after you kill the stores.

Then, RAUW on each of the MemoryDefs works fine, and you removeAccess them.
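
For concreteness, the end state I would expect from that recipe, assuming
the three dead stores and their accesses are removed (a sketch, not tool
output), is:

  define void @foo(i64* nocapture %P, i64* %Q) {
  entry:
  ; 5 = MemoryDef(liveOnEntry)
    call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 8, i32 2, i1 false)
  ; 4 = MemoryDef(5)
    store i64 0, i64* %Q
    ret void
  }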

  define void @foo(i64* nocapture %P, i64* %Q) {
  entry:
    %0 = bitcast i64* %P to i16*
    %arrayidx = getelementptr inbounds i16, i16* %0, i64 1
    %1 = bitcast i16* %arrayidx to i32*
    %arrayidx1 = getelementptr inbounds i16, i16* %0, i64 3
  ; 1 = MemoryDef(liveOnEntry)
    store i16 0, i16* %0, align 2
  ; 2 = MemoryDef(1)
    store i32 0, i32* %1, align 4
  ; 3 = MemoryDef(2)
    store i16 0, i16* %arrayidx1, align 2

  ; 5 = MemoryDef(3)
    call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 8, i32 2, i1 false) ; new

  ; 4 = MemoryDef(3)
    store i64 0, i64* %Q
    ret void
  }

This is already problematic because there's no way to selectively update
the post-5 MemoryDefs so that they point to 5 instead of 3. Calling RAUW
-- specifically, replacing all uses of 3 with 5 -- would result in this:

I'm aware the existing APIs will not do what you want because, as I said,
they are specifically designed, and documented, to be meant only for
replacement.

When I originally had APIs to allow for more direct manipulation, folks
instead said "please build utilities to make common transforms work".

Thus, the existing APIs make *replacement* work.

No amount of futzing is going to make them work well for insertion.

If you want on the fly updating for stores, where you aren't doing
replacement, that's quite expensive.

MemCpyOpt is mostly doing replacement, however, and you can make them work
if you actually do replacement as replacement.

This likely requires rethinking memcpyopt a bit.


And FWIW: The *only* passes that actually require it are those doing real
instrumentation.
(i.e., none of our optimization passes, even those like GVN, need this).