[MemorySSA] inserting or removing memory instructions

Hi guys,
a question about updating memory SSA:
Is it expected that e.g. insertion of a MemoryDef doesn’t change all dominated uses?
For example test case CreateLoadsAndStoreUpdater produces:

define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 4 = MemoryDef(1)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
; 2 = MemoryDef(4)
  store i8 16, i8* %0
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%3,4},{%2,2})
; MemoryUse(3)
  %5 = load i8, i8* %0
; MemoryUse(1)
  %6 = load i8, i8* %0
}

What is the general behavior that I can expect when I insert or remove def/use?
Another general question: what is the use of MemorySSAUpdater? When should I use the updater, and when is the MemorySSA API sufficient?

Piotr

Hi guys,
a question about updating memory SSA:
Is it expected that e.g. insertion of a MemoryDef doesn't change all
dominated uses?

At the moment, it is expected you are essentially just hoisting/sinking
them, not actually changing the aliasing.
The test does not obviously test this constraint, and is pretty contrived.
If you have a use case where we need to rename affected uses, I'm happy to
make it do that; it's trivial.

For example test case CreateLoadsAndStoreUpdater produces:

define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 4 = MemoryDef(1)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
; 2 = MemoryDef(4)
  store i8 16, i8* %0
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%3,4},{%2,2})
; MemoryUse(3)
  %5 = load i8, i8* %0
; MemoryUse(1)
  %6 = load i8, i8* %0
}

What is the general behavior that I can expect when I insert or remove
def/use?

So far, it is built to replace all the hoist/sink/removal/etc cases.
It should work for all of those cases.

For use insertion, it will always be correct.
For stores, it will even work if you sink a store to a branch.
It will produce wrong results if you insert new stores in the way of old
stores.

Another general question: what is the use of MemorySSAUpdater?

To replace hand-written broken updating.

When should I use the updater, and when is the MemorySSA API sufficient?

Unless you are just removing accesses, you should use the Updater.

Due to the single-variable nature of memoryssa, there are tricky cases that
need to be handled.

In particular, if you want to add support, the right way to know what to rename is (off the top of my head):

Add a flag or something to have renamePass reset all uses it sees (you only have to change the uses; the defs are all linked together and thus already fixed by the updater). Right now it only does that if they have no defining access.

Make it skip blocks already in the visited set (the incoming val to pass to successors is the existing incoming val if getDefsList().rbegin() == getDefsList().rend(), and getDefsList().rbegin() otherwise) to prevent duplicate work from the below:

Now:
Call renamePass on the BB of the inserted def, using the defining access of the first def as the incoming val.
Call renamePass on the BB of each inserted phi (you can use a null incoming val, since the incoming val will become the phi).

This should rename all of the affected uses.

Thanks for the reply. I still don’t understand a couple of things.
Let’s take a look at MemorySSATest.MoveAStore test.

The dump of MSSA after each operation, with my comments, looks like below. I also added a question after each transformation:

; Just after constructing

define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
; 2 = MemoryDef(1)
  store i8 16, i8* %0
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,2},{%3,1})
; MemoryUse(3)
  %5 = load i8, i8* %0
}
Everything is OK.

; Hoisting mem def 2.
define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 2 = MemoryDef(1)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,2},{%3,1})
; MemoryUse(3)
  %5 = load i8, i8* %0
}
Why is the phi still using 1 = MemoryDef? I would expect the phi to be transformed to
3 = MemoryPhi({%2,2},{%3,2})

or even removed.

; After calling MSSA.createMemoryAccessAfter(SideStore, EntryStoreAccess, EntryStoreAccess);
define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 4 = MemoryDef(1)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,2},{%3,1})
; MemoryUse(3)
  %5 = load i8, i8* %0
}
If createMemoryAccessAfter didn’t remove 2 = MemoryDef, then where is it?
Is it still somewhere in MSSA, and the dump just doesn’t print it?

; After calling EntryStoreAccess->replaceAllUsesWith(NewStoreAccess);
define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 4 = MemoryDef(4)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,2},{%3,4})
; MemoryUse(3)
  %5 = load i8, i8* %0
}
4 = MemoryDef(4) looks very suspicious…

; After calling MSSA.removeMemoryAccess(SideStoreAccess);

define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 4 = MemoryDef(4)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,4},{%3,4})
; MemoryUse(3)
  %5 = load i8, i8* %0
}

There are also very similar tests - MoveAStoreUpdater and MoveAStoreUpdaterMove - that use the updater and instead
produce exactly what I would expect at the end:

define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 4 = MemoryDef(4)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,4},{%3,4})
; MemoryUse(3)
  %5 = load i8, i8* %0
}

It seems a little counter-intuitive that there exists another class that does updates better than MemorySSA itself, and that the updating functions from MemorySSA are public (which resulted in my confusion about what
the methods are doing). Is it because for some cases the updating API that MSSA provides is sufficient (and faster)? Do you have plans or ideas for how we could make it easier to use?

The other topic is that I am trying to come up with updating for instructions using !invariant.group, and I have no idea how much the updating in MSSA should do and how much MemorySSAUpdater should fix.
There is also another problem that I am not sure if I should fix, because I don’t see an O(1) solution for it.

For code:

store 42, %p, !invariant.group !0
store 42, %p, !invariant.group !0
store 42, %p, !invariant.group !0
store 42, %p, !invariant.group !0
store 42, %p, !invariant.group !0

load %p, !invariant.group !0
; The same maybe in different blocks
load %p, !invariant.group !0
load %p, !invariant.group !0

I would expect all loads to use the first store. Then if I were to start removing stores from the top, I would have to update all the uses all
over again, which is O(#uses). It is probably something that could be solved with the disjoint-set data structure, but it gets more complicated if I can't replace all uses with another def
(because they diverge), so I would have to find the first MemoryDef that dominates all the invariant.group uses in a given path, or use the closest non-invariant.group def if it doesn't exist, like here:

define void @hard(i8* %p) {
; 1 = MemoryDef(liveOnEntry)
  store i8 42, i8* %p, !invariant.group !0
; 2 = MemoryDef(1)
  call void @clobber8(i8* %p)
  br i1 undef, label %b1, label %b2

b1: ; preds = %0
; 3 = MemoryDef(2)
  store i8 42, i8* %p, !invariant.group !0
; MemoryUse(1)
  %1 = load i8, i8* %p, !invariant.group !0
  br label %b3

b2: ; preds = %0
; MemoryUse(1)
  %2 = load i8, i8* %p, !invariant.group !0
  br label %b3

b3: ; preds = %b2, %b1
; 5 = MemoryPhi({b1,3},{b2,2})
; MemoryUse(1)
  %3 = load i8, i8* %p, !invariant.group !0
; 4 = MemoryDef(5)
  store i8 42, i8* %p, !invariant.group !0
; MemoryUse(1)
  %4 = load i8, i8* %p, !invariant.group !0
  ret void
}

After removing the first instruction I would expect:

define void @hard(i8* %p) {
; 1 = MemoryDef(liveOnEntry)
  call void @clobber8(i8* %p)
  br i1 undef, label %b1, label %b2

b1: ; preds = %0
; 2 = MemoryDef(1)
  store i8 42, i8* %p, !invariant.group !0
; MemoryUse(2)
  %1 = load i8, i8* %p, !invariant.group !0
  br label %b3

b2: ; preds = %0
; MemoryUse(1)
  %2 = load i8, i8* %p, !invariant.group !0
  br label %b3

b3: ; preds = %b2, %b1
; 4 = MemoryPhi({b1,2},{b2,1})
; MemoryUse(4)
  %3 = load i8, i8* %p, !invariant.group !0
; 3 = MemoryDef(4)
  store i8 42, i8* %p, !invariant.group !0
; MemoryUse(4)
  %4 = load i8, i8* %p, !invariant.group !0
  ret void
}

Which requires %1, %2, %3 and %4 to be updated with different defs.

Piotr

Thanks for the reply. I still don't understand a couple of things.
Let's take a look at MemorySSATest.MoveAStore test.

This test doesn't use the updater?

This is the destructive test of trying to update it without the updater.

The dump of MSSA after each operation, with my comments, looks
like below. I also added a question after each transformation:

; Just after constructing
define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
; 2 = MemoryDef(1)
  store i8 16, i8* %0
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,2},{%3,1})
; MemoryUse(3)
  %5 = load i8, i8* %0
}
Everything is OK.

; Hoisting mem def 2.
define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 2 = MemoryDef(1)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,2},{%3,1})
; MemoryUse(3)
  %5 = load i8, i8* %0
}
Why is the phi still using 1 = MemoryDef?

This is not an updater test.
This literally just moved it using the normal IR moving.
As you can see, it generates a broken thing in the middle state.

I would expect the phi to be transformed to
3 = MemoryPhi({%2,2},{%3,2})
or even removed.

; After calling MSSA.createMemoryAccessAfter(SideStore, EntryStoreAccess,
EntryStoreAccess);
define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 4 = MemoryDef(1)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,2},{%3,1})
; MemoryUse(3)
  %5 = load i8, i8* %0
}
If createMemoryAccessAfter didn't remove 2 = MemoryDef, then where is it?

This is why it says:
" /// Note: If a MemoryAccess already exists for I, this function will
make it
  /// inaccessible and it *must* have removeMemoryAccess called on it. "

Is it still somewhere in MSSA, and the dump just doesn't print it?

; After calling EntryStoreAccess->replaceAllUsesWith(NewStoreAccess);
define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 4 = MemoryDef(4)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,2},{%3,4})
; MemoryUse(3)
  %5 = load i8, i8* %0
}
4 = MemoryDef(4) looks very suspicious..

And today we discover updating is hard without the updater :wink:

Yes, it needs to reset the defining access after, but the destructive test
is not doing that.

; After calling MSSA.removeMemoryAccess(SideStoreAccess);
define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 4 = MemoryDef(4)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,4},{%3,4})
; MemoryUse(3)
  %5 = load i8, i8* %0
}

There are also very similar tests - MoveAStoreUpdater and
MoveAStoreUpdaterMove - that use the updater and instead
produce exactly what I would expect at the end:

define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 4 = MemoryDef(4)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,4},{%3,4})
; MemoryUse(3)
  %5 = load i8, i8* %0
}

It seems a little counter-intuitive that there exists another class that
does updates better than MemorySSA itself, and that the updating functions
from MemorySSA are public (which resulted in my confusion about what
the methods are doing).

First, I was asked to split the updater into its own class/file, so I did
:slight_smile:
We could add getUpdater to MemorySSA, and move all the functions there.
We probably should.

ATM we have to have createMemory* available because the updater works on
memory accesses, not instructions.
We could make it work on instructions, but it's a bit trickier.
The main reason I did not start that way was that most things know where
they are putting the new memory accesses, but we do not.
So we'd have to locate the nearest memory access to the new instruction
in the block (IE a worst-case O(n) walk in either direction, with MSSA
lookups) just to know where it's been placed.

The real intent is that people create the memory access and then just hand
it to the updater.

We could speed up the walk by turning what is currently the
createMemoryAccess* into hints to the updater to know where to start the
walk/where to go.

Is it because for some cases, the updating API that mssa provides is
sufficient (and faster?)?

The removal case is the only sane case anymore, IMHO.

I think we should move the functions to the updater to start, and then
slowly deprecate them.

Do you have some plans or ideas how we could make it easier to use?

It's just that the old way is too hard for people to use, and the new way is
not fully adopted yet.

The other topic is that I am trying to come up with the updater for
instruction using !invariant.group, and I have no idea how much the
updating in MSSA should do and how much MemorySSAUpdater should fix.

I'm not sure why invariant.group requires anything from MemorySSA updating;
maybe you can explain?
Neither the updater nor removal promises that uses will be optimized.
In fact, they reset optimized on uses.

The walker will properly fix up and reoptimize uses where they need it.

There is also another problem that I am not sure if I should fix, because I
don't see an O(1) solution for it.

For code:

store 42, %p, !invariant.group !0
store 42, %p, !invariant.group !0
store 42, %p, !invariant.group !0
store 42, %p, !invariant.group !0
store 42, %p, !invariant.group !0

load %p, !invariant.group !0
; The same maybe in different blocks
load %p, !invariant.group !0
load %p, !invariant.group !0

I would expect all loads to use the first store.

They will, assuming the optimizer did not hit any limits.

Then if I would start removing stores starting from the top I have to
update all the uses all
over again,

This is why you would not remove them top down. Stores are removed in
post-dom order, or you are insane :slight_smile:

Yes, if you do this top-down, instead of bottom up, it's O(Instructions *
Uses).

That's actually true of the opposite order for non-memory as well:
%1 = add 1, 1
%2 = add 1, 1
%3 = add 1, 1
%4 = add 1, 1
%5 = add 1, 1

use (%1)
use (%1)
use (%1)
use (%1)
use (%1)
use (%1)
use (%1)

If you are eliminating top-down, using later accesses to say that earlier
ones are dead (as you are suggesting):

You would do
%1->replaceAllUsesWith(%2)
%1->eraseFromParent()
%2->replaceAllUsesWith(%3)
...

It has the same time bounds.

which is O(#uses). It is probably something that could be solved with the
disjoint-set data structure, but it gets more complicated if I can't
replace all uses with another def
(because they diverge), so I would have to find the first MemoryDef that
dominates all the invariant.group uses in a given path, or use the closest
non-invariant.group def if it doesn't exist, like here:

define void @hard(i8* %p) {
; 1 = MemoryDef(liveOnEntry)
  store i8 42, i8* %p, !invariant.group !0
; 2 = MemoryDef(1)
  call void @clobber8(i8* %p)
  br i1 undef, label %b1, label %b2

b1: ; preds = %0
; 3 = MemoryDef(2)
  store i8 42, i8* %p, !invariant.group !0
; MemoryUse(1)
  %1 = load i8, i8* %p, !invariant.group !0
  br label %b3

b2: ; preds = %0
; MemoryUse(1)
  %2 = load i8, i8* %p, !invariant.group !0
  br label %b3

b3: ; preds = %b2, %b1
; 5 = MemoryPhi({b1,3},{b2,2})
; MemoryUse(5)
  %3 = load i8, i8* %p, !invariant.group !0
; 4 = MemoryDef(5)
  store i8 42, i8* %p, !invariant.group !0
; MemoryUse(4)
  %4 = load i8, i8* %p, !invariant.group !0
  ret void
}

After removing the first instruction I would expect:

Nope.
As mentioned, neither removal, nor the updater, guarantees optimized uses.
The walker will reoptimize them and reset defining access as necessary.

Instead, all that will happen in your example, if you remove store 2, is
that we will produce a conservatively correct answer:
We will repoint all uses that pointed at 2 at 2's defining access.

This is guaranteed to be a thing that dominates 2 (by the SSA property).
It's also guaranteed not to be wrong. Whatever uses reached 2 could not
have aliased with anything in the way.
So the next thing it could possibly be aliased to is 2's defining access.

Thus, we will produce

define void @hard(i8* %p) {
; 1 = MemoryDef(liveOnEntry)
  call void @clobber8(i8* %p)
  br i1 undef, label %b1, label %b2

b1: ; preds = %0
; 3 = MemoryDef(1)

Note you had reset the MemoryDef ID of this to 2, which will never happen :slight_smile:

  store i8 42, i8* %p, !invariant.group !0
; MemoryUse(1)
  %1 = load i8, i8* %p, !invariant.group !0
  br label %b3

b2: ; preds = %0
; MemoryUse(1)
  %2 = load i8, i8* %p, !invariant.group !0
  br label %b3

b3: ; preds = %b2, %b1
; 4 = MemoryPhi({b1,3},{b2,1})
; MemoryUse(5)
  %3 = load i8, i8* %p, !invariant.group !0
; 3 = MemoryDef(4)
  store i8 42, i8* %p, !invariant.group !0
; MemoryUse(4)
  %4 = load i8, i8* %p, !invariant.group !0
  ret void
}

Which requires %1, %2, %3 and %4 be updated with different defs.

Again, we do not guarantee reoptimized uses at each intermediate point.
Doing so would be N^3 worst case.
Instead, we guarantee that when you call getClobberingMemoryAccess on a def
or use, it gives you the most optimal answer.
As a side effect, it resets the defining access of the use (mainly so it
doesn't waste time on the same links next time).

We used to guarantee the optimized property all the time in the underlying
IR itself, but as you've discovered, it is a hard invariant to maintain.

This does mean that *uses* of a store may include false aliases that need
to be disambiguated if you haven't called getMemoryAccess.
But that's the price of admission ATM.
We'd need MemorySSI to solve that well.

Updating it fast otherwise basically requires some interesting linking.
IE what you really want is, for each memory location or even each store,
the stores to be linked in a way that looks like a dominator tree.

"Sparse Dataflow Analysis with Pointers and Reachability" has a neat way to handle the storage and finding using quadtrees.

But store optimizations are much less common than hoist ones in LLVM.
It's easy enough to teach those things to remove in the right order.

I believe https://reviews.llvm.org/D30154 does what you want with being able to rename uses due to aliasing stores.
The CreateLoadsAndStoreUpdater test now produces:

define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 4 = MemoryDef(1)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
; 2 = MemoryDef(4)
  store i8 16, i8* %0
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%3,4},{%2,2})
; MemoryUse(3)
  %5 = load i8, i8* %0
; MemoryUse(3)
  %6 = load i8, i8* %0
}

Thanks for the reply. I still don't understand a couple of things.
Let's take a look at MemorySSATest.MoveAStore test.

This test doesn't use the updater?

This is the destructive test of trying to update it without the updater.

The dump of MSSA after each operation, with my comments, looks
like below. I also added a question after each transformation:

; Just after constructing
define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
; 2 = MemoryDef(1)
  store i8 16, i8* %0
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,2},{%3,1})
; MemoryUse(3)
  %5 = load i8, i8* %0
}
Everything is OK.

; Hoisting mem def 2.
define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 2 = MemoryDef(1)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,2},{%3,1})
; MemoryUse(3)
  %5 = load i8, i8* %0
}
Why is the phi still using 1 = MemoryDef?

This is not an updater test.
This literally just moved it using the normal IR moving.
As you can see, it generates a broken thing in the middle state.

I would expect the phi to be transformed to
3 = MemoryPhi({%2,2},{%3,2})
or even removed.

; After calling MSSA.createMemoryAccessAfter(SideStore,
EntryStoreAccess, EntryStoreAccess);
define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 4 = MemoryDef(1)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,2},{%3,1})
; MemoryUse(3)
  %5 = load i8, i8* %0
}
If createMemoryAccessAfter didn't remove 2 = MemoryDef, then where is it?

This is why it says:
" /// Note: If a MemoryAccess already exists for I, this function will
make it
  /// inaccessible and it *must* have removeMemoryAccess called on it. "

Is it still somewhere in MSSA, and the dump just doesn't print it?

; After calling EntryStoreAccess->replaceAllUsesWith(NewStoreAccess);
define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 4 = MemoryDef(4)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,2},{%3,4})
; MemoryUse(3)
  %5 = load i8, i8* %0
}
4 = MemoryDef(4) looks very suspicious..

And today we discover updating is hard without the updater :wink:

Yea, that's what I thought.

Yes, it needs to reset the defining access after, but the destructive test
is not doing that.

; After calling MSSA.removeMemoryAccess(SideStoreAccess);
define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 4 = MemoryDef(4)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,4},{%3,4})
; MemoryUse(3)
  %5 = load i8, i8* %0
}

There are also very similar tests - MoveAStoreUpdater and
MoveAStoreUpdaterMove - that use the updater and instead
produce exactly what I would expect at the end:

define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 4 = MemoryDef(4)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2: ; preds = %1
  br label %4

; <label>:3: ; preds = %1
  br label %4

; <label>:4: ; preds = %3, %2
; 3 = MemoryPhi({%2,4},{%3,4})
; MemoryUse(3)
  %5 = load i8, i8* %0
}

It seems a little counter-intuitive that there exists another class that
does updates better than MemorySSA itself, and that the updating functions
from MemorySSA are public (which resulted in my confusion about what
the methods are doing).

First, I was asked to split the updater into its own class/file, so I did
:slight_smile:
We could add getUpdater to MemorySSA, and move all the functions there.
We probably should.

ATM we have to have createMemory* available because the updater works on
memory accesses, not instructions.
We could make it work on instructions, but it's a bit trickier.
The main reason I did not start that way was that most things know
where they are putting the new memory accesses, but we do not.
So we'd have to locate the nearest memory access to the new instruction
in the block (IE a worst-case O(n) walk in either direction, with MSSA
lookups) just to know where it's been placed.

The real intent is that people create the memory access and then just
hand it to the updater.

We could speed up the walk by turning what is currently the
createMemoryAccess* into hints to the updater to know where to start the
walk/where to go.

Is it because for some cases, the updating API that mssa provides is
sufficient (and faster?)?

The removal case is the only sane case anymore, IMHO.

I think we should move the functions to the updater to start, and then
slowly deprecate them.

Do you have some plans or ideas how we could make it easier to use?

It's just that the old way is too hard for people to use, and the new way is
not fully adopted yet.

The other topic is that I am trying to come up with the updater for
instruction using !invariant.group, and I have no idea how much the
updating in MSSA should do and how much MemorySSAUpdater should fix.

I'm not sure why invariant.group requires anything from MemorySSA
updating; maybe you can explain?
Neither the updater nor removal promises that uses will be optimized.
In fact, they reset optimized on uses.

Ok, that seems to be easier to do in the walker than in the updater.

The walker will properly fix up and reoptimize uses where they need it.

There is also another problem that I am not sure if I should fix, because
I don't see an O(1) solution for it.

For code:

store 42, %p, !invariant.group !0
store 42, %p, !invariant.group !0
store 42, %p, !invariant.group !0
store 42, %p, !invariant.group !0
store 42, %p, !invariant.group !0

load %p, !invariant.group !0
; The same maybe in different blocks
load %p, !invariant.group !0
load %p, !invariant.group !0

I would expect all loads to use the first store.

They will, assuming the optimizer did not hit any limits.

In the algorithm that I posted here, https://reviews.llvm.org/D29064,
optimizing every store/load with invariant.group is O(1), so there
shouldn't be problems with limits.

Then if I would start removing stores starting from the top I have to
update all the uses all
over again,

This is why you would not remove them top down. Stores are removed in
post-dom order, or you are insane :slight_smile:

Yes, if you do this top-down, instead of bottom up, it's O(Instructions *
Uses).

That's actually true of the opposite order for non-memory as well:
%1 = add 1, 1
%2 = add 1, 1
%3 = add 1, 1
%4 = add 1, 1
%5 = add 1, 1

use (%1)
use (%1)
use (%1)
use (%1)
use (%1)
use (%1)
use (%1)

If you are eliminating top-down, using later accesses to say that earlier
ones are dead (as you are suggesting):

You would do
%1->replaceAllUsesWith(%2)
%1->eraseFromParent()
%2->replaceAllUsesWith(%3)
...

It has the same time bounds.

which is O(#uses). It is probably something that could be solved with the
disjoint-set data structure, but it gets more complicated if I can't
replace all uses with another def
(because they diverge), so I would have to find the first MemoryDef that
dominates all the invariant.group uses in a given path, or use the closest
non-invariant.group def if it doesn't exist, like here:

define void @hard(i8* %p) {
; 1 = MemoryDef(liveOnEntry)
  store i8 42, i8* %p, !invariant.group !0
; 2 = MemoryDef(1)
  call void @clobber8(i8* %p)
  br i1 undef, label %b1, label %b2

b1: ; preds = %0
; 3 = MemoryDef(2)
  store i8 42, i8* %p, !invariant.group !0
; MemoryUse(1)
  %1 = load i8, i8* %p, !invariant.group !0
  br label %b3

b2: ; preds = %0
; MemoryUse(1)
  %2 = load i8, i8* %p, !invariant.group !0
  br label %b3

b3: ; preds = %b2, %b1
; 5 = MemoryPhi({b1,3},{b2,2})
; MemoryUse(5)
  %3 = load i8, i8* %p, !invariant.group !0
; 4 = MemoryDef(5)
  store i8 42, i8* %p, !invariant.group !0
; MemoryUse(4)
  %4 = load i8, i8* %p, !invariant.group !0
  ret void
}

After removing the first instruction I would expect:

Nope.
As mentioned, neither removal, nor the updater, guarantees optimized uses.
The walker will reoptimize them and reset defining access as necessary.

Instead, all that will happen in your example, if you remove store 2, is
that we will produce a conservatively correct answer:
We will repoint all uses that pointed at 2 at 2's defining access.

This is guaranteed to be a thing that dominates 2 (by the SSA property).
It's also guaranteed not to be wrong. Whatever uses reached 2 could not
have aliased with anything in the way.
So the next thing it could possibly be aliased to is 2's defining access.

I thought about it a little bit more, and it is exactly as you said. I was
worried about how MSSA would look after the removal of 1 = MemoryDef in the example:

define void @hard(i8* %p) {
; 2 = MemoryDef(liveOnEntry)
  call void @clobber8(i8* %p)
  br i1 undef, label %b1, label %b2

b1: ; preds = %0
; 3 = MemoryDef(2)
  store i8 42, i8* %p, !invariant.group !0
; MemoryUse(liveOnEntry)
  %1 = load i8, i8* %p, !invariant.group !0
  br label %b3

b2: ; preds = %0
; MemoryUse(liveOnEntry)
  %2 = load i8, i8* %p, !invariant.group !0
  br label %b3

b3: ; preds = %b2, %b1
; 5 = MemoryPhi({b1,3},{b2,2})
; MemoryUse(5)
  %3 = load i8, i8* %p, !invariant.group !0
; 4 = MemoryDef(5)
  store i8 42, i8* %p, !invariant.group !0
; MemoryUse(4)
  %4 = load i8, i8* %p, !invariant.group !0
  ret void
}

It looked wrong to me that the loads that were defined by the removed
store are now liveOnEntry, but one could remove the store only if one could
prove that it is redundant (or not used), meaning that the loads are really
defined by liveOnEntry (because every caller would have to set up the value of
%p to 42).
Brilliant!

Thus, we will produce

define void @hard(i8* %p) {
; 1 = MemoryDef(liveOnEntry)
  call void @clobber8(i8* %p)
  br i1 undef, label %b1, label %b2

b1: ; preds = %0
; 3 = MemoryDef(1)

Note you had reset the MemoryDef ID of this to 2, which will never happen
:slight_smile:

  store i8 42, i8* %p, !invariant.group !0
; MemoryUse(1)
  %1 = load i8, i8* %p, !invariant.group !0
  br label %b3

b2: ; preds = %0
; MemoryUse(1)
  %2 = load i8, i8* %p, !invariant.group !0
  br label %b3

b3: ; preds = %b2, %b1
; 4 = MemoryPhi({b1,3},{b2,1})
; MemoryUse(5)
  %3 = load i8, i8* %p, !invariant.group !0
; 3 = MemoryDef(4)
  store i8 42, i8* %p, !invariant.group !0
; MemoryUse(4)
  %4 = load i8, i8* %p, !invariant.group !0
  ret void
}

Which requires %1, %2, %3 and %4 to be updated with different defs.

Again, we do not guarantee reoptimized uses at each intermediate point.
Doing so would be N^3 worst case.
Instead, we guarantee that when you call getClobberingMemoryAccess on a
def or use, it gives you the most optimal answer.
As a side effect, it resets the defining access of the use (mainly so it
doesn't waste time on the same links next time).

We used to guarantee the optimized property all the time in the underlying
IR itself, but as you've discovered, it is a hard invariant to maintain.

This does mean that *uses* of a store may include false aliases that need
to be disambiguated if you haven't called getMemoryAccess.
But that's the price of admission ATM.
We'd need MemorySSI to solve that well.

What is MemorySSI?

Updating it fast otherwise basically requires some interesting linking.
IE what you really want is, for each memory location or even each store,
the stores to be linked in a way that looks like a dominator tree.

Sparse Dataflow Analysis with Pointers and Reachability | SpringerLink

has a neat way to handle the storage and finding using quadtrees.

But store optimizations are much less common than hoist ones in LLVM.
It's easy enough to teach those things to remove in the right order.

Oh right, so I guess the updater doesn't require any special handling for
invariant groups, but I need to update the walker.
Thanks for the long reply, it is really helpful.

Piotr