[MemorySSA] Potential CachingMemorySSAWalker bug

Hi guys,

I think I have run into another CachingMemorySSAWalker cache bug. It’s a bit tricky to reproduce, so I’d like to start by trying to show you what is happening when running EarlyCSE with my local changes to use MemorySSA. I’ve attached a debug log that shows that the value returned by getClobberingMemoryAccess(Inst) after a call to removeMemoryAccess is wrong. The MemorySSA node in question is MemoryUse(7), and the corruption happens after a call to remove MemoryUse(2), at which point its clobber value changes to ‘1 = MemoryDef(liveOnEntry)’. The interesting thing is that is doesn’t seem to be the first call to getClobberingMemoryAccess after the removal that causes the corruption, but rather the second. You’ll notice that I added calls to getClobberingMemoryAccess when doing MSSA.dump(), which is what I’m using to attempt to figure out when the cache gets corrupted.

Hopefully this is enough information to debug the problem. If not perhaps we can look at getting my EarlyCSE changes checked in in a disabled state so you can reproduce the problem directly. I’m also happy to help debug it farther.

LoadShorts-simple2.ll (1.92 KB)

LoadShorts-simple2.log (16.5 KB)

Yeah, that sounds like a fun bug. I’ll take a look later today and see what I can find out. :slight_smile:

I suspect something is pulling the RHS of the memorydef and caching it for calls it should not be used for.
In particular, i suspect we are about to discover we can’t cache the results from both versions of getClobberingMemoryAccess together, or that the cache is not always getting consistently written.

(IE the thing we cache has to be the same no matter which version is called. If it can’t be for some reason, we can’t cache results from both in the same cache. If it can be, i suspect there is a bug in making that work :P)

I suspect the latter, and not the former.

I’ve put my changes to EarlyCSE that trigger this case up on phab here: . These changes depend on so that will need to be applied first. With these changes applied, the original attached .ll file should trigger this bug when compiled with opt -early-cse -early-cse-use-memoryssa

So, after looking at this, I think this is WAI, and the wonkiness is a result of an arbitrary limit somewhere in BasicAA.

I say this because if you remove any unused GEP from your example, the loads (MemoryUse(7), MemoryUse(7)) are both optimized to MemoryUse(1) when we build MemorySSA. Add useless GEP back, and they turn back to MemoryUse(7). This is valid, because BasicAA presumably sees that the malloc’ed pointer doesn’t escape, and it’s marked with noalias. So, BasicAA can assume that it isn’t clobbered by @foo.

With respect to why the clobbers are different, it looks like the walker first sees the code as

load i32, i32* %i27
load i32, i32* %i28

…Which some transform then changes to

load i32, i32* %i27
load i32, i32* %i27

…And then requerys the walker.

Because the MemoryLocation on the second load changes, when you call getClobberingAccess on it, we won’t find a cache entry (indeed, we end up walking all defs up to @malloc in this query). Because this lookup is happening after some useless GEPs have been removed, AA is more aggressive here, and lets us optimize the clobber of the second load.

This doesn’t happen for the first load because its cache is still valid (its MemoryLocation is still %i27).

I can’t think of a good way to address this (AA offering more aggressive results for seemingly unrelated IR changes) in MSSA. Danny – any ideas? :confused:

We shouldn’t really fix MSSA here. As we push MSSA through the compiler, and as the new pass manager stuff gets pushed forward, we should eventually be able to have a nice pass for all the little expensive things that basicaa is doing, and recompute it/update it as necessary.

TL;DR BasicAA was meant to be stateless, and we should move the things that really require state to do well out of it and compute those separately, which will let us get rid of arbitrary walking limits.

In particular, i expect the limit you are hitting is the decomposed gep limit. This is really a forward propagation problem being solved by walking in reverse :slight_smile:
The solution is to just solve it as a forward propagation problem and update results when we need to - because it’s just a forward propagation problem, updating is simply rerunning the propagation problem for things that have changed after removing their lattice values. it should only propagate just what is actually necessary.