Which pass should be propagating memory copies

Keno_Fischer · May 16, 2017, 5:37pm

Consider the following IR example:

define void @simple([4 x double] *%ptr, i64 %idx) {
%stack = alloca [4 x double]
%ptri8 = bitcast [4 x double] %ptr to i8
%stacki8 = bitcast [4 x double] %stack to i8
call void @llvm.memcpy.p0i8.p0i8.i32(i8 *%stacki8, i8 *%ptri8, i32 32, i32 0, i1 0)
%dataptr = getelementptr inbounds [4 x double], [4 x double] *%ptr, i32 0, i64 %idx
store double 0.0, double *%dataptr
call void @llvm.memcpy.p0i8.p0i8.i32(i8 *%ptri8, i8 *%stacki8, i32 32, i32 0, i1 0)
ret void
}

I would like to see this optimized to just a single store (into %ptr). Right now, even at -O3 that doesn’t happen. My frontend guarantees that idx is always inbounds for the allocation, but I do think the transformation should be valid regardless because accessing beyond the bounds of the alloca should be undefined behavior. Now, my question is which pass should be responsible for doing this? SROA? DSE? GVN? A new pass just to do this kind of thing? Maybe there already is some pass that does this, just not in the default pipeline? Any hints would be much appreciated.

Thanks,
Keno

Daniel_Neilson · May 16, 2017, 5:50pm

The InstCombine transform does exactly what you want. Take a look at lib/Transforms/Scalar/InstCombine/InstCombineCalls.cpp — InstCombiner::SimplifyMemTransfer

With your align parameter on the memcpy being zero you are likely hitting the first conditional in that function:
  if (CopyAlign < MinAlign) {
    MI->setAlignment(ConstantInt::get(MI->getAlignmentType(), MinAlign, false));
    return MI;
  }

Arguably, instcombine probably shouldn’t bail on trying to simplify the memcpy just because it could update the alignment on the call...

-Daniel

Keno_Fischer · May 16, 2017, 5:56pm

Hi Daniel,

as far as I can tell that handles turning small memcpys into store instructions. What I’m looking for
is something that can simplify (copy to stack) → (modify stack) → (copy back to heap) into a single
heap modification.

Keno

Daniel_Neilson · May 16, 2017, 6:15pm

Ah, sorry. I misunderstood the question. I’m new to the LLVM infrastructure, so I’m not sure exactly what exists to be able to do this, but I’d expect it to be some sort of combination of transforms done in sequence — instcombine to lower the memcpys, followed by some sort of data flow transform like value numbering to propagate the value stored to the stack into the second store, then some sort of dead-store. However, there’s a couple of immediate challenges that I can spot — 1) instcombine won’t lower a 32-byte store (it limits itself to lowering 8-byte memcpys and lower), and (2) the aliasing between %ptr & %dataptr might be some sort of barrier to the value numbering.

An alternative would be that value numbering would have to understand the load/store semantics of memcpy, and be smart enough to realize that it’s okay to merge these particular memcpys. There’s a ‘fixme’ in new GVN regarding memcpys in NewGVN::performSymbolicStoreEvaluation(). Perhaps that’s a place to start looking?

-Daniel

Davide_Italiano · May 16, 2017, 6:16pm

This seems like a GVN job to me.

Keno_Fischer · May 16, 2017, 7:28pm

Cool. If I wanted to try to implement this in NewGVN, any hints on how to
start?

Daniel_Berlin1 · May 17, 2017, 5:07am

So,
both GVN’s we have mainly exist to eliminate redundancies, they don’t perform general transforms.
So in your example, it would be “easy” to make NewGVN see through the memcpys

ie
teach it that memcpy(a, b, full size of b and a) means a = b, and propagate it through, and whatever goodness this entails.

It also currently knows how to coerce loads of memset/memcpy to pull the value out of the original piece, even when they are not full size memcpy.
This requires instruction insertion in non-constant cases.
So it only does this for constants ATM (but I have code to extend it to the general case and it works fine, it’s in my patch list).

The infrastructure i’m about to commit tomorrow also makes it know that it can use alternative “fake” instructions to represent values, and make the instructions real later.

So for example, for

if (foo)
a = 5, b = 2
else
b = 5, a = 2

return a + b
it knows that a+b is equivalent to 7

and when faced with:

if (foo)
a = 5, b = 2
else
b = 5, a = 3

return a+b

it knows that a+b is equivalent to phi(7, 8) (which doesn’t exist in the original program), and that it can insert that phi to eliminate c.

You could use that infrastructure to teach it that the memcpy’s are equivalent to stores that it could insert.

I would actually start by teaching it about memcpy in general, and that full size memcpy implies value equivalence (this would require wiring it into performSymbolicCallEvaluation), that store of a memcpy target and memcpy are equivalent when they copy/store the same value (see the FIXME in performSymbolicStoreEvaluation).

If you want to teach it to try to see memcpy’s as stores, that’s possible too.
It’s a bit trickier because NewGVN is ruthless. It terminates only at fixpoint. So you have to make the conditions under which you do things like “see memcpy as store” totally deterministic and ordered, or someone will come up with a testcase where we bounce forever between two the possible values for the same thing.

We have a bunch of verifiers to try to help with this, but as Davide can tell you, it can be a workout sometimes

At this point, in the base, it’s really rare to be able to trigger this behavior (and all cases i’m aware of involve undef).
But if you add symbolic evaluation code, it usually takes a few tries to get it right.

Topic		Replies	Views
Which pass should be propagating memory copies LLVM Dev List Archives	7	96	May 19, 2017
Which pass should be propagating memory copies LLVM Dev List Archives	1	66	May 17, 2017
LLVM struct, alloca, SROA and the entry basic block LLVM Dev List Archives	8	178	September 9, 2015
Optimisation passes introducing address space casts LLVM Dev List Archives	7	110	January 4, 2017
[RFC] Correct implementation of memcpy with metadata IR & Optimizations core , rfc , llvm	5	793	June 20, 2023

Which pass should be propagating memory copies

Related Topics