Lattice for DataFlowAnalysis

Hi all,

I’m working on pass to avoid unnecessary data transfers between host and a device (e.g., GPU).

Consider the following example, assuming %hostX and %deviceX are all memrefs.

^bb0:
gpu.launch(...) {
    store %device1[...]
}
...
copy_from(%host1, %device1) // Copy from device to host

copy_to(%host1, %device2) // Copy from host to device
gpu.launch(...) {
    load %device2[...]
}

In this case, the copy_to can be eliminated and one can directly load from %device1, as the data is still present.

However, in the following case, the transformation is not legal, as the correct version of the data is not available on the device, due to the intermediate store:

^bb0:
gpu.launch(...) {
    store %device1[...]
}
...
copy_from(%host1, %device1) // Copy from device to host

gpu.launch(...) {
    store %device1[...]
}

copy_to(%host1, %device2) // Copy from host to device
gpu.launch(...) {
    load %device2[...]
}

I would like to use the framework from DataflowAnalysis.h for the analysis, but I’m not 100% sure about the correct lattice.

The documentation states that the lattice must be monotonic, and that includes commutativity: join(x,y) == join(y,x). In the example, we can see that the order in which x = copy_from and y = store %device1 happen clearly matters.

Can such an analysis still be expressed through the framework in DataflowAnalysis.h? If so, how would the lattice have to look like?

Thanks in advance,
Lukas

DataflowAnalysis.h is sparse (only follow use-def chains), and so it cannot model this problem as you are using non-SSA values. The order of join function arguments is not related to program order. It is just a function of lattice values with certain properties.

We discussed a mild extension in (DataFlowAnalysis.h questions - #6 by River707) that would allow you to identify that %device1 has any potential stores to it, but that is probably more conservative than you want (it would treat a store after the copy_to as also clobbering as far as the analysis is concerned). I don’t know if that extension was implemented though.

You might be able to get away with just a few simple patterns. I have a couple in npcomp that have been very useful and avoided me from having to solve the full general version of this problem so far (which for me is basically “SSA formation + slices”): mlir-npcomp/MaximizeValueSemantics.cpp at f71845ea75fe489ed3f2df5291447b3436d86a07 · llvm/mlir-npcomp · GitHub

@_sean_silva: Thanks for your quick reply!

I took a look at the patterns you mentioned and will pursue a similar approach with simple, local patterns. That should be sufficient for my use-case for now.

Thanks,
Lukas