Hi all,
I’m working on a pass to avoid unnecessary data transfers between the host and a device (e.g., a GPU).
Consider the following example, assuming %hostX and %deviceX are all memrefs:
^bb0:
  gpu.launch(...) {
    store %device1[...]
  }
  ...
  copy_from(%host1, %device1) // Copy from device to host
  copy_to(%host1, %device2)   // Copy from host to device
  gpu.launch(...) {
    load %device2[...]
  }
In this case, the copy_to can be eliminated and the second kernel can load directly from %device1, because the data is still present there.
However, in the following case the transformation is not legal: the intermediate store overwrites %device1 after it has been copied to %host1, so the version of the data held in %host1 is no longer available on the device:
^bb0:
  gpu.launch(...) {
    store %device1[...]
  }
  ...
  copy_from(%host1, %device1) // Copy from device to host
  gpu.launch(...) {
    store %device1[...]
  }
  copy_to(%host1, %device2)   // Copy from host to device
  gpu.launch(...) {
    load %device2[...]
  }
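To make the ordering in these two examples concrete, here is a rough Python model of the facts I want to track. It is not MLIR code: the tuple encoding of the ops and the names kill, transfer and forwardable_copies are made up for illustration. A fact (host, dev) means "host currently holds the same data as dev"; copies create facts, and any write to a buffer kills the facts that mention it. Because the transfer function is applied op by op in program order, the store after copy_from kills the fact before copy_to is reached.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical op encoding, made up for illustration (not an MLIR API):
#   ("store", buf)            write into buffer `buf`
#   ("load", buf)             read from buffer `buf`
#   ("copy_from", host, dev)  copy device -> host
#   ("copy_to", host, dev)    copy host -> device
Op = Tuple[str, ...]
# A fact (host, dev) means "host currently holds the same data as dev".
State = FrozenSet[Tuple[str, str]]


def kill(state: State, buf: str) -> State:
    """Drop every fact that mentions a buffer whose contents just changed."""
    return frozenset(f for f in state if buf not in f)


def transfer(state: State, op: Op) -> State:
    """Update the facts for one op; ops are visited in program order."""
    kind = op[0]
    if kind == "store":
        return kill(state, op[1])
    if kind in ("copy_from", "copy_to"):
        _, host, dev = op
        # The destination is overwritten, so old facts about it are stale;
        # afterwards host and dev hold the same data.
        dest = host if kind == "copy_from" else dev
        return kill(state, dest) | {(host, dev)}
    return state  # loads do not modify any buffer


def forwardable_copies(block: List[Op]) -> List[Tuple[int, str]]:
    """Return (index, device buffer) for each copy_to whose host buffer still
    mirrors a different device buffer at that point."""
    state: State = frozenset()
    result = []
    for i, op in enumerate(block):
        if op[0] == "copy_to":
            _, host, dev = op
            sources = sorted(d for h, d in state if h == host and d != dev)
            if sources:
                result.append((i, sources[0]))
        state = transfer(state, op)
    return result


example1 = [
    ("store", "device1"),
    ("copy_from", "host1", "device1"),
    ("copy_to", "host1", "device2"),
    ("load", "device2"),
]
example2 = [
    ("store", "device1"),
    ("copy_from", "host1", "device1"),
    ("store", "device1"),  # the intermediate kernel write
    ("copy_to", "host1", "device2"),
    ("load", "device2"),
]
print(forwardable_copies(example1))  # [(2, 'device1')]
print(forwardable_copies(example2))  # []
```

Whether the loads of %device2 can then really be redirected to %device1 also requires that %device1 is not written before those loads, which this sketch does not check.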
I would like to use the framework from DataflowAnalysis.h for the analysis, but I’m not 100% sure about the correct lattice.
The documentation states that the lattice must be monotonic, which includes the join being commutative: join(x,y) == join(y,x). In the second example, however, the order in which x = copy_from and y = store %device1 happen clearly matters.
Can such an analysis still be expressed with the framework in DataflowAnalysis.h? If so, what would the lattice have to look like?
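My current guess, if I understand the framework correctly, is that the ordering between copy_from and the store never has to go through join: within a block it follows from applying the transfer function op by op, and join is only needed where control flow merges. For that, a "must" lattice over the same (host, dev) facts, whose join is set intersection, seems to satisfy the commutativity requirement. A minimal Python sketch of that join (not MLIR code; the None "not yet reached" value is my own assumption, not something taken from DataflowAnalysis.h):

```python
from itertools import product
from typing import FrozenSet, Optional, Tuple

# A fact (host, dev) means "host currently holds the same data as dev".
# None is a hypothetical "not yet reached" value that acts as identity for join.
State = Optional[FrozenSet[Tuple[str, str]]]


def join(a: State, b: State) -> State:
    """Merge the states of two predecessors: a fact survives only if it
    holds on every incoming path (set intersection)."""
    if a is None:
        return b
    if b is None:
        return a
    return a & b


# One predecessor ends right after copy_from(%host1, %device1); the other
# also ran the kernel that stores to %device1, which killed the fact.
after_copy: State = frozenset({("host1", "device1")})
after_store: State = frozenset()
print(join(after_copy, after_store))  # frozenset()

# Intersection is commutative, associative and idempotent, so the order in
# which predecessors are merged does not matter.
values = [None, after_store, after_copy,
          frozenset({("host1", "device1"), ("host2", "device3")})]
for x, y, z in product(values, repeat=3):
    assert join(x, y) == join(y, x)
    assert join(join(x, y), z) == join(x, join(y, z))
    assert join(x, x) == x
print("join laws hold")
```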
Thanks in advance,
Lukas