My understanding of Polly is that it restructures loops (using the
polyhedral model) so that the same work is performed faster.
Are there any tools out there that rearrange the data so that it is
laid out in a more cache-friendly way?
For example, say you start with a sparse (but still entirely in RAM)
data set, and you wish to do computations on it.
I have found that this can be done much quicker if you gather the data
to be processed into contiguous batches, each sized to fit in the CPU
cache, process each batch while every access hits the cache, and then
scatter the results back to the sparse layout afterwards.
I believe this general approach is usually called cache blocking (or
tiling) — cache-oblivious algorithms achieve a similar effect without
knowing the cache size — but I was wondering whether any compiler
optimizations can do this automatically.
For example, suppose an algorithm has 10 processing steps, and each
step scans the entire data set. An optimization could be to apply all
10 steps to the first data item, then move on to the next item, and so
on. This would process the data much faster due to better use of the
cache.
Obviously, this cannot be done in all cases, but does something like
Polly take data locality into account, as in the examples above?
Enough so that it would even insert extra mallocs and memcpys where
needed?