Polly and optimisations

Hi,

My understanding of Polly is that it rearranges the executable
instructions in order to perform the task quicker.
Are there any tools out there that rearrange the data so that it is
processed quicker?

For example, say you start with a sparse(but still all in RAM) data
set, and you wish to do computations on it.
I have found that this can be done much quicker if you collect all the
data to be processed into batches, make the batches of data
contiguous, so that each batch fits in the CPU cache, and then process
that batch while only having to access the CPU cache.
Then afterwards, copy out the results back to the sparse layout.
I know this general approach is called "Cache Oblivious Algorithms",
but I was wondering if any compiler optimizations could do this for
you.

For example, if an algorithm had 10 processing steps, and for each
step it scanned the entire data set. An optimization could be to do
all 10 processing steps on the first data item, and then move to the
next item etc. This would process the data much faster due to the
better use of the cache.

Obviously, this cannot be done in all cases, but does something like
polly take into account data locality like the examples above? Enough
so, that is would even add extra malloc and memcopys where needed.

Kind Regards

James

Hi James,

Polly has some transformation on the data too (mainly used when targeting GPU back-ends). The thing is that Polly, being based on the polyhedral model, can only work on “regular” transformations. That is, sparse data is hard to handle (position of the data depends on run-time values), but dense data is usually fine.

As for you question on data locality, yes, Polly is actually focused on data locality (temporal locality for now, but spatial locality is in project) and parallelism. But again, because Polly works best on dense algorithm, I am not sure how it would perform on the examples you suggest.

In any case, those discussions are welcome !

Note that most cache oblivious approaches usually only change the iteration orders without changing the data too. But you are right in assuming they are tightly correlated.

Best Regard.