[GSoC 2016] Implementation of the packing transformation

Dear community,

the next step of the "Improvement of vectorization process in Polly"
project is to implement the packing transformation described in
http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf.

I had a discussion with Tobias and we decided that a packing
transformation is in many ways a data-layout transformation that will
require to introduce a new array, copy data to the array and change
memory access locations of the compute kernel to reference the array.

I think that it could be done in IslNodeBuilder::createMark during
generation of code for mark nodes, which are created in the
ScheduleTreeOptimizer and contain all necessary information about
memory access functions that should be changed.

Michael, if I'm not mistaken, an ability to change memory access
functions to change the arrays a memory access is referencing can be
useful for your DE-LICM work. I would be very grateful, if you or
someone else could share ideas and patches which can be used as a
starting point.

Thank you for the comment! Could you please advise me where I can find
definitions of array expansion and storage mapping optimization?

I've found the following papers that are probably related to this: [1]
and [2]. However, maybe I missed something.

Refs.:

[1] - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.29.5704&rep=rep1&type=pdf
[2] - https://hal.archives-ouvertes.fr/hal-01257316/document

Dear Roman and all,

Such features would be extremely useful to implement array expansion (scalar
and array renaming, privatization with new subscript expressions of higher
dimension) and storage mapping optimization (generalizing array
contraction). It would be interesting to have these future applications in
mind when working on the packing transformation.

Thank you for the comment! Could you please advise me where I can find
definitions of array expansion and storage mapping optimization?

I've found the following papers that are probably related to this: [1]
and [2]. However, maybe I missed something.

Refs.:

[1] - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.29.5704&rep=rep1&type=pdf

Yes, this is one of the must-read papers for any polyhedral compilation work!

The following is a generalization to approximate array dataflow (journal extension of a POPL 1998 paper)
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.26.6376&rep=rep1&type=pdf
Rather theoretical, and can be simplified to not make explicit usage of transitive closure computations. There is much research to be done in the area, with short term applications and implementation in production. E.g., on-demand expansion on top of the Live Range Reordering technique implemented in isl and PPCG (papers and slides below):
http://impact.gforge.inria.fr/impact2016

[2] - https://hal.archives-ouvertes.fr/hal-01257316/document

That one is rather specific to parallel programs and also theoretical.

State of the art:

- POPL 2016

There is a tool available but it has received little publicity yet:
https://github.com/bondhugula/smo
Yet the input is in for of polyhedral sets/maps, not code.

- CC 2016, Darte, Isoard, Yuki
https://www.conference-publishing.com/list.php?Event=CC16
Better heuristic. Currently limited to single-array SMO.

Again, much potential for research and implementation.

I'll be glad to discuss transfer of these results to Polly, as well as Alexandre, Uday and others on this list.

Albert

Thank you for the information! I'll try to take these applications into account.

Dear community,

I have a few questions about implementation of the packing
transformation. I would be very grateful for your ideas and comments
about them.

1. Could we set MemoryAccess::NewAccessRelation to modify an access
relation of a memory access? As far as I understand,
MemoryAccess::NewAccessRelation is only used to import memory accesses
from JSCoPs

I think that MemoryAccess::NewAccessRelation could be used in the
implementation and also help to perform, for example, array expansion.

However, the modification would probably cause issues. For example, if
we apply tiling, we'll have to substitute old induction variables
before analyzing access functions. I'm not sure whether it's possible
to do it in ScheduleOptimizer. Another possible issue could arise
during import from JSCoPs. In this case, we probably shouldn't modify
imported access relations.

2. What should we do to change an array referenced by a memory access?

If I'm not mistaken, we can set the value of MemoryAccess::BasePtr to do it.

3. Could we copy data to a new array in IslNodeBuilder::createMark?

We can probably create mark nodes, which contain references to memory
accesses that should be modified. Subsequently, using information
about original and new access relations, IslNodeBuilder::createMark
would generate functions that perform such copying.

Dear community,

Hi Roman,

I have a few questions about implementation of the packing
transformation. I would be very grateful for your ideas and comments
about them.

1. Could we set MemoryAccess::NewAccessRelation to modify an access
relation of a memory access? As far as I understand,
MemoryAccess::NewAccessRelation is only used to import memory accesses
from JSCoPs

I already use it in Polly-ACC to modify access relations. If you just
want to modify the subscript expression, this should just work. Changing
the accessed array may also already work, but is a lot less tested and
the information is only partially updated such that further interface
changes would be required.

I think that MemoryAccess::NewAccessRelation could be used in the
implementation and also help to perform, for example, array expansion.

Yes.

However, the modification would probably cause issues. For example, if
we apply tiling, we'll have to substitute old induction variables
before analyzing access functions.

I do not understand this issue. Tiling is a schedule-only transformation
that only considers data-dependences for legality. How do access
functions come into play here? Why do induction variables need to be
substituted.

If you did not see this yet, any modification of the polyhedral access
function will (except for non-affine accesses) directly be
translated to LLVM-IR. This means, we need (almost) no code generation
changes to make this work.

I'm not sure whether it's possible
to do it in ScheduleOptimizer. Another possible issue could arise
during import from JSCoPs. In this case, we probably shouldn't modify
imported access relations.

You mean we should not overwrite the changes that have been imported? I
think overwriting them is OK. If the user prefers to not run schedule
optimizations after the input, he can always disable the schedule
optimizer when importing changes.

2. What should we do to change an array referenced by a memory access?

If I'm not mistaken, we can set the value of MemoryAccess::BasePtr to do
it.

To just make it work, it should be sufficient to set an access function
that has an isl_id that points to a different ScopArrayInfo object.

See: IslExprBuilder::createAccessAddress

  BaseExpr = isl_ast_expr_get_op_arg(Expr, 0);
  BaseId = isl_ast_expr_get_id(BaseExpr);
  isl_ast_expr_free(BaseExpr);
                                                                                 
  const ScopArrayInfo *SAI = ScopArrayInfo::getFromId(BaseId);
  Base = SAI->getBasePtr();

Now, just resetting the access function leaves the MemoryAccess in a
rather inconsistent state. It would be good if we could think about what
is needed to provide a consistent interface for such kind of
functionality.

Also, having jscop test cases to test this functionality would be great.

3. Could we copy data to a new array in IslNodeBuilder::createMark?

We can probably create mark nodes, which contain references to memory
accesses that should be modified. Subsequently, using information
about original and new access relations, IslNodeBuilder::createMark
would generate functions that perform such copying.

Why mark nodes? An option I have been thinking of was to create new -
virtual ScopStmts that have specific semantics. E.g. ones that copy the
content from one array location to another. We could test these using
jscop and then use them in your schedule transformation to implement the
packing.

Michael probably has some opinion here. He played in his delicm work
with some changes that also require more complex modifications of memory
accesses. There is no specific commit (AFAIU), but some changes are
mixed in with his prototype work:

https://github.com/Meinersbur/polly/commits/delicm2

Best,
Tobias