[RFC] Add direct support for permutations

The Problem

Permutations are an extremely common use case of AffineMap; however, there are a number of performance problems with the current implementation for that use case.

For example, the current implementation of AffineMap::getPermutedPosition(unsigned) performs a linear scan of the ArrayRef, and therefore takes O(n) time for each call. It is a common use case to want to call this method within a loop; however, that introduces an asymptotic slowdown, since it will take O(m*n) time. To avoid the slowdown users must manually float out construction of the inverse permutation so that the whole process takes only O(n+m). However, if the process is called repeatedly, then that means it’ll take O(k*(n+m)), even though the O(k*n) work is k-redundant (i.e., it should take only O(n + k*m)).

The Proposed Fix

Add a new subclass of AffineMap for permutations, and have PermutationMap compute the inverse permutation once and store it alongside the forward permutation. This would allow getPermutedPosition to execute in O(1) time, without needing to manually float out construction of the inverse map, nor repeatedly constructing the inverse map.

In addition, various permutation-specific methods of AffineMap should be moved to the PermutationMap class instead, since that would allow those methods to avoid verifying that the map is a permutation every time they’re called.

Alternative Fixes

We could have PermutationMap be independent of AffineMap, rather than being a subclass. This would make certain aspects of implementation easier (e.g., don’t need to deal with isa/cast/dyn_cast). However, I feel that it would make the API less clean, and thus end up pushing more complexity onto the client code.

Future Directions

We may want to add some MLIR syntax for PermutationMap as a new builtin type (i.e., for an attribute wrapper of the PermutationMap itself), so that the MLIR type system can enforce requirements that something be a permutation. However, that’s a more involved change not considered in this RFC.

It might be nice to have some other intermediate classes for more general sorts of invertibility (e.g., PermutationMap < InvertibleAffineMap < RetractableAffineMap < AffineMap), but that development should wait until those classes are actually needed, since detecting/verifying invertibility/retractability requires more work.

2 Likes

Do you have any numbers that show how big the effect is on the compile time? I understand that it is asymptotically better to have a dedicated data structure, but we aren’t even remotely in the asymptotic case: I haven’t seen maps that use more than a dozen dimensions. I expect the performance hits to come from the locking behavior in creation of AffineExpr

1 Like

Besides potential performance gains (which indeed may be minor given the typical number of indices used), to me it seems a good design to provide a hierarchy of maps, each which the right set of primitives. Right now, for example, we provide some primitives that assert when an assumed condition is not met, such as a permutation. It would be nice if the type hierarchy enforces what kind of map we are dealing with. Do you agree?

I am generally in favor of more structured representations as long as we carefully weigh the costs. Specifically, AffineMap (as opposed to AffineMapAttr) is a rather strange standalone concept, and therefore a common stumbling point for many developers, in what I consider “core IR”. Adding a hierarchy of such concepts may worsen this. There are also some implementation-related considerations regarding the storage of inverse permutations: if it is stored in the context, how do we recover it after an upcast to AffineMap; if it is stored in the object, how do we go about breaking the promise that AffineMap is just a pointer and should be passed by-value?

My preference given my understanding of the actual use case would be to have a PermutationMapAttr in some dialect that represents permutation maps and may be explicitly converted to and from AffineMap so the user never pays the cost of affine maps unless they are required. It can then be reused, e.g., as a layout attribute.

A HyperRectangular concept that is well integrated is something we discussed a few times a while back.
Inverse and projections would benefit from a restricted representation.

The next class would be UTVPI (https://dl.acm.org/doi/10.1145/2429069.2429127) and the ones you also propose.

While these are interesting, in practice the need has not been pressing and the time effort of this endeavor vs other things did not seem justified when we last looked.