Motivation
The `fold` method of an operation is likely one of the most used and fundamental methods within MLIR. Ergonomically, however, its API has some drawbacks.
As a quick reminder, the `fold` function signature of an op may be either:

- `OpFoldResult fold(ArrayRef<Attribute> operands)` for a single-result op
- `LogicalResult fold(ArrayRef<Attribute> operands, SmallVectorImpl<OpFoldResult> &results)` otherwise
The issue lies in the first parameter of each signature. The `ArrayRef<Attribute>` is an array whose size equals the number of operands of the op being folded. Each attribute is accessed by indexing into the array with the index of the operand whose (potentially constant) value should be read.
API-wise, this has the following drawbacks from the perspective of a `fold` implementor:

- Indexing into the `ArrayRef` requires knowledge of how the op was declared in TableGen, namely the operand's position/index within the input list. This is further complicated by the existence of `Variadic` and `Optional`, which can lead to odd-looking indexing expressions (see the sketch below).
- Indexing into the `ArrayRef` is not resilient to refactoring and extension of the op. Adding a new operand in TableGen at any position but the last requires going through the `fold` method and reevaluating all indexing expressions, as the indices of the operands have changed.
- The code is not very descriptive. From TableGen-generated ops we are used to convenient getters with descriptive names derived from the parameter names given in TableGen; `getLhs()` is a lot more descriptive than `operands[0]`.
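To illustrate the first two points, here is a minimal sketch (the op, its TableGen declaration, and its getters are hypothetical and not taken from the proposal) of the index arithmetic that optional and variadic operands force on a `fold` implementation:

```cpp
OpFoldResult MyOp::fold(ArrayRef<Attribute> operands) {
  // Hypothetical op declared roughly as
  //   (ins AnyInteger:$value, Optional<Index>:$offset, Variadic<Index>:$indices)
  // Whether $offset is present shifts the index of everything after $value.
  bool hasOffset = static_cast<bool>(getOffset());
  auto value = dyn_cast_if_present<IntegerAttr>(operands[0]);
  auto offset = hasOffset ? dyn_cast_if_present<IntegerAttr>(operands[1])
                          : IntegerAttr();
  ArrayRef<Attribute> indexAttrs = operands.drop_front(hasOffset ? 2 : 1);
  // ... folding logic elided ...
  return {};
}
```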
Proposed changes
The solution to the above problem is a wrapper class that maps getter methods to the precise indices used for accessing operands.

We happen to already generate such a class! The Adaptor classes generated with every op do essentially just that: they provide the various getter methods and, using the attributes and codegen from TableGen, calculate the correct indices for returning an operand or operand range. The only thing required is to generalize the concept to work on any range, not just `mlir::ValueRange`, and most importantly on `ArrayRef<Attribute>`.
The implementation in this proposal splits the Adaptor class into three separate classes, connected by inheritance:

- A `{OpName}GenericAdaptorBase` within a `detail` namespace. This class contains all value-range-independent methods and fields, that is, the `DictionaryAttr`, the `RegionRange`, the optional operation name, and all attribute getters.
- A `{OpName}GenericAdaptor` class with a template parameter `typename RangeT`. This inherits from the aforementioned class and adds getters for ALL operands, returning the element type of the range passed in. The requirements on `RangeT`, as of my current implementation, are simply being iterable, having a `.size()` method, and being constructible from an iterator pair (of its own iterator type at the very least). These lax requirements are met by pretty much all containers, but most importantly by `ArrayRef` and `ValueRange` (a sketch of a conforming custom range follows further below).
- The `{OpName}Adaptor` class, i.e. the same class name as previously used for the Adaptors. It inherits from `{OpName}GenericAdaptor<ValueRange>` and adds constructors from `{OpName}` as well as the `verify` method implementation.
The last class, having the same name and, through inheritance, the exact same API surface as before, achieves source compatibility with the old Adaptor class.

The rationale for the `GenericAdaptorBase` class is partly compile time, as these methods do not need to be template-instantiated, but it is also required because the adaptors' index-calculating methods have a circular dependency on ops if they have the `AttrSizedOperandSegments` trait.
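For illustration, here is a minimal sketch of a custom range type that would satisfy the `RangeT` requirements stated above; the type is purely illustrative and not part of the proposal:

```cpp
// Iterable, has .size(), and is constructible from a pair of its own iterators.
struct AttrVectorRange {
  using iterator = const mlir::Attribute *;

  AttrVectorRange(iterator begin, iterator end) : storage(begin, end) {}

  iterator begin() const { return storage.begin(); }
  iterator end() const { return storage.end(); }
  size_t size() const { return storage.size(); }

  llvm::SmallVector<mlir::Attribute> storage;
};
```

Such a type could then be used as `{OpName}GenericAdaptor<AttrVectorRange>`, just like `ArrayRef<Attribute>` or `ValueRange`.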
TableGen Output Example of affine::PrefetchOpAdaptor before and after
Before:

```cpp
class AffinePrefetchOpAdaptor {
public:
AffinePrefetchOpAdaptor(::mlir::ValueRange values, ::mlir::DictionaryAttr attrs = nullptr, ::mlir::RegionRange regions = {});
AffinePrefetchOpAdaptor(AffinePrefetchOp op);
std::pair<unsigned, unsigned> getODSOperandIndexAndLength(unsigned index);
::mlir::ValueRange getODSOperands(unsigned index);
::mlir::Value getMemref();
::mlir::ValueRange getIndices();
::mlir::ValueRange getOperands();
::mlir::DictionaryAttr getAttributes();
::mlir::BoolAttr getIsWriteAttr();
bool getIsWrite();
::mlir::IntegerAttr getLocalityHintAttr();
uint32_t getLocalityHint();
::mlir::BoolAttr getIsDataCacheAttr();
bool getIsDataCache();
::mlir::LogicalResult verify(::mlir::Location loc);
private:
::mlir::ValueRange odsOperands;
::mlir::DictionaryAttr odsAttrs;
::mlir::RegionRange odsRegions;
::llvm::Optional<::mlir::OperationName> odsOpName;
};
```
After:

```cpp
namespace detail {
class AffinePrefetchOpGenericAdaptorBase {
protected:
::mlir::DictionaryAttr odsAttrs;
::mlir::RegionRange odsRegions;
::std::optional<::mlir::OperationName> odsOpName;
public:
AffinePrefetchOpGenericAdaptorBase(::mlir::DictionaryAttr attrs = nullptr, ::mlir::RegionRange regions = {});
std::pair<unsigned, unsigned> getODSOperandIndexAndLength(unsigned index, unsigned odsOperandsSize);
::mlir::DictionaryAttr getAttributes();
::mlir::BoolAttr getIsWriteAttr();
bool getIsWrite();
::mlir::IntegerAttr getLocalityHintAttr();
uint32_t getLocalityHint();
::mlir::BoolAttr getIsDataCacheAttr();
bool getIsDataCache();
};
} // namespace detail
template <typename RangeT>
class AffinePrefetchOpGenericAdaptor : public detail::AffinePrefetchOpGenericAdaptorBase {
using ValueT = ::llvm::detail::ValueOfRange<RangeT>;
using Base = detail::AffinePrefetchOpGenericAdaptorBase;
public:
AffinePrefetchOpGenericAdaptor(RangeT values, ::mlir::DictionaryAttr attrs = nullptr, ::mlir::RegionRange regions = {}) : Base(attrs, regions), odsOperands(values) {}
std::pair<unsigned, unsigned> getODSOperandIndexAndLength(unsigned index) {
return Base::getODSOperandIndexAndLength(index, odsOperands.size());
}
RangeT getODSOperands(unsigned index) {
auto valueRange = getODSOperandIndexAndLength(index);
return {std::next(odsOperands.begin(), valueRange.first),
std::next(odsOperands.begin(), valueRange.first + valueRange.second)};
}
ValueT getMemref() {
return *getODSOperands(0).begin();
}
RangeT getIndices() {
return getODSOperands(1);
}
RangeT getOperands() {
return odsOperands;
}
private:
RangeT odsOperands;
};
class AffinePrefetchOpAdaptor : public AffinePrefetchOpGenericAdaptor<::mlir::ValueRange> {
public:
using AffinePrefetchOpGenericAdaptor::AffinePrefetchOpGenericAdaptor;
AffinePrefetchOpAdaptor(AffinePrefetchOp op);
::mlir::LogicalResult verify(::mlir::Location loc);
};
```
Generated ops would then simply get an additional typedef, which I tentatively call `FoldAdaptor`, and which is an alias for `{OpName}GenericAdaptor<::llvm::ArrayRef<::mlir::Attribute>>`. For other code wanting to make use of an op's `GenericAdaptor`, a templated `using` declaration is added to the op as well.
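Concretely, the additions to a generated op class could look roughly like the following sketch (the exact spelling of the generated code may differ):

```cpp
class AffinePrefetchOp : public ::mlir::Op<AffinePrefetchOp /*, traits... */> {
public:
  // Templated alias so other code can plug in any suitable range type.
  template <typename RangeT>
  using GenericAdaptor = AffinePrefetchOpGenericAdaptor<RangeT>;
  // Alias used by the new fold signature.
  using FoldAdaptor = GenericAdaptor<::llvm::ArrayRef<::mlir::Attribute>>;
  // ...
};
```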
The new fold API would then be:

- `OpFoldResult fold(FoldAdaptor adaptor)` for a single-result op
- `LogicalResult fold(FoldAdaptor adaptor, SmallVectorImpl<OpFoldResult> &results)` otherwise
Example fold of index::CmpOp before and after
Before:

```cpp
OpFoldResult CmpOp::fold(ArrayRef<Attribute> operands) {
assert(operands.size() == 2 && "compare expected 2 operands");
auto lhs = dyn_cast_if_present<IntegerAttr>(operands[0]);
auto rhs = dyn_cast_if_present<IntegerAttr>(operands[1]);
if (!lhs || !rhs)
return {};
...
}
```
After:

```cpp
OpFoldResult CmpOp::fold(FoldAdaptor adaptor) {
auto lhs = dyn_cast_if_present<IntegerAttr>(adaptor.getLhs());
auto rhs = dyn_cast_if_present<IntegerAttr>(adaptor.getRhs());
if (!lhs || !rhs)
return {};
...
}
```
Compatibility and Opt-in mechanism
For the purpose of initial source compatibility, the new API would be disabled by default.

This is the part I am most unsure about, but my idea for an opt-in mechanism is to repurpose the `hasFolder` field of `Op` by making it an `int` (effectively an enum) with one of three values:
- `kEmitNoFolder` (0): does not emit any fold method
- `kEmitRawAttributesFolder` (1): emits the old `ArrayRef<Attribute>` fold
- `kEmitFoldAdaptorFolder` (2): emits the new `FoldAdaptor` fold
The choice of values ensures source compatibility. Upstream and downstream `fold` methods could then be gradually migrated to the `FoldAdaptor` fold method. If the old API is deemed strictly worse, it could be removed at a future time.

For reference, `git grep "::fold(" | wc -l` yields approximately 256 methods upstream that would require a signature change if fully migrated.
Additional Pros
- The generic adaptor is designed to take any range with any default-constructible/nullable element type. Any kind of code requiring a mapping of an op's operands to some other kind of value would strongly benefit from this nicer API. An example would be the LLVM exporter, which could make use of `{OpName}::GenericAdaptor<llvm::ArrayRef<llvm::Value*>>` to easily map getter names to LLVM values (see the sketch below).
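As a hedged sketch of what that could look like (the op `MyAddOp`, its getters, and the surrounding translation function are hypothetical, not part of the proposal):

```cpp
llvm::Value *exportAdd(MyAddOp op, llvm::ArrayRef<llvm::Value *> llvmOperands,
                       llvm::IRBuilderBase &builder) {
  // Reuse the TableGen-computed operand layout over already-translated
  // llvm::Value*s instead of indexing llvmOperands manually.
  MyAddOp::GenericAdaptor<llvm::ArrayRef<llvm::Value *>> adaptor(
      llvmOperands, op->getAttrDictionary());
  return builder.CreateAdd(adaptor.getLhs(), adaptor.getRhs());
}
```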
Notes
- The extra generated code, by itself, seems to have little to no impact on compile time. As a benchmark, I compiled the `TestDialect` with `clang-cl` in release mode on my Windows machine (i9-13900KF), both with and without the TableGen-related code for this proposal. Using hyperfine as the benchmarking tool I got:
  - Before: 42.881s +/- 0.479s
  - After: 42.532s +/- 0.471s

  As you can see, the difference is statistically insignificant and essentially equal (within noise levels). That said, making additional uses of the generic adaptors would certainly incur the compile-time cost of instantiating the template.
TLDR
A new opt-in fold API using newly TableGen-generated generic adaptors, allowing getters to be used instead of indexing, with no initial compile-time impact and full source compatibility with existing code. The new generic adaptors are also useful for other op operand mapping tasks, such as abstract interpretation or exporters like the LLVM exporter.
Implementation Status
I have an initial implementation that fully realizes the semantics and APIs described above; it just needs more tests and general code cleanup. I will post a link to the review here as soon as it is on Phabricator. Until then, I thought I'd post this RFC for more visibility, discussion, and feedback.