Hi everyone,
following a round table discussion we had at the MLIR hackathon in Edinburgh, I’d like to propose the introduction of “named constraints” (aka IR concepts) to MLIR, and subsequently the MLIR core dialects.
If you witnessed this discussion, you can safely skip ahead to the next steps. I have added some examples of how this can be used, though, in case you need some more inspiration.
Constraint declarations
To explain the purpose of named constraints, let’s first consider the existing TableGen Type and Attribute constraint mechanism. In TosaTypesBase.td, you’ll find the following snippet:
def Tosa_Int : AnyTypeOf<[Tosa_Bool,
                          Tosa_UInt8,
                          Tosa_UInt16,
                          Tosa_SignedInt]>;
...
def Tosa_IntLike : TypeOrContainer<Tosa_Int, "signless-integer-like">;
// It actually uses Tosa_TypeLike, but let's assume standard features are used.
This snippet declares the Tosa_IntLike type constraint, which can now be used by operation declarations to constrain the types of SSA values. In our example, the ApplyScaleOp is declared as:
def Tosa_ApplyScaleOp : Tosa_Op<"apply_scale", ...> {
  ...
  let arguments = (ins
    Tosa_IntLike:$value,
    Tosa_IntLike:$multiplier,
    Tosa_Int8Like:$shift,
    BoolAttr:$double_round
  );

  let results = (outs
    Tosa_IntLike:$output
  );
  ...
}
The result is that any consumer of a verified instance of the ApplyScaleOp operation can be certain that the concrete type of $value is (a container of) one of the allowed integers. Constraints for attributes are handled in almost exactly the same way.
Implementation
In practice, the TableGen Op record determines how ODS generates the verifyInvariantsImpl() method of the generated Op class. It causes ODS to emit a set of functions, matching the individual constraints, into the anonymous namespace of the C++ include generated for the dialect. For both $value and $multiplier, the following function is called:
static ::mlir::LogicalResult __mlir_ods_local_type_constraint_TosaOps1(
    ::mlir::Operation *op, ::mlir::Type type, ::llvm::StringRef valueKind,
    unsigned valueIndex) {
  // Omitted a boatload of terms in the condition for brevity.
  if (!((((type.isSignlessInteger(1))) || ...) || ...)) {
    return op->emitOpError(valueKind)
           << " #" << valueIndex
           << " must be signless-integer-like, but got " << type;
  }
  return ::mlir::success();
}
The names of these functions are generated to be unique per constraint definition, but can effectively be considered anonymous. In other words, there is no expectation that the user would interact with this function voluntarily.
The function is generated by combining the predicates of the underlying Type or Attr records, i.e., arbitrary C++ expression snippets, into a single test. That is, although constraint declarations appear to have introspectable semantics, they are just some C++ snippets.
Named constraints
A named constraint is a strengthening of this mechanism that allows the user to:
- carry over their strong domain terminology into C++
- enjoy the convenience of LLVM-style casting, e.g., dyn_cast
- use the C++ type system to propagate constraint information
Turning the above example into a named constraint could have the following opt-in syntax:
def Tosa_IntLike : NamedConstraint<"IntLikeType", TypeOrContainer<Tosa_Int, "signless-integer-like">> {
  let cppNamespace = "mlir::tosa";
}
As a result, an mlir::tosa::IntLikeType class would be generated that derives from Type. Without impacting any previous uses, the user could then ditch nested conditionals in favor of more idiomatic code such as:
T handleType(Type type) {
  return llvm::TypeSwitch<Type, T>(type)
      .Case([](IntLikeType ty) { return ...; })
      .Case([](FloatType ty) { return ...; })
      .Default([](auto) { return T{}; });
}
Additionally, the C++ type system will now carry the information that a type or attribute matches a certain constraint. In conjunction with the TypedValue<T> template and ODS-generated getters, this means clearer interfaces and less potential for errors caused by invalid assumptions.
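For illustration, assuming the IntLikeType constraint from above is mapped to a generated C++ class, the ODS accessor for $value could be typed accordingly. The accessor signature below is hypothetical; today it returns a plain Value:

void inspect(mlir::tosa::ApplyScaleOp op) {
  // Hypothetical accessor type: the constraint travels with the value.
  mlir::TypedValue<mlir::tosa::IntLikeType> value = op.getValue();
  mlir::tosa::IntLikeType type = value.getType(); // no cast, no re-check needed
  (void)type;
}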
Named constraints can also declare new methods that implement custom behavior relevant only to instances of the constraint, including dispatching to the concrete type or attribute. This can be very helpful in monomorphizing code in scenarios where interfaces are not quite applicable.
bool hasMyTrait(IntLikeType type) {
  return type.getElementBitWidth() <= 64;
}
Such methods can of course also be builder methods, if you want users to construct instances of this constraint in a canonical form. (This also works on operations with the OpBuilder, if the constraint implements getOperationName(), returning the name of an already registered op.)
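As a sketch, such a builder-style method declared on the constraint (e.g. via its extra class declarations) could look as follows; the name getCanonical and the choice of i32 as the canonical form are made up for illustration:

static IntLikeType getCanonical(mlir::MLIRContext *ctx) {
  // Build the scalar i32 form, which trivially satisfies the constraint.
  mlir::Type canonical = mlir::IntegerType::get(ctx, /*width=*/32);
  return llvm::cast<IntLikeType>(canonical);
}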
Named constraints can also be used to declare overload sets that reduce the amount of constraint checking when instances are statically known to satisfy them:
bool hasMyTrait(IntLikeType type) { ... }

bool hasMyTrait(Type type) {
  if (const auto intLikeTy = llvm::dyn_cast<IntLikeType>(type))
    return hasMyTrait(intLikeTy);
  if (...)
    return ...;
  return false;
}
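At a call site, this means the dynamic check only runs when nothing is known statically (hypothetical usage of the overload set above):

void process(IntLikeType known, mlir::Type unknown) {
  bool a = hasMyTrait(known);   // picks the IntLikeType overload, no dynamic check
  bool b = hasMyTrait(unknown); // falls back to the dyn_cast-based overload
  (void)a;
  (void)b;
}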
Implementation
In essence, by associating a type name and namespace with a constraint, we will have ODS generate a specialization of the underlying fancy pointer class, e.g., Attribute or Type. A prime example of what this looks like in core is the FloatType implementation in BuiltinTypes.h.
In its minimum viable form, this looks like:
class MyTypeConstraint : public Type {
public:
  [[nodiscard]] static bool classof(Type type) {
    return ...; // old ODS predicate expression goes here.
  }

  using Type::Type;
};
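With the classof above in place, the usual LLVM-style casting utilities work out of the box, e.g. (illustrative usage only):

void inspect(mlir::Type type) {
  if (auto matched = llvm::dyn_cast<MyTypeConstraint>(type)) {
    // `matched` statically documents that the constraint holds.
    (void)matched;
  }
}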
It is permissible to specify any T that satisfies std::is_base_of_v<Attribute, T> or std::is_base_of_v<Type, T> as a base, if a constraint should directly specialize another.
Additionally, we would allow the TableGen record to provide an extraClassDeclaration segment that adds methods to the constraint type. Until introspection for constraints becomes available, implicit conversions to a constraint, for cases where the constraint is trivially matched, may also be added this way:
[[nodiscard]] static bool classof(IntegerType) { return true; }

/*implicit*/ MyTypeConstraint(IntegerType type)
    : Type(llvm::cast<Type>(type).getImpl()) {}
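A hypothetical call site shows the benefit of the implicit conversion, assuming a function that takes the constraint type:

void consume(MyTypeConstraint type);

void example(mlir::IntegerType intTy) {
  consume(intTy); // converts implicitly; an IntegerType trivially satisfies the constraint
}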
Whenever ODS generates a verifier, it will still produce an anonymous method to emit the error if needed, but will convert all tests for MyTypeConstraint to simply ::llvm::isa<::mlir::my_ns::MyTypeConstraint>(type). In fact, this behavior can already be achieved today by explicitly declaring an ODS type or attribute:
def AnyFloat : Type<CPred<"$_self.isa<::mlir::FloatType>()">, "floating-point",
                    "::mlir::FloatType">;
Here, ::mlir::FloatType is the fancy pointer specialization that implements the constraint.
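For reference, FloatType in BuiltinTypes.h already follows exactly this shape; the sketch below is abridged from memory, and the real class enumerates all builtin floating-point types in its classof:

class FloatType : public Type {
public:
  using Type::Type;
  // True iff `type` is one of the builtin floating-point types.
  static bool classof(Type type);
  // Queries that dispatch to the concrete float type.
  unsigned getWidth();
  const llvm::fltSemantics &getFloatSemantics();
};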
Pitfalls
- Named constraints could also apply to Ops (via OpState).
There was a historic use of a ConstantIntOp in the arith dialect (it might even have been std back then) that did this for convenience. Compared to Type and Attribute, however, this is not as simple, because OpState does not provide the entire interface that generic consumers of Op expect. In other words, only specializations of ODS-generated Ops are trivial.
Additionally, this highlights a “time of check vs. time of use” problem. Operations are mutable and may therefore cease matching a constraint. One can argue this is also the case for types and attributes, but certainly less often.
- Adding named constraints will introduce uses of llvm::isa that may execute arbitrarily complex C++.
Although it can be argued that this will encourage users to write more such potentially expensive checks, the proposed change will not impact any existing users. Users who do decide to opt in will need to be aware of this issue. However, in places where such matching logic is necessary, its cost is already being paid; we just make it cleaner.
In fact, the usage of overload sets here might reduce the total number of predicate checks performed.
- Giving structure to constraints could further reduce matching overhead.
Consider a complex constraint C that is the intersection of simpler constraints, C = {C_1, ..., C_N}. Matching some T that is statically known to satisfy the constraints T = {T_1, ..., T_M} should then at most have to check the constraints in C \ T (set minus).
Assuming all users of this mechanism go through TableGen records such as the And and Or predicates, we might be able to model this and other simplifications using template metaprogramming on sets. However, the benefit is questionable given the considerable maintenance effort.
IRDL will introduce its own model of attribute and type constraints in due time (hopefully), which may later be used to do constraint simplification.
Next steps
At the hackathon, we reached some sort of agreement that this is a valid use of the casting mechanism, and that it may be beneficial. I think “named constraints” is a good name for the feature, and the opt-in behavior should allow for an easy introduction.
We have not agreed on a definitive syntax for the new TableGen record yet, but that seems rather trivial. As promised, I’m going to whip up an mlir-tblgen patch that implements the feature over the next week.
I also want to draw attention to the second part of this proposal, which is porting existing uses of this pattern in the core dialects. FloatType is one such example: it is defined in BuiltinTypes.h, which would make BuiltinTypes.td the “right place” for its constraint, yet its current counterpart actually lives in OpBase.td. IMHO, that is dodgy anyway, especially with an eye on decoupling the built-ins. Although I don’t think many people do type-related work without including the built-in types, moving the definition may be breaking for downstream users, even though the API does not change at all.
Thanks for all the support and great discussion over the last couple of days,
~Karl