RFC: Declarative Op Assembly Format
(Or Auto-Generating Custom Parsers and Printers for Operations)
Hi all,
I’d like to propose a specification format for declaratively defining the custom assembly format of operations. I believe this can bring several benefits:
- Remove a bunch of boilerplate C++
  - Defining a custom format, even a simple one, usually involves at least 20-30 lines of C++.
- Better verification
  - Having a declarative format means that we can provide better verification of the completeness of the syntax with respect to round-tripping, e.g. make sure that the attribute dictionary is present.
- More uniformity with operation syntax
  - By having a centralized format, the syntax of operations becomes more regular and uniform.
- It’s a long-standing TODO
  - See the bottom of the existing ODS documentation.
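To make the verification point concrete, here is a minimal sketch of the kind of completeness check a declarative format enables. It assumes a hypothetical, already-parsed representation of a format (`Kind`, `Element` and `verifyFormat` are illustrative names, not existing ODS code; variable names are stored without the leading `$`) and only checks two things: that `attr-dict` is present, and that every operand and its type are parsed somewhere.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, already-parsed format element (not an existing ODS type).
enum class Kind { Directive, Literal, Variable };

struct Element {
  Kind kind;
  std::string name;               // directive name, literal text, or variable name
  std::vector<std::string> args;  // directive arguments, e.g. {"operands"}
};

// Returns the problems that would break round-tripping; empty means OK.
// Only attr-dict and operands (values and types) are checked here.
std::vector<std::string> verifyFormat(const std::vector<Element> &format,
                                      const std::vector<std::string> &operands) {
  bool hasAttrDict = false;
  std::set<std::string> parsedValues, parsedTypes;
  for (const Element &e : format) {
    if (e.kind == Kind::Variable) {
      parsedValues.insert(e.name);
    } else if (e.kind == Kind::Directive) {
      if (e.name == "attr-dict")
        hasAttrDict = true;
      else if (e.name == "type" || e.name == "colon-type" ||
               e.name == "arrow-type")
        parsedTypes.insert(e.args.begin(), e.args.end());
    }
  }

  std::vector<std::string> errors;
  if (!hasAttrDict)
    errors.push_back("missing attr-dict");
  for (const std::string &operand : operands) {
    if (!parsedValues.count(operand))
      errors.push_back("operand '" + operand + "' is never parsed");
    if (!parsedTypes.count(operand))
      errors.push_back("type of '" + operand + "' is never parsed");
  }
  return errors;
}
```

For a format like the `std.call` one, passing the operand group `operands` yields no errors, while a format without `attr-dict` reports `missing attr-dict`. A real verifier would also need to cover results and attributes.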
Before going into the exact details of the proposed format, I’d like to state a major non-goal upfront: this format is not intended to capture every use case. We should focus on capturing the major, common use cases and leave the craziness/irregularity to those that really need or want it. With that said, let’s jump into the proposal itself:
Op Asm Format
To illustrate the format, let’s look at an example for std.call:
def CallOp ... {
  let arguments = (ins FlatSymbolRefAttr:$callee, Variadic<AnyType>:$operands);
  let results = (outs Variadic<AnyType>);
}
Below is the C++ parser and printer that currently define the custom format of std.call, followed by the equivalent declarative form:
static ParseResult parseCallOp(OpAsmParser &parser, OperationState &result) {
  FlatSymbolRefAttr calleeAttr;
  FunctionType calleeType;
  SmallVector<OpAsmParser::OperandType, 4> operands;
  auto calleeLoc = parser.getNameLoc();
  if (parser.parseAttribute(calleeAttr, "callee", result.attributes) ||
      parser.parseOperandList(operands, OpAsmParser::Delimiter::Paren) ||
      parser.parseOptionalAttrDict(result.attributes) ||
      parser.parseColonType(calleeType) ||
      parser.addTypesToList(calleeType.getResults(), result.types) ||
      parser.resolveOperands(operands, calleeType.getInputs(), calleeLoc,
                             result.operands))
    return failure();
  return success();
}

static void print(OpAsmPrinter &p, CallOp op) {
  p << "call " << op.getAttr("callee") << '(' << op.getOperands() << ')';
  p.printOptionalAttrDict(op.getAttrs(), /*elidedAttrs=*/{"callee"});
  p << " : " << op.getCalleeType();
}
vs:
$callee `(` $operands `)` attr-dict `:` `(` type($operands) `)` arrow-type(results)
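As a rough illustration of what the declarative form means, the following toy interpreter walks a list of format elements and prints a mock std.call operation. Everything here is hypothetical: `Element`, `MockOp` and `printOp` are illustrative names, types and values are plain strings, the spacing rules are hard-coded for this one format, and a real generator would more likely emit C++ equivalent to the hand-written printer than interpret the format at runtime.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical format element and mock operation; not real MLIR/ODS types.
enum class Kind { Directive, Literal, Variable };

struct Element {
  Kind kind;
  std::string name;
  std::vector<std::string> args;
};

struct MockOp {
  std::string name;                          // e.g. "call"
  std::map<std::string, std::string> attrs;  // attribute name -> printed value
  std::vector<std::string> operands;         // e.g. {"%a", "%b"}
  std::vector<std::string> operandTypes;     // e.g. {"i32", "i32"}
  std::vector<std::string> resultTypes;      // e.g. {"i32"}
};

static std::string join(const std::vector<std::string> &items) {
  std::string out;
  for (size_t i = 0; i < items.size(); ++i)
    out += (i ? ", " : "") + items[i];
  return out;
}

// Interprets `format` against `op`. Spacing is hard-coded for this example.
std::string printOp(const MockOp &op, const std::vector<Element> &format) {
  // Attributes printed as variables are elided from attr-dict, like the
  // elidedAttrs argument of printOptionalAttrDict.
  std::set<std::string> elided;
  for (const Element &e : format)
    if (e.kind == Kind::Variable && op.attrs.count(e.name))
      elided.insert(e.name);

  std::string out = op.name + " ";
  for (const Element &e : format) {
    switch (e.kind) {
    case Kind::Literal:
      out += (e.name == ":") ? " : " : e.name;
      break;
    case Kind::Variable:
      if (op.attrs.count(e.name))
        out += op.attrs.at(e.name);
      else if (e.name == "operands")
        out += join(op.operands);
      break;
    case Kind::Directive:
      if (e.name == "attr-dict") {
        std::vector<std::string> entries;
        for (const auto &[key, value] : op.attrs)
          if (!elided.count(key))
            entries.push_back(key + " = " + value);
        if (!entries.empty())
          out += " {" + join(entries) + "}";
      } else if (e.name == "type" && e.args == std::vector<std::string>{"operands"}) {
        out += join(op.operandTypes);
      } else if (e.name == "arrow-type" && e.args == std::vector<std::string>{"results"}) {
        // A single result prints bare; zero or several are parenthesized.
        out += " -> ";
        out += op.resultTypes.size() == 1 ? op.resultTypes[0]
                                          : "(" + join(op.resultTypes) + ")";
      }
      break;
    }
  }
  return out;
}
```

With callee `@foo`, operands `%a, %b` of type `i32` and a single `i32` result, `printOp` produces `call @foo(%a, %b) : (i32, i32) -> i32`. An extra attribute such as `tag` lands in the `{...}` dictionary, while `callee` is elided because the format prints it directly, mirroring `elidedAttrs` in the C++ printer.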
Looking at the declarative form above, the format is composed of three major components:
Directives
directive ::= identifier (`(` arguments `)`)?
- A directive is a type of builtin function, with an optional set of arguments.
An initial set of directives, which will be expanded as needed, is listed below:
attr-dict
- The attribute dictionary of the operation.
arrow-type | colon-type | type
- The type of the given entity, which is either an operand or a result. arrow-type and colon-type additionally prefix the type with `->` and `:` respectively.
operands
- Represents all operands of the operation.
results
- Represents all results of the operation.
Literals
literal ::= ` (keyword | punctuation) `
- A literal is either a keyword or punctuation surrounded by backticks (`).
Variables
variable ::= $ identifier
- A variable is an entity that has been registered on the operation itself, i.e. one of its arguments (operands and attributes) or results.
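A minimal lexer sketch for the three kinds of elements above. It is a hypothetical helper (`tokenize` and `Token` are illustrative, not part of ODS): anything in backticks becomes a literal, `$name` becomes a variable, and an identifier (allowing `-`) optionally followed by a parenthesized, comma-separated argument list becomes a directive. Bare punctuation outside backticks is rejected, matching the grammar as written; nested parentheses are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <vector>

enum class Kind { Directive, Literal, Variable };

struct Token {
  Kind kind;
  std::string text;               // literal text, variable name, or directive name
  std::vector<std::string> args;  // directive arguments as written, e.g. "$operands"
};

static bool isIdentChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Splits a format string into tokens; returns std::nullopt on malformed input.
std::optional<std::vector<Token>> tokenize(const std::string &fmt) {
  std::vector<Token> tokens;
  size_t i = 0;
  while (i < fmt.size()) {
    char c = fmt[i];
    if (std::isspace(static_cast<unsigned char>(c))) {
      ++i;
    } else if (c == '`') {  // literal: `keyword` or `punctuation`
      size_t end = fmt.find('`', i + 1);
      if (end == std::string::npos || end == i + 1)
        return std::nullopt;
      tokens.push_back({Kind::Literal, fmt.substr(i + 1, end - i - 1), {}});
      i = end + 1;
    } else if (c == '$') {  // variable: $identifier
      size_t start = ++i;
      while (i < fmt.size() && isIdentChar(fmt[i]))
        ++i;
      if (i == start)
        return std::nullopt;
      tokens.push_back({Kind::Variable, fmt.substr(start, i - start), {}});
    } else if (std::isalpha(static_cast<unsigned char>(c))) {  // directive
      size_t start = i;
      while (i < fmt.size() && (isIdentChar(fmt[i]) || fmt[i] == '-'))
        ++i;
      Token tok{Kind::Directive, fmt.substr(start, i - start), {}};
      if (i < fmt.size() && fmt[i] == '(') {
        size_t close = fmt.find(')', i);
        if (close == std::string::npos)
          return std::nullopt;
        // Split the comma-separated argument list, dropping whitespace.
        std::string arg;
        for (size_t j = i + 1; j <= close; ++j) {
          if (j == close || fmt[j] == ',') {
            if (!arg.empty())
              tok.args.push_back(arg);
            arg.clear();
          } else if (!std::isspace(static_cast<unsigned char>(fmt[j]))) {
            arg += fmt[j];
          }
        }
        i = close + 1;
      }
      tokens.push_back(tok);
    } else {
      return std::nullopt;  // e.g. punctuation outside backticks
    }
  }
  return tokens;
}
```

Run on the std.call format, this yields ten tokens, among them the directive `type` with argument `$operands` and the directive `arrow-type` with argument `results`.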
One Thorny Bit: Inferred Types
This section focuses on a particularly thorny bit of defining a declarative format, and the thing I would most like opinions on (as this affects everyone, and I’m not perfect). One interesting aspect of the custom format of many operations is that the types of certain operands/results are inferred from the types of other operands/results in the format. For example:
- AddIOp
- This is a binary arithmetic operation where all operands and results have the same type.
- ExtractElementOp
- This op infers the result type from the element type of the aggregate operand (this is one example of many).
The thorny bit is how we express this in the format, and there are many different options.
// Assume we have a format directive used to get the element type of the given
// input type.
def FormatGetElementType : FormatDirective<"$0.getElementType()">;
def AddIOp … {
  let arguments = (ins IntegerType:$lhs, IntegerType:$rhs);
  let results = (outs IntegerType);
}
def ExtractElementOp … {
  let arguments = (ins AnyTypeOf<[AnyVector, AnyTensor]>:$aggregate,
                       Variadic<Index>:$indices);
  let results = (outs AnyType);
}
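Assuming `FormatDirective` templates use positional `$N` placeholders, as `FormatGetElementType` does, a generator could expand them by textual substitution with the C++ expressions bound to each argument. The sketch below shows only that substitution step; `expandTemplate` and the bound expression `aggregateType` are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Expands positional placeholders ($0, $1, ...) in a directive's code template
// with the C++ expressions bound to its arguments. Hypothetical helper.
std::string expandTemplate(const std::string &tmpl,
                           const std::vector<std::string> &args) {
  std::string out;
  for (size_t i = 0; i < tmpl.size(); ++i) {
    bool isPlaceholder = tmpl[i] == '$' && i + 1 < tmpl.size() &&
                         std::isdigit(static_cast<unsigned char>(tmpl[i + 1]));
    if (!isPlaceholder) {
      out += tmpl[i];
      continue;
    }
    // Read the full placeholder index, e.g. "$12" -> 12.
    size_t j = i + 1;
    size_t index = 0;
    while (j < tmpl.size() && std::isdigit(static_cast<unsigned char>(tmpl[j])))
      index = index * 10 + static_cast<size_t>(tmpl[j++] - '0');
    if (index >= args.size())
      throw std::out_of_range("placeholder $" + std::to_string(index) +
                              " has no bound argument");
    out += args[index];
    i = j - 1;  // the loop increment moves past the placeholder
  }
  return out;
}
```

For example, `expandTemplate("$0.getElementType()", {"aggregateType"})` returns `aggregateType.getElementType()`. Whether the bound expression needs a cast (e.g. to a shaped type) before the call is left open here.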
Given the base formats shown below, how should we inject the necessary constraints?
AddIOp: $lhs `,` $rhs attr-dict colon-type($lhs)
ExtractElementOp: $aggregate `[` $indices `]` attr-dict colon-type($aggregate)
Some potential options:
// Inline as an argument:
$lhs `,` $rhs attr-dict colon-type($lhs, $rhs, results)
$aggregate `[` $indices `]` attr-dict colon-type($aggregate, results=FormatGetElementType)
// Inline as a trailing list:
$lhs `,` $rhs attr-dict colon-type($lhs)[$rhs, results]
$aggregate `[` $indices `]` attr-dict colon-type($aggregate)[results=FormatGetElementType]
// Inline with a colon list:
$lhs `,` $rhs attr-dict colon-type($lhs : $rhs, results)
$aggregate `[` $indices `]` attr-dict colon-type($aggregate : results=FormatGetElementType)
// Out-of-line.
$lhs `,` $rhs attr-dict colon-type($lhs)
where type($rhs)=type($lhs), type(results)=type($lhs)
$aggregate `[` $indices `]` attr-dict colon-type($aggregate)
where type(results)=FormatGetElementType(type($aggregate))
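To gauge what the out-of-line option would ask of a generated parser, here is a hedged sketch of resolving `where` clauses. It assumes types are mock values (a name plus an optional element type), that each clause has the form `type(target) = f(type(source))`, and that the types parsed directly (e.g. via `colon-type`) seed the map. `resolveTypes` applies clauses until nothing changes, so clause order does not matter; `elementTypeOf` plays the role of `FormatGetElementType`. All names are illustrative.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Mock type: shaped types carry an element type, scalars do not.
struct MockType {
  std::string name;         // e.g. "vector<4xf32>" or "i32"
  std::string elementType;  // e.g. "f32"; empty for scalar types
};

using TypeMap = std::map<std::string, MockType>;

// One out-of-line clause: type(target) = compute(type(source)).
struct WhereClause {
  std::string target, source;
  std::function<MockType(const MockType &)> compute;
};

MockType sameType(const MockType &t) { return t; }
MockType elementTypeOf(const MockType &t) { return {t.elementType, ""}; }

// Starting from the types the parser read directly, apply clauses until no
// new type is discovered. Returns false if some clause target stays unknown.
bool resolveTypes(TypeMap &known, const std::vector<WhereClause> &clauses) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (const WhereClause &c : clauses) {
      if (known.count(c.target) || !known.count(c.source))
        continue;
      known[c.target] = c.compute(known.at(c.source));
      changed = true;
    }
  }
  for (const WhereClause &c : clauses)
    if (!known.count(c.target))
      return false;
  return true;
}
```

For AddIOp, seeding `$lhs` with `i32` resolves both `$rhs` and `results` to `i32`; for ExtractElementOp, seeding `$aggregate` with `vector<4xf32>` resolves `results` to `f32`. If no seed type is available, resolution fails and the parser would have to emit an error.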
For some of these ops we can likely get away with detecting specific traits (e.g. SameOperandsAndResultType for AddIOp), but this doesn’t really cover anything outside of the standard/builtin types.
Any thoughts?
– River