[RFC][ODS] Attribute and Type Assembly Formats

Mogball · October 7, 2021, 10:09pm

Attribute and type defs in ODS don’t provide a declarative way to describe parsers and printers in the same way that ops do. This RFC proposes an assemblyFormat field for TypeDefs and AttrDefs to generate parsers and printers.

Defining a Custom Assembly Format

Attributes and type defs that define an assemblyFormat must specify mnemonic. Then, the assembly format is specified in a manner very similar to op assembly formats:

def MyAttr : AttrDef<MyDialect, "MyAttr"> {
  let parameters = (ins "int32_t":$value);
  let mnemonic = "my_attr";
  let assemblyFormat = "`<` $value `>`";
}

The above attribute will have an assembly format #my_dialect.my_attr<42>, since it meets the requirements for pretty-printing format.

let assemblyFormat = "`(` $value `)`";

The above format will result in the ugly form: #my_dialect<"my_attr(42)">

Special Directives

The only available directive is struct. The directive struct($one, $two, $three, ...) matches a comma-separated list of key-value pairs for the parameters one, two, and three in any order (but all must be present).

def MyStructType : TypeDef<MyDialect, "MyStructType"> {
  let parameters = (ins "int":$one, "int":$two, "int":$three);
  let mnemonic = "my_struct_type";
  let assemblyFormat = "`<` struct($two, $three) `,` $one `>`";
}

The above attribute will have the format !my_dialect.my_struct_type<two = 24, three = 16, 42>.

Using struct(*) will capture all the parameters in an attribute or type.

Parameter Parsing and Printing

The default parser for any parameter is

MyParamT myParam;
::mlir::ParseResult result = ::mlir::parseField(parser, myParam);

The default printer is

printer << getMyParam();

Both parseField and the stream operator can be overloaded to provide parsers and printers for custom types. One goal is to provide enough of these in MLIR core that must common types (integers, strings, etc.) won’t require work on the user’s part.

Just as non-POD parameters require defining an AttrParameter or TypeParameter to provide a custom allocator, they must also specify a cppStorageType; for StringRef it is std::string and for ArrayRef<T> it is SmallVector<T>.

For more complex parsing or printing behaviour, define parser and/or printer to override the default parser and printer.

def MyParam : AttrParameter<"MyParam"> {
  let parser = [{ parseMyParam($_parser, $_self, $_type) }];
  let printer = [{ printMyParam($_printer, $_self) }];
}

For attribute parameters, $_type refers to the expected type of the attribute.

Limitations

Parameter types whose cppStorageType does not have a default constructor cannot be parsed as part of an assembly format, because they are declared at the top of the function – notably, APFloat.

Example: Ditching StructAttr

StructAttr is convenient and concise, but it can be slow (attributes inside DictAttr).

def SlowAttr : StructAttr<"SlowAttr", MyDialect, [
                          StructFieldAttr<"count", I32Attr>,
                          StructFieldAttr<"dims", I32ArrayAttr>,
                          StructFieldAttr<"limits", I32ArrayAttr>,
                          StructFieldAttr<"map", AffineMapAttr>]>;

The above example can be replaced with an AttrDef now as simply:

def FasterAttr : AttrDef<"FasterAttr", MyDialect> {
  let parameters = (ins
    "int32_t":$count,
    ArrayRefParameter<"int32_t">:$dims,
    ArrayRefParameter<"int32_t">:$limits,
    "::mlir::AffineMap":$map
  );
  let mnemonic = "faster";
  let assemblyFormat = "`<` struct(*) `>`";
}

River707 · October 7, 2021, 10:35pm

Great proposal Jeff! I’m +1 on this, and like the way you’ve strucutured the parsing. We can keep all of the literal rules the same as the operation parser, but only provide a few key builtin directive. I don’t think it’s necessary for the initial proposal, but have you considered adding a custom directive (similarly to operations)? That would allow for some custom composition between parameters in the middle of a format.

Mogball:

For more complex parsing or printing behaviour, define parser and/or printer to override the default parser and printer.
def MyParam : AttrParameter<"MyParam"> {
  let parser = [{ parseMyParam($_parser, $_self, $_type) }];
  let printer = [{ printMyParam($_printer, $_self) }];
}
For attribute parameters, $_type refers to the expected type of the attribute.

Why do we need to pass in self here? What about having these return something like FailureOr<T>?

Have you considered using something like Optional<T> for the storage? That would let you declare the variable early and then parse later.

– River

Mogball · October 7, 2021, 11:21pm

Yes actually, but I thought it’d be best to leave that for later.

Heh, I’m exactly re-using the FormatLexer from OpFormatGen.

The current parsing “paradigm” is to return a ParseResult and pass in a reference to where the parse result ought to be stored. I didn’t feel like rocking the boat, and also this allows the right parseField to be selected by ADL.

Oh I think I wasn’t very clear here in describing the problem. The parser looks like this:

mlir::Attribute MyAttrOrType::parse(...) {
  int v0;
  int v1;
  llvm::APFloat v2; // error: llvm::APFloat() is private
  ...
  if (::parseAPFloat(parser, v2)) { ... }
  ...
}

mehdi_amini · October 7, 2021, 11:35pm

Thanks @Mogball: that’s a great evolution!

I thought about something else just yesterday after we discussed this: one “regression” from using a dictionary attribute is the introspection aspect.
I think what should be possible (and that would help Python bindings as well) is for all these attributes to implement (optionally maybe) a “dictionnary” interface that would allow to lookup fields by name.
I’m not super clear on this though, because we need some type erasure on the value for each field. One way is to wrap this in attribute but that seems costly (and we also now need to have a mechanism to “raise” each field value into its own attribute on demand).

River707 · October 7, 2021, 11:36pm

I don’t think we should stick with that in this case, as being able to construct these things (like APFloat) is useful. (Also ParseResult == LogicalResult, but with boolean conversion. Using FailureOr seems inline here).

Mogball:

Oh I think I wasn’t very clear here in describing the problem. The parser looks like this:
mlir::Attribute MyAttrOrType::parse(...) {
  int v0;
  int v1;
  llvm::APFloat v2; // error: llvm::APFloat() is private
  ...
  if (::parseAPFloat(parser, v2)) { ... }
  ...
}

I understood the problem, but you could do:

Optional<llvm::APFloat> v2;
if (::parseAPFloat(parser, v2))
  ...


ParseResult parseAPFloat(..., Optional<APFloat> &result) {
  if (...)
    return failure();
  result.emplace(APFloat(...));
  return success();
}

I’d much rather prefer using FailureOr though.

River707 · October 7, 2021, 11:38pm

I was also discussion something similar with someone today (in relation to PDL frontend). There is a lot of room for improvement in introspecting attributes/types. One possible convenience thing could be to auto-generate the SubElementInterfaces when possible.

– River

Mogball · October 8, 2021, 12:00am

Oh yeah that would work. I misunderstood then But I think (if things were kept the same) that the onus would be on the user to use Optional as a workaround. I can look into FailureOr, particularly how it would change how parseField is implemented.

If python bindings were autogenerated for ODS attributes and types (I think there was some discussion about that) then this would be moot.

But what about bit generateDictInterface which tries to generate something like

Attribute lookup(StringRef name) {
  if (name == "param1") return getParam1();
  if (name == "param2") return getParam2();
  ...
  return {};
}

Which would fail to compile if any of the attribute’s or type’s parameters weren’t attributes.

mehdi_amini · October 8, 2021, 3:44am

Generating the python is one thing, it is still quite a hurdle to generate the C API required by the binding. With a generic interface we could write the C binding once and then generate Python wrapper around it to make it nicer.

Yeah this is the kind of thing I had in mind. Note though that getParam1() does not return an Attribute right now.

Mogball · October 8, 2021, 3:58am

True. I was thinking if parameters were actually just attribute types:

def MyAttr : AttrDef<...> {
  let parameters = (ins "IntegerAttr":$param1, "FloatAttr":$param2);
}

Only if all the parameters were implicitly convertible to Attribute would it compile.

Mogball · October 8, 2021, 4:12am

What about mlir::asAttribute?

Attribute lookup(StringRef name) {
  if (name == "param1") return mlir::asAttribute(getParam1());
  if (name == "param2") return mlir::asAttribute(getParam2());
  ...
  return {};
}

Custom types can just be wrapped into an attribute shipped off to Python.
EDIT: nevermind. I forgot what Mehdi wrote

Mogball · October 8, 2021, 4:17pm

Okay so I’ve swapped to FailureOr<T>. Now the specialize-able parser looks like this:

template struct FieldParser<MyType> {
  static FailureOr<MyType> parse(DialectAsmParser &parser);
};

jpienaar · October 8, 2021, 4:17pm

DerivedAttributeOpInterface does something like this with materializeDerivedAttributes - downside is it materializes a DictionaryAttr indeed, and there we require that all values can be raised to an attribute.

clattner · October 8, 2021, 4:25pm

This is really really awesome Jeff, I’m thrilled you’re working on it. Beyond basic support, I wonder what you and others think about a pattern we’ve used a bit over in the CIRCT project. We have several dialects that need to replace effectively the whole type system, and provide structs, arrays, integers and other things with dialect specific types.

The problem with the straight-forward approach to implementing this is that you get massive boilerplate that makes reading these types super painful, some example types:

%thing: !firrtl.uint      // unknown width unsigned integer
%thing: !firrtl.uint<4>   // known width unsigned integer (same C++ class)
%thing: !firrtl.clock     // various domain specific thingies
%thing: !firrtl.vector<!firrtl.uint<1>, 12> // array-like type
// Nested structure thing, "member:", "clk:", and "rst:" are the field names.
%thing: !firrtl.bundle<member: !firrtl.bundle<0: !firrtl.bundle<clk: !firrtl.clock, rst: !firrtl.uint<1>>>>

Because the type system is closed, the FIRRTL dialect parser and printer for types elides the prefix in nested positions. The last two examples end up printing like this:

%thing: !firrtl.vector<uint<1>, 12> // uint implicitly is !firrtl.uint
// Many nested !firrtl's go away
%thing: !firrtl.bundle<member: bundle<0: bundle<clk: clock, rst: uint<1>>>>

Has anyone else done this? It seems directly related to the “allow op dialects to be implicit within a region” work people have been talking about on the op side of things.

I have a similar need on the attribute side of things for the module parameterization work I’m working on.

-Chris

Mogball · October 8, 2021, 4:54pm

I did something similar when I made the PDL prototype way long ago: pdl.operation(%0 : value), but the shortened type was dropped (which also simplified the parsers) in upstream: pdl.operation(%0: !pdl.value).

I didn’t add support for Optional attribute or type parameters (yet) but I don’t see why we couldn’t have a directive like

let assemblyFormat = "`<` dialect-type(firrtl, $elementType) `,` $size `>`"

Maybe dialect-type($elementType) would just default to the dialect of the current type/attribute.

mehdi_amini · October 8, 2021, 5:56pm

I was thinking about this last night, and maybe that for inter-op purpose, it would be good enough to have an “DictionnaryCompatibleInterface” of some sort that would have the generated Attribute have to expose a static failureOr<MyAttr> MyAttr::getFromDict(DictionnaryAttr); and DictionnaryAttr MyAttr::toDictionnaryAttr().
It wouldn’t be “cheap” but this is probably fine for the use cases at hand.

mehdi_amini · October 8, 2021, 6:26pm

We could extend the default dialect to work there potentially. But I also wanted to bring up something we’ve been doing in the MHLO dialect. We have this attribute: #mhlo.conv<[b, 0, 1, f]x[0, 1, i, o]->[b, 0, 1, f]> which we attach to a convolution operation.
However we tweaked the parsing printing of the convolution operation so that in context it appears like this: %0 = mhlo.convolution(%arg0, %arg1) dim_numbers = [b, 0, 1, f]x[0, 1, i, o]->[b, 0, 1, f], .... (note how all the #mhlo.conv<...> wrapper is gone).

This requires more boilerplate than I wanted though, I’d really like this behavior to be provided by the declarative format by default!

The boilerplate is here:
-parsing the attribute “standalone”: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/hlo/lib/Dialect/mhlo/IR/hlo_ops.cc#L4999-L5004

the routing called for the “inner” part of the attribute (in between < and >): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/hlo/lib/Dialect/mhlo/IR/hlo_ops.cc#L4826 (also called from declarative assembly below)
the declarative assembly https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/hlo/include/mlir-hlo/Dialect/mhlo/IR/hlo_ops.td#L1439

I haven’t thought enough how it translates to types though!

Mogball · October 8, 2021, 6:42pm

It should be (relatively) straightforward to just keep adding directives for all your desired behaviour. What would be a challenge is to do it smartly (OpFormatGen would look up whether an attribute as an assemblyFormat and… do something).

_sean_silva · October 9, 2021, 3:08am

Thanks Jeff, super happy to see this happening

I have not yet gone the custom parsing route (we have fewer deeply nested types), but the Torch dialect has this same general type of closed type system and would benefit from something like this too.

I wonder if we can extend the “allow op dialects to be implicit within a region” to also change the default dialect for types as well?.

Mogball · October 9, 2021, 9:41pm

This sounds like a pretty reasonable way to implement implicit attribute or type dialects: when parsing an op of a certain dialect, assume any type or attribute without the dialect prefix is of the same dialect.

Mogball · October 12, 2021, 1:07am

Got a patch up at https://reviews.llvm.org/D111594.

Topic		Replies	Views
RFC: Declarative Op Assembly Format MLIR	14	2084	July 15, 2020
[RFC] Elide Type/Attribute prefixes when using Declarative Assembly MLIR	23	928	January 17, 2022
Type(...)-directive for multiple inputs MLIR	2	142	August 22, 2023
Question about parsing string followed by numbers assembly format MLIR	2	164	October 1, 2023
FirstAttrDerivedResultType + Attributes + types in ODS MLIR	2	565	May 16, 2020