[RFC] Elide Type/Attribute prefixes when using Declarative Assembly

I’ve been playing with a change in declarative assembly format handling of types/attributes.
The idea is to take advantage of cases where the parsing context enables to know what instance we’re parsing.

For example this test op:

def FormatCompoundAttr : TEST_Op<"format_compound_attr"> {
  let arguments = (ins CompoundAttrA:$compound);
  let assemblyFormat = "$compound attr-dict-with-keyword";
}

At the moment we require:

test.format_compound_attr #test.cmpnd_a<1, !test.smpla, [5, 6]>

but we could round-trip it unambiguously as:

test.format_compound_attr <1, !test.smpla, [5, 6]>

Since we already know exactly what attribute we’re parsing/printing here.

The same principle applies in nested Type/Attribute:

def CompoundNestedInnerType : Test_Type<"CompoundNestedInner"> {
  let mnemonic = "cmpnd_inner";
  let parameters = (ins
    "int":$some_int,
  );
  let assemblyFormat = "`<` $some_int `>`";
}
def CompoundNestedOuterType : Test_Type<"CompoundNestedOuter"> {
  let mnemonic = "cmpnd_nested_outer";
  let parameters = (ins  CompoundNestedInnerType:$inner);
  let assemblyFormat = "`<` `i`  $inner `>`";
}

The current generated format right now is:

#test.cmpnd_nested_outer<i #test.cmpnd_nested_inner<42>>

but instead we could round-trip it as:

#test.cmpnd_nested_outer<i <42>>

Since we already expect to parse a #test.cmpnd_nested_inner after the i, it does not add any information here.

I implemented all of this in: https://reviews.llvm.org/D113873

3 Likes

This SGTM as long as we ensure that the fully specified form is still valid, which it looks like your patch does. I also assume that if a type allowed either an i32 or a mydialect.i32 that this would require the fully-specified form? Is it only allowing eliding if there’s exactly one type that can be used? Only one dialect? The types are spelled differently even without the dialect prefix? Only if without the dialect prefix the type doesn’t conflict with any builtin type? I’d be a bit worried about something that took either an i32 or a mydialect.i8 and spelled them i32 and i8

Yes.
You can think about the existing generated accessor on the C++ class. If the accessor for an attribute returns Attribute we will use the generic printer/parser. If it returns a specific subclass (and this subclass implements print()/parse() methods), then we’ll use the short version.

This does not provide for a general dialect prefix elision like requested here: Eliding dialect prefix for types ; I suspect it is fairly orthogonal actually.

That seems like a great conservative first step :+1:

+1. In Mehdi’s examples, he kept the <...>. But, in fact, if the dialect prefixes are being stripped, these braces can be dropped from the format entirely.

E.g. let assemblyFormat = "$some_int" would result in

#test.cmpnd_nested_outer<i 42>

Of course, the ugly form would be #test<"cmpnd_nested_inner 42">

Removing the delimiters (</>) is another can of worms right now, and it can’t be done universally because for example some attributes allow an empty value and the parser in the attribute (or type) is rely on things like while (!parser.parseOptionalGreater()) { // continue to parse until closing >``

I’m not referring to removing the delimiters generically from within the AsmParser/Printer. I’m saying that attribute parsers that are implemented as

if (parseLess()) return failure();
// parse all the stuff
if (parseGreater()) return failure();

The calls to parseLess and parseGreater can be dropped. I.e. "$value" instead of

"`<` $value `>`"

You’d think so, but right now the enclosing framework needs a delimiter to extract the string, even when using the “ugly” form. Even if we were to allow this that wouldn’t work for the pretty form, like this: #test.cmpnd_nested_inner 42

I know. I noted that the ugly form of attributes would still need the enclosing braces like #test<"cmpnd_nested_inner 42">, but I’m talking about when the parser/printer is called from within the custom parser/printer of an op or attribute.

Like in your example,

Dropping the braces (of the inner attribute) in both cases would be

#test.cmpnd_nested_outer<i #test<"cmpnd_nested_inner 42">>
// versus
#test.cmpnd_nested_outer<i 42>

General ping on this RFC, if there aren’t any other comment I’ll likely move forward with it this week.

This works on some examples, but it is hard to generalize from an API point of view right now because of how everything is set up. We have multiple printer/parser for attributes which rely on having the > (or EOF) to detect the end of their parsing. I’m not sure how to structure it all for it to work.

Two quick questions here:

  1. If you have #test.cmpnd_nested_outer<#test.i32<42> and this enables #test.cmpnd_nested_outer<i32<42>, in which direction will the parser resolve the inner one? (I assume the context-specific dialect one as opposed to the one from the builtin type here.)
  2. Would the general form (-mlir-print-op-generic) print the whole fully qualified type?

Given (1) and (2), would those who aren’t already familiar with the op’s definition/spec have to use -mlir-print-op-generic to know what type exactly is being used inside? (when you have colliding names which may happen often?)

That would enable #test.cmpnd_nested_outer<<42>> actually.

Are you asking if it’ll be parsed back as #test.cmpnd_nested_outer<#test.i32<42>? Then yes.

Right now it does not, I need to check if we have access to this flag ; I think I looked into it before and it isn’t trivial because this is all coming from the OpPrintingFlags which historically are only plumbed through the OpAsmPrinter. The DialectPrinter never had these, and it is in control of printing the types and attributes.

I think you’re asking because you thought that the i32 would be printed (right?), but it won’t.

Oh, it elides the whole thing. I was thinking it would elide just the dialect prefix. For eg., if we had #test.i32<42>, #test.i64<42>, you’d lose information if you elided the whole thing – isn’t there a use case to just drop the dialect prefix?

Yes. More directly, would it be treated as the builtin type i32 or the dialect-specific one when parsed – it’s the latter that you mention.

But being able to do this would appear to be a pre-requisite. The op-generic form shouldn’t be hiding anything inside of a custom user-friendly syntax which is what this proposal is about. Doesn’t the earlier feature on eliding the op’s dialect prefix while printing its name (when immediately enclosed by a region-holding op of that dialect) apply only to the custom form? If the dialect name prefix always appears in the generic form, it would be inconsistent as well to not have such behavior for attributes and types.

That’s right - if there isn’t ambiguity, this part is moot.

I don’t think there is an information loss here, at least how I implemented it can always round-trip without more ambiguity than we already have.
If you’re worried about readability, I see it as the responsibility of whoever is using the assembly format: i.e. you can write your assembly format to print and parse as: #test.cmpnd_nested_outer<i32<42>> if you want, you have to write somehow like this: let assemblyFormat = "i32 $inner"

I probably should clarify a few things, because this proposal does not make anything really worse on this aspect. One quick way to convince oneself about it is that I’m only changing the behavior of the declarative assembly format, not changing anything that has already been the behavior of manually written custom printer/parser.

First it is important to keep in mind that there isn’t any generic format for attributes and type: it just isn’t something that ever existed. Until a couple of weeks ago we didn’t have declarative format for types and attribute so they were written in C++, and you don’t have access to the flag.
This is actually an issue the I encounter in MHLO recently as I converted a few attributes from using DictionaryAttribute to be custom. Writing the pretty printers for some of them, I wanted to write a “raw form” by hand, but we don’t have access to the flag to trigger on it.

Back to what I wrote here, it kicks-in in two independent “contexts”: the “op custom assembly format” and the “attribute format” (that is, the part in between < and > only).

Let’s look at the first context, it will honor the “op-generic” flag and the short form will be disabled to print the generic op as expected, but again, this is orthogonal from how the attributes themselves are printed (which is the second “context”).

To take the example from the RFC, this custom op:

test.format_compound_attr #test.cmpnd_a<1, !test.smpla, [5, 6]>

could be printed (after this change) as:

test.format_compound_attr <1, !test.smpla, [5, 6]>

But you can add keywords as I showed with the i32 before:

test.format_compound_attr compound<1, !test.smpla, [5, 6]>

This is entirely under the control of the person who write the assembly format for the op.

The generic printer for this op would be entirely unchanged before and after, it would look like:

"test.format_compound_attr"() attributes { compound = #test.cmpnd_a<1, !test.smpla, [5, 6]> } 

Now, independently from the op assembly format, let’s look at the second context.

We have the question of how to write the printer/parser for the attribute itself (again, the part in between < and > only, never affected by the op-generic flag).
Assuming you write in C++ (the only way until very recently), you have always been able to write it like this:

MyAttr::print(OpPrinter &p) {
  p << "<";
  p.printAttribute(innerAttr());
  p << ">";
}

or:

MyAttr::print(OpPrinter &p) {
  p << "<i32";
  innerAttr()::print(p);
  p << ">";
}

In the first case, the dialect and mnemonic are printed, in the second case they aren’t.
The first case would print:

#test.cmpnd_nested_outer<#test.i32<42>>

The second would print:

#test.cmpnd_nested_outer<i32<42>>

Again, this is the existing APIs and behavior, the is nothing new here.

What I’m changing is exclusively the behavior of the declarative assembly (which was introduced a few weeks ago, so it has little impact on existing code) to be able to write the second printer I showed in C++ above.

I think Mehdi makes a compelling case that this doesn’t make the situation any worse, but I think that Uday’s right that this is confusing. It seems like we should actually define generic format for printing types and attributes that at least always prints the dialect prefix

1 Like

Thank you for this level of detail! Sounds fine as this isn’t making the situation any worse. It was just a question I had as to whether the full name appeared in the generic form but looks like that issue is orthogonal to this proposal.

Resuming the RFC with a follow-up, as folks reported some pain with the type($operand) printer being elided.

In particular with an assembly syntax like let assemblyFormat = "$self attr-dict : type($self) -> type($result)";
You may go from %0 = torch.aten.size %arg0 : !torch.vtensor<[?,3],f32> -> !torch.list<!torch.int> to %0 = torch.aten.size %arg0 : !torch.vtensor<[?,3],f32> -> <!torch.int>

The syntax could be updated with a keyword, but you lose the ! ; so let assemblyFormat = "$self attr-dict : type($self) -> torch.list `` type($result)"; would yield:

%0 = torch.aten.size %arg0 : !torch.vtensor<[?,3],f32> -> torch.list<!torch.int>

So stepping back on the current state and how we got there: I started with the AttrDef for types and attributes, that means not the declarative assembly for the op but the declarative assembly for defining the syntax of an attribute itself.
The example sin the patch illustrates it, for example the !test.cmpnd_nested_outer type contains a !test.cmpnd_inner type which contains an int and a !test.cmpnd_a type, which itself contains an int, a type, and an array of ints.
The declarative assembly from inner to outer:

  • let assemblyFormat = "<inner $inner>";
  • let assemblyFormat = "<$some_int with cmpd: $cmpdA>";
  • let assemblyFormat = "<$some_int, $some_type, $array>";

And it would print before as: !test.cmpnd_nested_outer<inner: !test.cmpnd_inner<42 with cmpd: !test.cmpnd_a<3, i64, [5, 6]>>>
After as: !test.cmpnd_nested_outer<inner <42 with cmpd: <3, i64, [5, 6]>>>

Basically, the main notion introduced is one of context. In the context of parsing the outer type, we know what to expected in the nesting and we can elide the redundant information. Keywords can be added for readability as needed. Before that you’d have to resort to C++ instead of declarative ASM to be able to do this.
The same concept applied equally for attributes of course, replace ! with # in the example and you get the same result.

Now to the op assembly format. If I have an op like mydialect.do_something %input across dimension #mydialect.dims<0, 1, 2>
I’d really want to print it instead mydialect.do_something %input across dimension <0, 1, 2>.
The declarative assembly will look like: let assemblyFormat = "$input across dimension $dims";

Now it gets more interesting with types in the context of an operation: it seems a bit different than attributes in that you need a directive to access them: %out = mydialect.transfer %input, %output from !mydialect.buffer<"gpu"> to !mydialect.buffer<"cpu">
would have let assemblyFormat = "$input from type($input) to type($output);
The explicit directive type(...) does not show up in any of the other case, which makes it a bit “special” maybe.
I’d still want to be able to print this as %out = mydialect.transfer %input, %output from <"gpu"> to <"cpu"> ; and so far I had assumed that %out = mydialect.transfer %input, %output from buffer<"gpu"> to buffer<"cpu"> (or from mydialect.buffer<"gpu"> to mydialect.buffer<"cpu">) would be good enough here (at least on the example I encountered when deploying the patch).

We can introduce a separate directive (qualified_type($operand) for example) to be able to get the fully qualified type in the op declarative assembly. But that won’t solve the case of the Type/Attribute declarative assembly (for nested types/attributes).

1 Like

I’m +1 on qualified_type as a thing that dialect authors can use if they want uniformity across their ops. As has been noted, this is showing up in “opset” dialects that can contain hundreds of (mostly autogenerated) ops where consistency between them is important. I do like the terse forms for some things, when you are already in the territory of “I need to intimately understand this op to use it” vs “I have a thousand things that all need to be consistent so I can understand them as a whole.”

I personally haven’t seen the corresponding case with types/attributes.

1 Like

qualified_type as a thing makes sense to me. I think it would also be good to directly support common patterns in ODS. We already have functional-type(, we could also have arrow-type( etc that takes two singular types and puts an arrow between them. That would make the tblgen files cleaner and also make it easier to do things like this in a more “it just works” way.

For example, we could write arrow-type($self, $result) instead of type($self) -> type($result)

+1 - I’ve often reached for arrow-type one but have not been in an interruptible state to think about it. I paged through some of the things I’ve written and can’t think of others off the top of my head. Did you have any others in mind?

Also, nit: we should probably spell this qualified-type (dash not underscore)