[RFC] Supporting aliases in cyclic types and attributes

TL;DR

Make parsing of cyclic references of aliases possible by having mutable attributes or types register themselves as soon as the immutable parts were parsed, and by skipping over a blocks of alias-definitions before then lazily parsing them.

Motivation

Whether it is supporting modelling a higher-level language, modelling complicated metadata, working on the SPIR-V dialect or trying to model DWARF debug info, one will often require mutable attributes or types to be able to represent cycles.

However, as of this writing, there are multiple ergonomic issues with using mutable attributes and types in MLIR. One of these issues, and the topic of this RFC, is using cyclic attributes or types in alias definitions.

Recap

Through the OpAsmDialectInterface’s getAlias methods, one can declare that an attribute or type should always be printed as an alias definition. Uses of the attribute or type are then replaced with references to the alias instead.
This is e.g. heavily used upstream for affine_maps:

#map = affine_map<(d0)[s0] -> (2, -d0 + s0)>
...
   %0 = affine.min #map(%arg)[%c0]

They are present as top-level elements in the syntax that can occur inbetween e.g. top-level operations. Using aliases has the potential to heavily reduce the parsing and printing time and memory of textual IR by removing redundancy. In an experiment in my downstream project, adding an alias definition to a very commonly used and deeply structured attribute reduced the time to run my testsuite from several minutes to just a few seconds. Furthermore, they improve the readability of the IR substantially in many cases.

Problem

Alias definitions are currently only defined after they have been fully parsed.
This makes syntax like the following:

!linked_list_node = !spirv.struct<"LinkedListNode", i32, ptr<!linked_list_node, Uniform>>;

fail with an error as !linked_list_node is not yet defined.

Furthermore, it is not enough to make !linked_list_node available inside of its body, as the cycle may be arbitrarily large:

!linked_list_node = !spirv.struct<"LinkedListNode", i32, !pointer>;
!pointer = !spriv.ptr<!linked_list_node, Uniform>;

Neither of these two types can be correctly parsed as of this writing.

(Note that I used the spriv dialect for illustrative purposes only as it is an upstream dialect with mutable types. The same issues apply with mutable attributes and any other dialect allowing recursive types.)

Proposal

This RFC proposes to fix this issue by making a change to the language reference and parser.

It first introduces the concept of an alias-def-block which is simply defined as a list of type alias definitions and attribute alias definitions respectively.
Unlike usually, within an alias-def-block, it is allowed to reference any other alias defined within the same block prior to their definitions.

When finished parsing an alias-def-blocks, a post-order walk is performed over the block that ignores already seen elements to parse all attributes or types in an order making it possible to resolve all alias definitions, except a back-edge to a cyclic type or attribute.

The missing piece to make a cyclic type or attribute work is to register it as soon as its immutable parts have been parsed. This is today already a de-facto requirement in today’s cyclic attributes and types.
A recent commit added the
tryStartCyclingParse method to AsmParser. This method can be reused to register an instantiation of a mutable attribute or type that is being parsed as an alias definition as soon as possible.

Implementation

The current plan to implement this RFC is split into three separate commits:

  • Adding an API to the parser to register a cyclic type or attribute (this has already
    landed [mlir] Add helper method to print and parse cyclic attributes and typ… · llvm/llvm-project@b121c26 · GitHub)
  • Adding the parsing logic for making the parsing of alias-def-blocks with the above semantics work. Note that this requires no changes to the printer.
  • Hooking up the registration of a mutable attribute or type to the alias symbol state, allowing referencing it as soon as its immutable parts have been parsed. At this point the printer can also be adjusted to substitute references to mutable attributes or types with a reference to its alias definition.

As of this writing, I have implemented the required logic for skipping over alias-definitions, getting the range in source code as a StringRef, by introducing a “syntax-only” mode to Parser.
This mode parses builtin attributes and types as usual except that no verification of the parsed data is performed. The parsing of dialect attributes and types is skipped entirely.

The not yet implemented part is the post-order walk of the attributes or types.
My current idea is to 1) implement on-demand parsing of an alias definition when first referenced and 2) iterate through all alias definitions and first parse mutables ones, breaking any cycles, followed by parsing all other yet-to-be-parsed attributes or types until the whole block has been parsed.
Alternatively, one could just parse them in any order, with the only con being that if an attribute or type participating in the cycle is parsed prior to the mutable one, it will be parsed twice.

Please feel free to leave any alternative ideas how to implement this while I prototype the above approach.

Alternative Approaches Considered

Prior to writing this RFC I have prototyped and thought about other approaches to solving this issue.
Before listing these, I consider the following advantages and disadvantages of this approach:

Advantages:

  • I believe the described semantics are rather intuitive only expanding semantics on a block of
    alias definitions.
  • Less disruptions for tools such as the LSP server. While these would have to be adjusted to allow code completion of an alias inside an alias-def-block, they will continue working as is without any changes with completions elsewhere.
  • No printer changes required.

Disadvantages:

  • Alias definitions to builtin types and attributes are roughly parsed twice.

Other approaches considered:

Fully lazy parsing and printing

This approach skips over all alias definitions and only parses them as soon as first referenced.

Advantages:

  • No printer changes

Disadvantages:

  • Unintuitive side effects such as:
#def = test.attr<#attr_ref> 

func.func @foo() {
   #def // illegal here.
}

#attr_ref = i32

func.func @foo() {
   #def // but suddenly legal here.
}
  • More disruptive for tooling like the MLIR LSP server
  • An attribute or type participating in the cycle that is parsed/referenced prior to the mutable attribute of the cycle will be parsed twice.

Suspend parsing of mutable attributes

This approach would modify tryStartCyclicParse to take a callable which would parse the mutable parts of an attribute or type. If parsed as an alias-definition, the callable would be stored and called later at the very end of the file to resume parsing.
As all other alias definitions have been parsed and the mutable attribute or typed registered, the cycle was broken and any alias references within the body of the mutable attribute or type can be resolved without issues.

Advantages:

  • Parsing “Just works”
  • No changes for LSP

Disadvantages:

  • The semantics of mutable attributes or types is very inconsistent with the rest of the language. It’d appear as if suddenly some random parts of an attribute or type is capable of referencing alias definitions appearing anywhere in the source file.
  • More intrusive change to the API
  • Requires printer changes. Since only mutable attributes or type parsing is suspended and capable of referencing alias definitions later in the file, it is necessary that mutable attribute or type alias definitions are printed before any attribute or type it references. This is currently not the case in the printer.

Return placeholder attributes or types

This approach would, similarly to how locations are parsed, return a placeholder opaque attribute or type for any alias references prior to the definition. At the end of the file, all placeholders are replaced by the actual attribute or type. This would require making the AttrTypeReplacer work with mutable attributes or types.

Advantages:

  • We probably want AttrTypeReplacer to work with mutable attributes or types at some point anyways.

Disadvantages:

  • Unlikely to work as the parse implementations of many dialects verify the parsed data and are almost certainly not able to handle a OpaqueAttr or OpaqueType being returned. Changing this would be a painful undertaking and incredibly unergonomic. I consider this a show-stopper.

Related and Future Work

[RFC] Always printing type aliases for certain types (CC @mshockwave) tried to tackle this problem before with the LLVM dialect’s struct types as motivation. While the use-case for LLVM needing this has become mood with opaque pointers (please use them), the motivation remains the same.

Furthermore, after this is implemented, one may consider making types not only “only printable as alias”, but also “only parseable as alias”. This would remove a lot of the logic in !spriv.struct and !llvm.struct required for error handling of parsing a mutable attribute or type multiple times.

3 Likes

I have posted the second PR part of this RFC: [mlir] Add concept of alias blocks by zero9178 · Pull Request #65503 · llvm/llvm-project · GitHub

It implements the “alias-block” concept, i.e. the lazy parsing of alias definitions in a block.

A last PR will hook this mechanism up to the tryStartCyclicParse mechanism introduced in the first PR, making it possible to parse cyclic references as well.

I have posted the third and final PR implementing this RFC: [mlir] Add support for parsing and printing cyclic aliases by zero9178 · Pull Request #66663 · llvm/llvm-project · GitHub. Due to the lack of stacked PRs right now, the second PR is part of that PR as well, so make sure to not view the very first commit in that branch