TL;DR
Make parsing of cyclic references of aliases possible by having mutable attributes or types register themselves as soon as the immutable parts were parsed, and by skipping over a blocks of alias-definitions before then lazily parsing them.
Motivation
Whether it is supporting modelling a higher-level language, modelling complicated metadata, working on the SPIR-V dialect or trying to model DWARF debug info, one will often require mutable attributes or types to be able to represent cycles.
However, as of this writing, there are multiple ergonomic issues with using mutable attributes and types in MLIR. One of these issues, and the topic of this RFC, is using cyclic attributes or types in alias definitions.
Recap
Through the OpAsmDialectInterface
’s getAlias
methods, one can declare that an attribute or type should always be printed as an alias definition. Uses of the attribute or type are then replaced with references to the alias instead.
This is e.g. heavily used upstream for affine_map
s:
#map = affine_map<(d0)[s0] -> (2, -d0 + s0)>
...
%0 = affine.min #map(%arg)[%c0]
They are present as top-level elements in the syntax that can occur inbetween e.g. top-level operations. Using aliases has the potential to heavily reduce the parsing and printing time and memory of textual IR by removing redundancy. In an experiment in my downstream project, adding an alias definition to a very commonly used and deeply structured attribute reduced the time to run my testsuite from several minutes to just a few seconds. Furthermore, they improve the readability of the IR substantially in many cases.
Problem
Alias definitions are currently only defined after they have been fully parsed.
This makes syntax like the following:
!linked_list_node = !spirv.struct<"LinkedListNode", i32, ptr<!linked_list_node, Uniform>>;
fail with an error as !linked_list_node
is not yet defined.
Furthermore, it is not enough to make !linked_list_node
available inside of its body, as the cycle may be arbitrarily large:
!linked_list_node = !spirv.struct<"LinkedListNode", i32, !pointer>;
!pointer = !spriv.ptr<!linked_list_node, Uniform>;
Neither of these two types can be correctly parsed as of this writing.
(Note that I used the spriv
dialect for illustrative purposes only as it is an upstream dialect with mutable types. The same issues apply with mutable attributes and any other dialect allowing recursive types.)
Proposal
This RFC proposes to fix this issue by making a change to the language reference and parser.
It first introduces the concept of an alias-def-block
which is simply defined as a list of type alias definitions and attribute alias definitions respectively.
Unlike usually, within an alias-def-block
, it is allowed to reference any other alias defined within the same block prior to their definitions.
When finished parsing an alias-def-blocks
, a post-order walk is performed over the block that ignores already seen elements to parse all attributes or types in an order making it possible to resolve all alias definitions, except a back-edge to a cyclic type or attribute.
The missing piece to make a cyclic type or attribute work is to register it as soon as its immutable parts have been parsed. This is today already a de-facto requirement in today’s cyclic attributes and types.
A recent commit added the
tryStartCyclingParse
method to AsmParser
. This method can be reused to register an instantiation of a mutable attribute or type that is being parsed as an alias definition as soon as possible.
Implementation
The current plan to implement this RFC is split into three separate commits:
- Adding an API to the parser to register a cyclic type or attribute (this has already
landed [mlir] Add helper method to print and parse cyclic attributes and typ… · llvm/llvm-project@b121c26 · GitHub) - Adding the parsing logic for making the parsing of
alias-def-blocks
with the above semantics work. Note that this requires no changes to the printer. - Hooking up the registration of a mutable attribute or type to the alias symbol state, allowing referencing it as soon as its immutable parts have been parsed. At this point the printer can also be adjusted to substitute references to mutable attributes or types with a reference to its alias definition.
As of this writing, I have implemented the required logic for skipping over alias-definitions, getting the range in source code as a StringRef
, by introducing a “syntax-only” mode to Parser
.
This mode parses builtin attributes and types as usual except that no verification of the parsed data is performed. The parsing of dialect attributes and types is skipped entirely.
The not yet implemented part is the post-order walk of the attributes or types.
My current idea is to 1) implement on-demand parsing of an alias definition when first referenced and 2) iterate through all alias definitions and first parse mutables ones, breaking any cycles, followed by parsing all other yet-to-be-parsed attributes or types until the whole block has been parsed.
Alternatively, one could just parse them in any order, with the only con being that if an attribute or type participating in the cycle is parsed prior to the mutable one, it will be parsed twice.
Please feel free to leave any alternative ideas how to implement this while I prototype the above approach.
Alternative Approaches Considered
Prior to writing this RFC I have prototyped and thought about other approaches to solving this issue.
Before listing these, I consider the following advantages and disadvantages of this approach:
Advantages:
- I believe the described semantics are rather intuitive only expanding semantics on a block of
alias definitions. - Less disruptions for tools such as the LSP server. While these would have to be adjusted to allow code completion of an alias inside an
alias-def-block
, they will continue working as is without any changes with completions elsewhere. - No printer changes required.
Disadvantages:
- Alias definitions to builtin types and attributes are roughly parsed twice.
Other approaches considered:
Fully lazy parsing and printing
This approach skips over all alias definitions and only parses them as soon as first referenced.
Advantages:
- No printer changes
Disadvantages:
- Unintuitive side effects such as:
#def = test.attr<#attr_ref>
func.func @foo() {
#def // illegal here.
}
#attr_ref = i32
func.func @foo() {
#def // but suddenly legal here.
}
- More disruptive for tooling like the MLIR LSP server
- An attribute or type participating in the cycle that is parsed/referenced prior to the mutable attribute of the cycle will be parsed twice.
Suspend parsing of mutable attributes
This approach would modify tryStartCyclicParse
to take a callable which would parse the mutable parts of an attribute or type. If parsed as an alias-definition, the callable would be stored and called later at the very end of the file to resume parsing.
As all other alias definitions have been parsed and the mutable attribute or typed registered, the cycle was broken and any alias references within the body of the mutable attribute or type can be resolved without issues.
Advantages:
- Parsing “Just works”
- No changes for LSP
Disadvantages:
- The semantics of mutable attributes or types is very inconsistent with the rest of the language. It’d appear as if suddenly some random parts of an attribute or type is capable of referencing alias definitions appearing anywhere in the source file.
- More intrusive change to the API
- Requires printer changes. Since only mutable attributes or type parsing is suspended and capable of referencing alias definitions later in the file, it is necessary that mutable attribute or type alias definitions are printed before any attribute or type it references. This is currently not the case in the printer.
Return placeholder attributes or types
This approach would, similarly to how locations are parsed, return a placeholder opaque attribute or type for any alias references prior to the definition. At the end of the file, all placeholders are replaced by the actual attribute or type. This would require making the AttrTypeReplacer
work with mutable attributes or types.
Advantages:
- We probably want
AttrTypeReplacer
to work with mutable attributes or types at some point anyways.
Disadvantages:
- Unlikely to work as the parse implementations of many dialects verify the parsed data and are almost certainly not able to handle a
OpaqueAttr
orOpaqueType
being returned. Changing this would be a painful undertaking and incredibly unergonomic. I consider this a show-stopper.
Related and Future Work
[RFC] Always printing type aliases for certain types (CC @mshockwave) tried to tackle this problem before with the LLVM dialect’s struct types as motivation. While the use-case for LLVM needing this has become mood with opaque pointers (please use them), the motivation remains the same.
Furthermore, after this is implemented, one may consider making types not only “only printable as alias”, but also “only parseable as alias”. This would remove a lot of the logic in !spriv.struct
and !llvm.struct
required for error handling of parsing a mutable attribute or type multiple times.