Retain original identifier names for debugging

Repeating some of the discussion in the PR, there are three approaches that I’ve been able to condense from discussion with the community and contributors.

A general goal of any solution should be to reduce the impact on the codebase when this flag isn’t enabled.

You can’t have both 1) low impact on existing clients, and 2) any guarantees about maintenance of this information. These are strictly contradictory.

I think that 1) is a more important goal than 2), since ultimately this is just about giving debug hints to developers.

0. Attribute-based approach

We store our names as discardable attributes within operations. We have an implementation of this, see above and the PR for notes on its design.

Pros

  • Uses existing infrasuture within MLIR (attributes)
  • Storage, retrieval, and transfer of data between parsing and printing is quite simple (since we use attributes)

Cons

  • It adds additional data to the IR. The pass inserts the data, and then cleans up after itself. However, there is still a risk that some intermediate dialect drops the information, or changes its behavior. However, these issues will only occur if the pass is enabled, which reduces the impact.

1. Location-based approach

Operations include location info which points to their original definition in the source. We could reparse this at print time, and recover original names from this.

Pros

  • Uses existing infrasuture within MLIR (location info)
  • Storage and transfer of data between parsing and printing is quite simple (since we use location info)
  • Would not require changes to the IR, uses information already present

Cons

  • Data retrieval would be tricky: it would require re-parsing, which could add a lot more complexity, e.g., how do we reassociate a newly parsed Value and its name with the original Value, and resolve ambiguity after transformations? This might require adding extra data, which would introduce overhead
  • There are other clients of location info, for whom this feature could cause problems.

2. Use of external data structure

We could create an external data structure, e.g., a DenseMap<Value, StringRef>. We have an initial implementation of this.

Pros

  • Centralizes the storage of our name information, is conceptually quite simple (a dict).
  • Storage and retrieval is quite simple, since it is a dict

Cons

  • Transfer of data between parsing and printing is quite complex, who owns the data, and how does it get from the parser to the printer?
  • My implementaion of this requires a lot of optional nullptr’d arguments to pass our data structure to the parser and printer properly. This reduces code cleanness. There could be some way to pack this data in some extisting data structure (e.g., with AsmState+AsmParserState), but I think this would still require adding/changing methods to access and transform this data.
  • The overhead is arguably higher than the attribute-based approach, as several methods now need handle store a pointer to our data structure, even if it is not being used.

If we use the requirement:

“least-invasive approach even if flawed”

I would say “0. Attribute-based approach” is probably this. 1. requires solving a more complex problem of Value re-association, and 2. requires the insertion of a lot of optional arguments to parsing and printing functions.

If keeping the IR clean is considered to be of greater importance than keeping the code clean, then 1. or 2. might be more attractive. However, for 0. inserting additional information into the IR only occurs when a user enables this flag, meaning that it will not change the IR unless the user requests it. I could also add another flag, e.g., --no-attr-dict-cleanup, which would print the attributes introduced by this pass.