This thread is to discuss how we could enable a debug flag to preserve identifier names when generating IR. I do not believe that existing debug passes allow us to keep the names of identifiers.
Right now, the original names of identifiers are lost, and anonymous names are used when printing IR. This is intended behaviour and a sensible default. See LangRef.md: “Identifier names for values may be used in an MLIR text file but are not persisted as part of the IR - the printer will give them anonymous names like %42.”
Therefore, something like:
```mlir
func.func @add_one(%my_input: f64) -> f64 {
  %my_constant = arith.constant 1.00000e+00 : f64
  %my_output = arith.addf %my_input, %my_constant : f64
  return %my_output : f64
}
```

becomes:

```mlir
func.func @add_one(%arg0: f64) -> f64 {
  %cst = arith.constant 1.000000e+00 : f64
  %0 = arith.addf %arg0, %cst : f64
  return %0 : f64
}
```
However, there are situations where it could be useful to preserve names:
- Debugging and development, where a developer has given meaningful names to their values and wants to understand what has happened to them.
- Code formatting. See this thread, where I am developing a code formatter for MLIR files. One of its requirements is that the formatter must not change the names.
We could add an option such as `--debug-retain-identifier-names` to allow this behaviour.
What would be an elegant way to design this? What challenges are there? Are there any existing components that we can leverage to achieve this goal?
In my initial investigation, I found that the original name is still available during parsing, but is lost once the `Value` is initialised.
The generic names are generated at print time, e.g., in `AliasInitializer::generateAlias` in `IR/AsmPrinter.cpp`, or in a more concrete implementation, `SSANameState::numberValuesInBlock`, where we generate the `%argX` names for block arguments.
I can imagine a relatively generic check in these printers, something like: if `keepNames == true` and `nameIsAvailable(value) == true`, print the original identifier name; otherwise, follow the existing execution path and print the generated name.
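To make the proposed check concrete, here is a minimal, self-contained sketch. It does not use the real MLIR API: `ValueImpl` is a hypothetical stand-in for whatever identifies a value, `printedName` is a hypothetical helper, and the `names` map plays the role of `nameIsAvailable(value)`. The only point is the control flow: the original name wins when the flag is on and a name was recorded, and everything else falls through to the existing anonymous numbering.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the storage behind an mlir::Value; only its
// address (identity) matters in this sketch.
struct ValueImpl {};

// Hypothetical helper mirroring the proposed printer check.
std::string printedName(const ValueImpl* v, bool keepNames,
                        const std::unordered_map<const ValueImpl*, std::string>& names,
                        unsigned& nextAnonId) {
  assert(v && "cannot print the name of a null value");
  if (keepNames) {
    auto it = names.find(v);
    if (it != names.end())  // corresponds to nameIsAvailable(value)
      return "%" + it->second;
  }
  // Existing path: hand out the next anonymous number.
  return "%" + std::to_string(nextAnonId++);
}
```

With `keepNames` off, a named value is numbered like any other, so the default output is unchanged.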
In terms of how to store the original names, there are two main approaches I can imagine. The first is extending the `Value` class with a field and accessor methods.
The second is keeping an external mapping, say `llvm::DenseMap<Value, llvm::StringRef> valueToAlias`.
The first approach has the downside of adding bloat to every `Value` object, while the second raises the question of where the mapping should be stored so that it is easily accessible. `MLIRContext` is the most global object I'm aware of that could fit this requirement.
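A rough sketch of the second approach, again with std-only stand-ins rather than real MLIR types: `ValueImpl` and `ValueNameTable` are hypothetical names, and the table could live in the context or in whatever the parser returns. It surfaces two details a real implementation would need to handle, under these assumptions.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the storage behind an mlir::Value.
struct ValueImpl {};

// Hypothetical side table mapping a value's identity to its source name.
class ValueNameTable {
public:
  void setName(const ValueImpl* v, std::string name) {
    assert(v && "cannot name a null value");
    names_[v] = std::move(name);
  }

  std::optional<std::string> lookup(const ValueImpl* v) const {
    auto it = names_.find(v);
    if (it == names_.end())
      return std::nullopt;
    return it->second;
  }

  // Should be called when a value is erased: otherwise a new value that
  // happens to be allocated at the same address would inherit a stale name.
  void forget(const ValueImpl* v) { names_.erase(v); }

private:
  // Owned strings: a non-owning view into the parser's source buffer would
  // only stay valid while that buffer is alive.
  std::unordered_map<const ValueImpl*, std::string> names_;
};
```

The `forget` hook is the main cost of keeping names outside the value: whoever erases values has to keep the table in sync.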
Another question is how to handle the anonymous naming of newly introduced values. For example, a pass that transforms the code may create new values. If an SSA value would otherwise be printed as `%2`, but the preceding values have been given names, should it be `%0` (the first anonymous value) or `%2` (the third value)? We also need to ensure that our generated names do not clash with names the user has defined, since the user could choose names that our alias generators might also produce.
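One possible policy, sketched here with a hypothetical `assignNames` function over a flat list of values in a single scope (real scoping across regions is ignored): collect every user-provided name first, then number only the anonymous values, skipping any number the user has already claimed. Collecting names up front matters because a user-named `%0` may appear after the first anonymous value.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical: each entry is a value's original name, or nullopt if the
// value is anonymous (e.g. created by a pass). Returns names without '%'.
std::vector<std::string> assignNames(
    const std::vector<std::optional<std::string>>& originals) {
  // First pass: reserve every name the user wrote.
  std::unordered_set<std::string> taken;
  for (const auto& name : originals)
    if (name)
      taken.insert(*name);

  // Second pass: keep user names, number anonymous values from 0,
  // skipping numbers that collide with a reserved name.
  std::vector<std::string> result;
  unsigned next = 0;
  for (const auto& name : originals) {
    if (name) {
      result.push_back(*name);
      continue;
    }
    std::string candidate;
    do {
      candidate = std::to_string(next++);
    } while (taken.count(candidate) != 0);
    taken.insert(candidate);
    result.push_back(candidate);
  }
  assert(result.size() == originals.size());
  return result;
}
```

For `{"my_input", nullopt, "0", nullopt}` this yields `my_input, 1, 0, 2`. It implements the "first anonymous value gets `%0`" answer; advancing `next` for named values as well would give the positional `%2` answer instead.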
Any thoughts or comments?