RFC: Exposing type and attribute names in C++

The goal of this RFC is to expose type and attribute names in C++ as methods on the AbstractType and AbstractAttribute classes, and to keep a map from names to abstract types/attributes in the MLIRContext.

The accompanying PR is llvm/llvm-project Pull Request #72189 by math-fehr: "[mlir] Expose type and attribute names in the MLIRContext and abstract type/attr classes".

Motivation

While dialect types and attributes (those not in the builtin dialect) usually have the format !dialect.type<...>, the "dialect.type" string is not accessible from the Type or Attribute class, nor is it possible to get an abstract type or attribute from this string. While this is not useful for most users, it would be quite useful for PDL or IRDL.

For instance, this would allow us to specify the constraint “match any type cmath.complex” with only the PDL/IRDL dialect, without using additional C++. Currently, this is not possible, as there is no way to get the right AbstractType from anything other than an existing type or a TypeID. While I have the example of PDL/IRDL in mind, there are probably other areas (notably when doing meta things on MLIR) where this could be useful.

Proposed changes

There are two main changes I would like to introduce. First, adding a field in the MLIRContext that associates type/attribute names with their AbstractType/AbstractAttribute. Second, adding a getName method to AbstractType and AbstractAttribute. In particular, these names should be unique.
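
As a rough illustration of these two changes, here is a minimal standalone sketch. It does not use MLIR: AbstractTypeSketch, ContextSketch, registerType and lookupType are hypothetical names invented for this example, and std::unordered_map stands in for whatever map the context would actually use.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for AbstractType; only the name is modeled here.
class AbstractTypeSketch {
public:
  explicit AbstractTypeSketch(const std::string &name) : name(name) {}

  // Mirrors the proposed getName method.
  const std::string &getName() const { return name; }

private:
  std::string name;
};

// Hypothetical stand-in for the name table proposed for the MLIRContext.
class ContextSketch {
public:
  // Registers a type under `name`. Returns false and leaves the table
  // unchanged if the name is already taken, so names stay unique.
  bool registerType(const std::string &name) {
    return types.emplace(name, AbstractTypeSketch(name)).second;
  }

  // Returns the abstract type registered under `name`, or nullptr.
  const AbstractTypeSketch *lookupType(const std::string &name) const {
    auto it = types.find(name);
    return it == types.end() ? nullptr : &it->second;
  }

private:
  std::unordered_map<std::string, AbstractTypeSketch> types;
};
```

With such a table, a constraint like “match any type cmath.complex” reduces to a lookup of the string "cmath.complex", with a null result meaning the type is not registered.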

ODS-defined dialect types can directly emit these fields without any change, since most dialects use mnemonic to give a name to a type/attribute. For cases where types and attributes do not have a mnemonic, typeName or attrName can be overridden to give them a name to emit. For C++-defined types and attributes, only one new field is needed, getTypeName or getAttrName.
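
For the C++-defined case, the idea could look roughly like the sketch below. This is standalone code, not the MLIR API: ComplexTypeSketch and nameOf are hypothetical, and the exact shape of getTypeName in the PR may differ (it is shown here as a static constexpr function purely for illustration).

```cpp
#include <cassert>
#include <string>

// Hypothetical C++-defined type; not a real MLIR class.
struct ComplexTypeSketch {
  // The single addition a C++-defined type would need in this sketch.
  static constexpr const char *getTypeName() { return "cmath.complex"; }
};

// Generic code can read the name from the class without an instance,
// for example when registering the type in the context.
template <typename ConcreteType>
std::string nameOf() {
  return ConcreteType::getTypeName();
}
```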

Proposed type/attribute names

For most dialects, the names would be straightforward: dialect.mnemonic. For builtin types and attributes, the name would be builtin.name, where name is the usual name (for instance vector). However, for unranked tensors and unranked memrefs, the names would be builtin.unranked_tensor and builtin.unranked_memref, as names should be unique per type. The only other example I have in mind for this kind of change is in the quant dialect, which has two types that are parsed with quant.uniform, so one is named quant.uniform and the other quant.uniform_per_axis.
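
The naming rule and the uniqueness requirement can be sketched as follows. Both helpers, makeQualifiedName and namesAreUnique, are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: builds a "dialect.mnemonic" style name.
std::string makeQualifiedName(const std::string &dialect,
                              const std::string &mnemonic) {
  return dialect + "." + mnemonic;
}

// Returns true if no name appears more than once in `names`.
bool namesAreUnique(const std::vector<std::string> &names) {
  std::set<std::string> seen;
  for (const std::string &name : names)
    if (!seen.insert(name).second)
      return false; // Duplicate found.
  return true;
}
```

Two types that share a parse keyword, such as the two quant types above, would collide under the plain rule, which is why one of them gets a distinct suffix.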


+1! I think this is a great step forward for the infrastructure. It will enable greater reflection over MLIR both within the infrastructure (like ODS) and with tooling (like the Python bindings). A lot of the cruft around attributes and types is left over from the “dialect attribute” and “dialect type” days, which modern dialects have moved beyond.


Another use case for this would be the C API and bindings. It is currently impossible to create arbitrary attributes or types, and one has to define new functions for each type. Type names alone would not be sufficient for that, as it would also require parameters, but it’s a start.


For the C API it seems one would need a generic create method really, so not sure if this helps as much (we could also generate many of the getters … but given the C API is supposed to not break as often, leaving that up to autogeneration would not be ideal). And one can do that today by way of the parse method if I’m not mistaken (it’s not nicely typed, and textual asm format not stable, so not ideal but feasible).

This feature seems useful for general inspection. Any significant impacts of this change? (E.g., affect context creation time? memory usage only additional hash map per context?)

The changes that will have an impact are:

  • Creating two empty StringMap in the MLIRContext.
  • Populating the StringMap when loading a dialect.
  • The additional constexpr getName getter method in each Type/Attribute.
  • The additional getName in each AbstractType/AbstractAttribute.

I can try to measure these changes memory-wise and runtime-wise if you want. But I have never done that before; do you have a pointer to which tool is best for measuring that correctly?

I don’t expect the memory impact of StringMaps for types and attributes, of which we have O(100s) in non-extreme use cases, to be significant. Depending on how construction is organized, there could be a bit of a penalty due to map lookups. That could be microbenchmarked by running type/attribute creation in a loop with/without the change.
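
A very small timing harness along those lines might look like the sketch below. timeLoopNs is a hypothetical helper, not an existing tool; in a real measurement the loop body would create MLIR types or attributes, and the same loop would be run on builds with and without the change.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `body(i)` for i in [0, iterations) and returns
// the elapsed wall-clock time in nanoseconds.
template <typename Fn>
std::int64_t timeLoopNs(std::int64_t iterations, Fn &&body) {
  auto start = std::chrono::steady_clock::now();
  for (std::int64_t i = 0; i < iterations; ++i)
    body(i);
  auto end = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start)
      .count();
}
```

A single run like this is noisy, so repeating the measurement several times and comparing medians would give a more trustworthy picture.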