In the python bindings, we have generally overloaded the __str__
method to return the corresponding MLIR assembly of an object. Example:
module = ctx.parse_module("...")
print(module) # Calls __str__ and prints assembly
attr = ctx.parse_attr('"mystring"')
print(attr) # Calls __str__ and prints the assembly form "mystring"
In trying to use the API, I’ve found this more awkward than intended and propose that we avoid defining the __str__
method unless if we are intending that the instance represents something that should be considered a Python str
. I propose that instead we:
- Define the
__repr__
method to print a debug-friendly decorated ASM form for “small” objects likeType
andAttribute
(i.e. so thatrepr(my_tensor_type)
prints a debug friendly string likeType(tensor<...>)
). - Define the
__repr__
method for “large” objects like Operation and Module to print a debug summary such asModule(<id>)
orOperation(mydialect.my_op)
for debugging what the object is without getting into trouble of accidentally dumping megabytes of text for logs and prints. - Define a
to_asm
method on any ASM-printable object. By default, it would print the ASM with default options. We could extend it with anAsmPrintOptions
optional argument at a later date.
This frees up the __str__
method for what I believe to be a better aligned purpose: casting things that should be treated as strings according to the rules Python has for such things. This will be primarily helpful with attributes.
Consider that currently, extracting the Pythong str
value from a StringAttribute
looks like this:
sattr = ctx.parse_attr('"mystring"')
sval = StringAttr(sattr).value
I would like to define the __str__
method on the generic Attribute
base class so that it implies a “cast” through StringAttr
:
sval = str(sattr) # Returns the str containing text "mystring" (no quotes)
print(sattr) # Prints "mystring" (no quotes)
I would make this strict so that it only works if the attribute is a StringAttr
. This would throw an exception:
iattr = ctx.parse_attr(1)
sval = str(iattr) # Raises ValueError: attribute cannot be implicitly converted to a str
We could then also define __int__
and __float__
to similar effect for IntAttr
, FloatAttr
, and IndexAttr
access:
iattr = ctx.parse_attr(1)
i = int(iattr) # i = 1
s = str(iattr) # Raises ValueError: attribute cannot be implicitly converted to a str
The last point does create a speed-bump because it breaks the Python notion that things that are “int-ey” or “float-ey” should be convertible to a str
via a call to str(foo)
. We can either accept this (and be more strict/correct) or special case the __str__
attribute conversion to follow the python rules by attempting to downcast to an IntAttr
or FloatAttr
or StringAttr
and doing the conversion. If we do diverge, it means that printing and logging objects of unknown providence needs to be done this way to generically print (which is arguably the most right anyway, since presuming that things are str convertible is trouble – but many will do it anyway):
myattr = ... # Some attribute of unknown providence
print(repr(myattr))
print("some format message: {!r}".format(myattr))
print("some format %r" % (myattr,))
print(logging.info("some message: %r", myattr)
I think I prefer this approach because it gives us some nice, pythonic shorthands for accessing values, preserves the MLIR type hierarchy, and gives us a more full featured way to access the assembly form.
Preferences?