Python bindings + Properties

There was some discussion here about how properties might impact the Python bindings.
The wider issues might be better suited for this forum, as the PR seems to be independent of that question.

I’ve not been actively involved in the properties work, so I’ve been reading up.
Context I’ve been pointed to includes:

There’s specific mention of the bindings here.
I’ll try and keep the discussion here focused on the bindings, and not the wider properties discussion, though some things might overlap.

I’ll put forth a few assumptions I have:

  • The generic attributes dict is currently the main way that people are accessing both discardable attributes and inherent properties.
  • There is a getDiscardableAttrDictionary() method on Operation that returns a dictionary of discardable attributes.
  • There is a getInherentAttr() method on Operation that returns a given inherent attributes. At time of writing this is fetched by a StringRef name.
  • The method getAttrDictionary() would return a union of both discardable and inherent attributes.
  • I view that a property will have a name, and it would be an MLIR Attribute type.

Currently, the Python bindings expose the generic attributes dict via op.attributes, which has both discardable attrs and inherent properties.

Some questions (will raise more as I think of them):

  1. Will a property dict always exist?

Almost: inherent properties aren’t necessarily a dictionary, it can be anything: it’s “raw C++” and may not offer a key-value pair easy access.

I rather see the bindings reflecting the split that was introduced two years ago in the core C++ APIs when accessing discardable attributes separately from inherent ones.

The Python bindings use the C API so I don’t see how we can support arbitrary “raw C++”.
[mlir][python] add dict-style to IR attributes by Wheest · Pull Request #163200 · llvm/llvm-project · GitHub

These three comments suggest there could be a problem or ambiguity with the inherent properties?

Discussion on this discourse commnet suggests that properties might not always be representable as a dictionary.

Does that mean that the assumption that there will always be a name and an attribute type might not hold?

  1. It could be in future that the getAttrDictionary() method would be deprecated?

If that happens, the Python bindings could kick the can down the road by generating the union itself?

In the near-term, exposing getDiscardableAttrDictionary() might be a way to start that transition, e.g., with op.discardable_attributes.

If properties are guaranteed to be MLIR Attribute types, and have a name, then we could also expose op.inherent_properties as a dict-like object too.

  1. Properties might not always use the limited set of attribute types that attr-dicts currently

inherent properties aren’t necessarily a dictionary, it can be anything: it’s “raw C++” and may not offer a key-value pair easy access.

Also, this issue mentions C-bindings generated from ODS for properties.

In this case, how would a generic form op work? If I’ve got some unregistered dialect, but its properties are using something not representable using one of the attribute types?

Alternatively, for a performance cost, we could expose setPropertyFromAttr() and the like and have the Python and C users just construct the attributes corresponding to non-attribute properties.

From the Python side, I don’t think performance is of high concern. I’m unsure what other users of the C bindings are.

1 Like

I can comment on a few of these. But also, if you’re going to LLVM dev, check @krzysz00’s talk, he’ll be talking about properties and it might help to understand more things, and he has been doing a lot of awesome work with props.

First let’s start with, a property is not an attribute, both at the conceptual, and at the impl level. At the implementation level, a property can be any C++ object that implements certain hooks (there are also size requirements but lets omits those). Further, it’s not tied to the context nor subject to get uniqued.

At the conceptual level, properties are simply internal artifacts of an operation. The linked ODM, was precisely motivated to clarify the semantics of props. However, since then we haven’t had the time to action on that.

That’s not the case for most C++ code. Inherent attributes are usually accessed directly, for example a dyn_cast<arith::AddIOp>(op).getOverflowFlags() never goes through a a dict. In general, it’s not a good idea to use getInherentAttr, as it comes with perf penalty.

Also, discardable attributes should be accessed through the dedicated methods, and not generic methods like getAttr. See the comment in MLIR: mlir::Operation Class Reference

The assumptions are correct, however, both methods should come with a heavier disclaimer that they should be avoided.

Not true in general. Specially the fact it’s an attribute it’s not true at all, which is why there’s issues with bindings. In the ODM we discussed solutions to these, but again we haven’t had time to act on those. Happy to discuss if you want to send patches.

Yes.

This is a good idea.

Unregistered ops don’t use props like that. Unregistered ops will potentially use a single attr to represent things. Which is I believe one of the reasons props have to be currently convertible to attrs (something I want to remove, but I haven’t had the time).

1 Like

This quote makes me think that there is still some confusion on two aspects that are mixed up, so it is worth reiterating that there are two distinct topics here (I tried to explain in the review):

  1. Inherent and discardable attributes are separated: this is orthogonal to properties, we need to stop exposing a combined view of the two “namespaces” together. Potentially there could be collision for example (the same key present in both, and combining is not capable of handling this).

  2. Native properties which consists in “data that isn’t stored as an Attribute type attached on an operation”.

Worth mentioning that getInherentAttr does not go through a dictionary either, but it involves string comparisons, here is an example ODS-generated implementation:

std::optional<mlir::Attribute> StoreOp::getInherentAttr(::mlir::MLIRContext *ctx, const Properties &prop, llvm::StringRef name) {
    if (name == "alignment")
      return prop.alignment;

    if (name == "nontemporal")
      return prop.nontemporal;
  return std::nullopt;
}

If we consider both “properties” and “attribute” to be “data attached to operations”, then we can think of “attribute” as a subset of properties (we may want to introduce a class hierarchy in the future?) with these added aspects:

  • Attributes are Immutable
  • Attributes are uniqued in the context, which implies
    • must be hashable
    • storage/lifetime is tied to the context (for a property it is tied to the operation)
  • Attributes can have interfaces attached (we can add this for properties, but it’s not there right now.
  • Attributes can be parsed from a standalone string, right now properties are parsed in the context of an operation (we may add this capability to properties though).
  • Attributes supports isa/dyn_cast from the parent “Attribute” class.

From a binding point of view, accessing the information stored in a property or in an attribute does not seem any different to me: someone needs to plumb it!
The things that are trickier may the lifetime, but also possibly the isa/dyn_cast aspect (we may want to have some sort of typeid for “boxed properties” for the bindings, and in general to dynamically check if a property is an attribute or not).

+1 on this, this is a good first step.

1 Like

Having a bit more a read through the RFC, there’s a few quotes I’ll pull out, that may or may not be the current plan-of-record.

The generic form of an op with properties:

And

A little unclear to me is if each of the properties can separately be an Attribute, or if the entire properties struct can be represented as a single Attribute?

E.g., does test.with_properties get init’d with a single Attribute that contains all the properties, or does it get init’d with separate Attributes for a, array, and b?

The code I’m seeing appears to sugggest the former, along with this commnet.

I’m curious how we would represent properties in the generic textual form in this case? Dialect storage?

1 Like

In general I would exercise caution when using a 2 year old post for getting the accurate status of the current infra.

While not an official term, lets call ‘native’ properties, props that are not attributes.

Currently, each ‘native’ property has the requirement to be convertible to, and from attributes . For example, see the methods: MLIR: lib/IR/ODSSupport.cpp Source File, MLIR: lib/IR/ODSSupport.cpp Source File to convert a string property to and from a StringAttr.

So currently yes, the entire property struct of an operation is currently convertible to a single attr (usually a dict), see this method: MLIR: mlir::OperationName Class Reference, and each individual prop is also convertible to attrs.

Now, in the linked ODM, I argued for eventually dropping that requirement, as I believe it was added to address a limitation (generic printing and parsing…), and not as a feature. That’s also why relying on that conversion for exposing features is not the best idea, as it was never meant to be widely used outside the limited scope it was conceived.

The easiest workaround, is to give them registered textual format in the context. This also allows to drop the conversion requirement to attributes. This means that something like an i32 property could be parsed and printed to something like &i32<0>.

In concrete terms, when the parser encounters a & it would know it’s parsing a property (similar to !, or # for types and attrs). Then it would parse i32, search the registered hooks for the corresponding parsing hook, and upon finding it, it would parse and return it as an opaque property with storage.

In the ODM, I also floated the idea of dropping support for parsing the generic format (printing is still useful for debugging). But I recognize, this last point would require more justification, and has a high bar to be accepted as a change.

I don’t have a full understanding of the design of properties, so the following points are just questions based on my limited interpretation of the surrounding context — please feel free to correct me if I’m mistaken.

It seems that the ability to convert between properties and attributes naturally gives properties some form of dynamic reflection. If we drop this requirement, should we consider providing reflection capabilities for properties through other means? (Given that properties can be arbitrary C++ objects.)

From the perspective of bindings, we can currently construct arbitrary operations through just a few APIs (for example, Operation::create; in the Python bindings, functions like arith.addi(..) are merely thin wrappers around Operation.create).

If we don’t provide reflection capabilities for properties, would that mean we need to add a dedicated setXXXProperty API for every property of every operation? While this approach is certainly possible, it raises some potential concerns — such as the rapid increase in the number of C APIs, as well as how to design a more dynamic and user-friendly API in dynamic languages like Python. In addition, we might also need APIs like getXXXProperty. And if stronger dynamism is desired, we might even need a way to dynamically map properties to and from a dictionary.

Overall, it seems to me that this issue goes beyond just handling “generic printing and parsing.” It seems a broader question: when users need properties to offer a certain degree of dynamic reflection, what should we do?

This is part of the problem I wanted to solve. Currently, there’s no way to set or get ‘native’ properties in python (not even through attributes), they are completely unreachable in python code (not even during op creation). Thus, the issue is disconnected from whether they should have conversions to attributes or not.

There might be a couple of alternatives on how to solve the issue you’re referring. I currently lean towards adding something like:

  • OpaqueProp Operation::getOpaqueProp(StringRef propName)
  • void Operation::setOpaqueProp(StringRef propName, OpaqueProp prop)

methods, with a heavy disclaimer that the methods are not supposed to be used in C++, and are there for bindings, or very qualified cases.

Then, we don’t need to significantly increase the footprint of the C API by adding many individual methods, as you can use the above generic methods. However, you do still need to have printing and parsing of properties to create properties in python (this is similar to what we do with attributes, either we parse them in python, or we have to create a C-API for them).

These two goals are in direct contradiction aren’t they?

They are not exactly in contradiction. However, I see how it might look that way, and I muddled the conversation a bit by bringing that idea.

Maybe more context is due. Dropping generic op parsing started as a way to remove the requirement for attr conversion without thinking about the bindings. Then after thinking about it a bit more, to me generic op parsing feels like an artifact of the past. For example, in upstream most generic ops are found in tests as unregistered ops, or in some very old tests. However, since the creation of the test dialect it has been generally discouraged to use unregistered ops in tests, and we don’t add ops in generic format that frequently anymore, so I started to float the idea that perhaps is time drop support (I muddled the conversation there as well).

On the other side, parsing/printing of properties is a different way to drop requirement of conversion to attributes while keeping generic parsing. But also, it’s also a likely must to bring support of properties to python without an explosion in the C-API.

TL;DR; It’s possible to drop generic parsing of operations, and add registered parsers/printers for properties.

1 Like

What do you mean by “dynamic reflection” on Attribute? An Attribute class is already “opaque” until someone writes a binding for it.

Maybe this should be something that should be a wider RFC. I don’t think this has consensus everywhere. For example, one interesting thing you can do with generic parsing/printing (and torch-mlir uses this) is having a stable dialect definition which is not intended to change and the python bindings directly generate the generic operation definition. This way, your python frontend doesn’t have to rely on packaging the stable dialect, as long as your compiler taking it as input can handle it.

I’m sure you can argue against doing this, but I’m more trying to raise the point that you shouldn’t assume that just because upstream isn’t using generic printing/parsing more widely, other projects don’t use it.

2 Likes

Just in case, this is not something I’m currently proposing. It’s something I started thinking about, and floating as a future possibility to gather thoughts from other folks.

I also mentioned, that I recognize such change would have to overcome an extremely high bar to be accepted as a change.

But would love to talk more with other people to hear what are their use cases and if the change is justified or not. Thanks for mentioning torch mlir.

Ah, yes, I see what you mean. When we want to access a particular “field” inside an attribute type, we still need to have bindings for each of those fields. However, I think Attribute still possesses a certain degree of reflection capability. Here are a few phenomena that, in my view, contribute to that:

  • Even without any bindings, we can still construct attribute objects in arbitrary types through parse. In fact, sometimes we make use of this capability in the Python bindings to avoid adding many C APIs. (I’m not sure if this counts as an anti-pattern, but it’s certainly convenient—at least when performance isn’t a concern.)
  • Attribute has a type hierarchy, so we can perform membership tests using LLVM-style RTTI.
  • We can traverse nested attributes within any Attribute via the walk method, whereas with something like std::vector<std::string> we might need extra work to achieve the same effect.
  • It also seems that through AttributeInterface, we can add some dynamically polymorphic behavior to attributes? (I’m not familiar with this part)
1 Like

So, sprawling out from the original question, I do think that the ability to add interfaces, the fact that you can do those sorts of nested structure traversal, and even the “it’s immutable and lives in a context” thing are reasons to keep attributes around even if we end up moving a lot of inherent attributes to non-attribute properties - if nothing else, the discradable attributes are map<[string], [arbitrary data]>, where said arbitrary data should probably live at least long as the op … and we have attributes for that sort of thing.

Now, as to the topic

  1. Yeah, we should be removing all the calls to stuff that merges inherent attributes / discardable attributes and deprecate that sort of thing. I haven’t had the time to make progress on this.
  2. Adding a discardable attributes dict to the Python makes sense
  3. I do think some sort of registered parser/printer for properties isn’t a bad idea - if nothing else, it’d avoid a bunch of code duplication. But also … I don’t think I’m as concerned about the C API explosion thing? We can tablegen C wrappers around properties structs, no?
1 Like