RFC: Use Attributes to Model Distinct LLVM Metadata Nodes

At the moment, the LLVM dialect models access group and alias analysis metadata using operations nested in a global metadata operation:

llvm.metadata @metadata {
  llvm.access_group @group
  llvm.return
}

llvm.store %0, %ptr { access_groups = [@metadata::@group] }: i32, !llvm.ptr

The disadvantage of this solution is that a module pass is needed to update access group metadata for example during inlining (since adding / removing access groups in parallel is not possible). The LLVM dialect inliner thus currently bails out if a function contains access group metadata.

We propose to replace the access group operations by an attribute based metadata representation similar to the debug metadata representation:

#access_group = #llvm.access_group<id = 0>

llvm.store %0, %ptr { access_groups = [#access_group] }: i32, !llvm.ptr

The advantage of this representation is that the inliner can create new access groups in parallel. Additionally, the structured attribute also simplifies the attribute verification.

There is one problem to circumvent if we want to use attributes to model distinct metadata nodes though. As we want to have multiple instances of the same base attribute, we need to add a unique identifier to the attribute (id = 0 in the example), and more importantly, we need a way to generate these unique identifiers in a thread-safe manner.

We propose to generate the identifiers using a mutable helper attribute that contains a mutable counter. Whenever the inliner or the import need to generate a new distinct attribute, we mutate / increment the counter of the helper attribute to get the next unique identifer. The helper attribute itself also takes a base attribute as a parameter to ensure different attribute kinds use separate counters.

The following code sequence illustrates the generation of a distinct access group attribute:

auto baseAttr = AccessGroupAttr::get(context, 0);

// Instantiate a generator attribute for the access group with id 0.
auto genAttr = GeneratorAttr::get(baseAttr);

// Query the generator for the next id and use it to create a distinct attribute.
auto distinctAttr = AccessGroupAttr::get(context, genAttr.getNextID());

It uses a generator attribute uniqued for the access group attribute with index zero and increments the counter of the generator to get the next unique identifier. ⚙ D148106 [mlir][llvm] Use attributes to store access groups. implements the access group refactoring. Follow up patches may update the remaining alias analysis metadata representations.

We will have further use cases once LLVM’s debug metadata changes land that will use an assignment metadata node similar to the access group nodes.

3 Likes

How do you ensure determinism? That is: even if you’re global counter is behind a mutex so there is no race condition (as far as C++ / undefined behavior is concerned), it seems to me that threads would race for the ids and potentially get different one on every runs?

Thanks your the comment! There is indeed a determinism problem with the current approach.

We though about two different solutions to alleviate / avoid the problem so far:

  1. We could introduce a “cleanup” pass that iterates all access group attributes in the order they appear in the IR and assign them a deterministic identifier, for example, according to their position in the IR. This solution would not completely eliminate non-determinism. For example, it does not solve the non-determinism problem when debugging a parallel pass that creates new access groups. Instead of a pass, we may also be able to solve the problem during the printing of the IR directly. However, the in-memory representation remains non-deterministic.

  2. Another solution may be to extend the access group attribute with a symbol that refers to the containing function. Assuming all attributes are only used inside the same parent function and the transformation of individual functions is always single-threaded, the symbol reference should result in deterministic access group attributes. In fact, we would not use the thread safety property of the mutate method in this case. It may still be helpful to have a generator attribute though. It would allow us to keep track of the access group identifiers that have already been assigned to access group attributes that existed before inlining the callable.

Let me give an example for the modified access group attribute:

#access_group = #llvm.access_group<@scope::@foo, id = 0>
#access_group1 = #llvm.access_group<@scope::@bar, id = 0>
#access_group2 = #llvm.access_group<@scope::@foo, id = 1>

module @scope {
  llvm.func @foo() {
    // ...
    llvm.store %0, %ptr { access_groups = [#access_group, #access_group2] }: i32, !llvm.ptr
  }
  llvm.func @bar() {
    // ...
    llvm.store %0, %ptr { access_groups = [#access_group1] }: i32, !llvm.ptr
  }
}

At the moment we have a preference for the second solution. The only disadvantage we are currently aware of is that the metadata now needs to have function scope in the sense that it should not be used across different functions. Are there any other obvious design flaws with this solution?

Also we are very open to other approaches to handle metadata during inlining (i.e. creating new access group attributes in parallel in a deterministic way)?

After lengthy internal discussions, we designed an improved solution that allows us to manipulate access groups in parallel in a deterministic way (alias scopes can use the same mechanism). The idea is closely related to the second solution discussed above in the sense that the identifiers of the access groups are generated with a function scope. As a result, different functions can be processed in parallel.

The example below illustrates the new proposal that introduces an explicit mutable distinct_sequence attribute – a revamped version of the GeneratorAttr – that is used to generate unique identifiers for the access groups:

#distinct_sequence = #llvm.distinct_sequence<scope = @foo, state = 2>
#distinct_sequence1 = #llvm.distinct_sequence<scope = @bar, state = 1>

#access_group = #llvm.access_group<id = 0, elem_of = #distinct_sequence>
#access_group1 = #llvm.access_group<id = 1, elem_of = #distinct_sequence>
#access_group2 = #llvm.access_group<id = 0, elem_of = #distinct_sequence1>

llvm.func @foo(%arg0: !llvm.ptr) {
  %0 = llvm.load %arg0 {access_groups = [#access_group, #access_group1]} : !llvm.ptr -> i32
}

llvm.func @bar(%arg0: !llvm.ptr) {
  %0 = llvm.load %arg0 {access_groups = [#access_group2]} : !llvm.ptr -> i32
}

Every access group attribute has an elem_of field that relates it to a distinct_sequence attribute that is unique to the function (the state field is mutable and not used for uniqueing). The state is incremented to get the next identifier if a pass needs to introduce a new access group attribute. This process is deterministic assuming a function is processed by a single thread. Additionally, the distinct_sequence attribute can be accessed given the function name only:

// Get the distinct sequence attribute for the function.
auto distinctSequence = DistinctSequenceAttr::get(
    SymbolRefAttr::get(StringAttr::get(context, funcName)));

// Create a new access group with a new unique identifier.
auto accessGroup = AccessGroupAttr::get(
    context, distinctSequence.getNextID(), distinctSequence);

Without this mechanism, the inliner, for example, would have to walk all access groups used at the callsite to find a new unique identifier.

@mehdi_amini does this solution address your concerns?

1 Like

I updated the revision (⚙ D148106 [mlir][llvm] Use attributes to store access groups.) to follow the new design.

Conceptual question: are we just trying to work around the impossibility to create distinct attributes in MLIR? The entire non-determinism discussion stems from the fact that we are storing more information than actually necessary: names, numbers, etc. whereas we only need the fact that two things are distinct. Maybe we can extend the attribute subsystem to bypass StorageUniquer and just create a new instance every time + give it a syntactic differentiator for round-tripping, e.g., #!llvm.access_group? FWIW, there is already a precedent in MLIRContext with IntegerSets not being uniqued after they exceed some size (these are not attributes, but I actually wonder what happens when we have two IntegerSetAttrs with identical but not uniqued IntegerSets).

Thanks for having a look at the RFC!

In principle the round-tripping for such not uniqued attributes already works if the attributes are printed with an alias since the alias could serve as identifier. If the attributes are printed inline we need to have some sort of identifier. I am not very familiar with that part of the codebase but I will try to come up with a solution.

One possible syntax for printing attributes inline may be:

llvm.func @foo(%arg0: !llvm.ptr) {
  %0 = llvm.load %arg0 {access_groups = [#llvm.access_group[id = 0]<>, #llvm.access_group[id = 1]<>]} : !llvm.ptr -> i32
}

llvm.func @bar(%arg0: !llvm.ptr) {
  %0 = llvm.load %arg0 {access_groups = [#llvm.access_group[id = 2]<>]} : !llvm.ptr -> i32
}

I am also fine with the hashtag exclamation mark to emphasize the attribute is “distinct”. Any syntax suggestions are very welcome.

This post proposes an extension to MLIR core that introduces distinct attributes. This design followed from diving deeper after @ftynse’s advice to consider MLIR core changes. Distinct attributes are attributes that are not uniqued during creation. Instead, every call to the attribute’s get method allocates a new attribute independent of its content. Otherwise, distinct attributes do not differ from normal attributes. This functionality will be helpful to model LLVM’s distinct metadata and may have other applications as well.

Syntax

The proposed syntax introduces a new distinct keyword but otherwise follows the attribute syntax quite closely:

distinct<0 = #llvm.my_distinct_attr<name = "test1">>
distinct<1 = #llvm.my_distinct_unit_attr>

The attribute itself is printed as before, assigned to a unique identifier, and wrapped into a distinct clause. Note that there is no backing storage holding the unique identifier. Instead the identifiers are generated during printing of the IR similar to aliases and SSA value names.

The syntax above shows the long form of the distinct attribute. It is only used if the attribute is printed for the first time. Any subsequent appearances of the same distinct attribute use the short form:

distinct<0>
distinct<1>

This short form simplifies the implementation since the parser does not need to regenerate the attribute to verify its content matches earlier instances of the distinct attribute. Instead it is sufficient to check the long form of the attribute has been printed before. Note that the short form is not used if the attribute is printed using aliases.

Distinct Attribute Definition

Distinct attributes are marked with an IsDistinct trait. Using a trait allows us to conveniently define distinct attributes using tablegen:

def LLVM_MyDistinctUnitAttr :
      LLVM_Attr<"MyDistinctUnitAttr", "my_distinct_unit_attr",
                         /*traits=*/[IsDistinct]> {}

Usage

The usage of distinct attributes does not differ from other attributes. The only relevant change is that every call of the get method returns a new attribute:

auto attr1 = MyDistinctUnitAttr::get(context);
auto attr2 = MyDistinctUnitAttr::get(context);
assert(attr1 != attr2);

Implementation

The prototype implementation marks the distinct attribute with a trait, but does not store a unique identifier as part of the attribute storage. Instead, the storage uniquer considers the distinct trait during attribute creation, and creates a new attribute every time. It therefore uses a distinct identifier as a hash value. The identifier is generated using an atomic counter stored in the context. Additionally, the isEqual method, which is only used internally by the storage uniquer, is adapted to always return false for distinct attributes (the attribute’s comparison operator compares the attribute pointers for equality). Using the existing storage uniquer allows us to benefit from its heavy performance optimizations and thread-safety.

Additionally, changes to the printing and parsing ensure the distinct attributes can be printed and parsed. At runtime, distinct attributes differentiate due to having different pointer values. When printing, an additional integer identifier is needed. These unique identifiers are assigned in the order the distinct attributes appear in the printed IR to guarantee deterministic numbering.

Non-determinism and Thread-safety

This proposed solution should finally guarantee both thread-safety (since it relies on the existing attribute infrastructure) and determinism (since the identifiers are generated according to the order in which the distinct attributes appear in the IR) in non-LLVM dialect specific way.

We plan to convert the existing prototype to a reviewable revision during the coming days. Early feedback regarding design problems is very welcome!

@River707 @ftynse @mehdi_amini

1 Like

I recall us going through a process where we did something similar path when looking at extern constants before introducing resources. And you have example in DenseResourceElementsAttr of having the non-uniqued value and Attribute.

So you have one atomic int to unique and one counter here - which sounds to me like a Resource would work for the first and not need to introduce anything inside the context vs something local to your get combined with the counter here. Is this handled by a variable somehow reset at start of print and incremented on each print? (Not sure how this work with printing an op vs printing the top level op). Is this requiring that the print order when serialized originally and the print order when consumed later be the same for roundtripping?

Have you looked into using properties for this? I would think that “distinct attribute” could be fully replaced by properties, without the issue of “leaking” memory in the context.
Resources is another direction as well, where the infra for these can be specific to the LLVM dialect and “out of band” from Attributes.

Hi Jacques,

Thanks for pointing me at DenseResourceElementsAttr.

So you have one atomic int to unique and one counter here

The atomic integer is mostly an implementation detail to ensure the distinct attributes do not lead to unnecessary collisions inside the hash table of the storage uniquer. The “counter” is used during printing and is indeed needed to ensure the distinct attributes are numbered in a deterministic way that does not depend on their creation order.

Is this handled by a variable somehow reset at start of print and incremented on each print?

Yes, I implemented such a mechanism in the prototype. The distinct attributes are then numbered according to their first “use” in the IR.

I wonder if resources provide a similar mechanism out of the box? The following code snippet makes me wonder if this is indeed the case:

  auto attr = AttrT::get(type, "resource",
                         UnmanagedAsmResourceBlob::allocateInferAlign(data));

Here it seems like the name of the resource is set manually? Is this necessary or is there a way to let the printer name the resources automatically?

Above I suggested an alternative approach that generates deterministic names manually on a per function basis. This solution would be viable for our use cases in the LLVM dialect. However, native support would be nicer!

Hi Mehdi,

Have you looked into using properties for this? I would think that “distinct attribute” could be fully replaced by properties, without the issue of “leaking” memory in the context.

How would such a replacement look like?

I assume printing and parsing a distinct identifier probably requires changes similar to the ones I have implemented for attributes. I.e. the printer needs to know which properties are distinct and then decorate them with a unique identifier during printing? This mechanism will need some sort of equality check which is easy for attributes. I assume for properties one could implement a custom comparison operator? Or is there a mechanism to produce unique identifiers automatically?

How could the distinct properties themselves look like? One idea I had was that they could use a shared pointer that points to some heap allocated memory location. If two properties point to the same memory location they are “unique” and if they point to different memory locations they are “distinct”?

How does thread-safety work for properties? Is it up to the user the ensure thread-safety?

We had a deeper look at a property-based solution and came to the following conclusions:

  • Properties also require a mechanism in the printer to produce unique identifiers in a deterministic manner (any hints on how to generate unique identifiers otherwise are very welcome). The disadvantage of properties is that they are less structured (e.g., there is no base class). Implementing infrastructure for them in a generic manner thus seems somewhat harder. For example, we may have to implement a method on the printer that produces a unique identifier given a void*, or we have to introduce a distinct property base class in MLIR core.
  • Attributes are strongly preferable for us since they compose with the existing metadata attributes. The distinct access group attributes for example would appear on memory operations, but they also appear nested inside of loop annotation attributes. Our understanding is that a property-based solution would break this composability. Similarly, distinct attributes may be added to debug metadata attributes at some point, for example, to model subprograms (currently everything is uniqued).
  • Attributes, thanks to aliases, would all print at the beginning of the module. With properties the distinct identifiers would print on every operation that uses a specific access group or alias scope. Having the distinct identifiers at the beginning of the module is thus more in line with LLVM IR’s metadata representation.

The following example shows how a distinct access group attribute would nested inside a loop annotation:

#loopMD = #llvm.loop_annotation<
        disableNonforced = false, mustProgress = true, …,
        isVectorized = false,
        parallelAccesses = #access_group, #access_group1>

Note that this is a very simple example. Loop annotations themselves can also be nested and in theory should be distinct themselves.

We thus suggest either going back to the distinct sequence proposal (it does not require any MLIR core changes and is local to LLVM dialect) or opting for distinct attribute support inside MLIR core (especially if we believe there are use cases outside of LLVM dialect).

I suspect an Open Meeting discussion would be likely useful at this point, can you prepare a short presentation with the problem to solve and the alternatives you looked at? It would likely help the discussion.

Also between the MLIR workshop and EuroLLVM preparation (it’s all next week), it’s been a bit difficult to invest time in thinking about this recently! (and River and others were busy with their own launch as well).

I suspect an Open Meeting discussion would be likely useful at this point, can you prepare a short presentation with the problem to solve and the alternatives you looked at?

Yes I can do that. I suppose three weeks from now (25th of May) would be the next opportunity due to Euro LLVM and public holidays on our end the week after.

1 Like

During the ODM on the topic (Open MLIR Meeting 5/24/2023: RFC Discussion about "Distinct Attribute" in MLIR) two paths forward have been discussed:

  • A resource based solution,
  • An attribute based solution that uses the alias printing facilities to implement distinct attributes.

I did some initial experimentation with resources. It seems the current implementation also suffers from non-determinism problem that my initial mutable attribute based solution had. Essentially, the resource is assigned a unique name during resource creation rather than during printing (llvm-project/DialectResourceBlobManager.cpp at 60b7dbb670afceab7d897699e00073b50e4c384a · llvm/llvm-project · GitHub). If the resources are created in parallel - for example in a function pass - the naming may be non-deterministic due to the threading.

The behavior can be tested using the following code snippet:

 OpBuilder builder(context);
  ArrayRef<int32_t> data(0);
  auto type = RankedTensorType::get(0, builder.getI32Type());

  SmallVector<Attribute> attributes(20);
  parallelFor(context, 0, 20, [&](size_t i) {
    attributes[i] = DenseI32ResourceElementsAttr::get(
        type, "distinct", UnmanagedAsmResourceBlob::allocateInferAlign(data));
  });

  for (const auto &it : llvm::enumerate(attributes)) {
    module.get()->setAttr("llvm.access_group_" + std::to_string(it.index()),
                          it.value());
  }

It creates resources in parallel and adds them to the module. On a multi-threaded system the resulting attribute assignment is non-deterministic / run dependent.

As the attribute alias printing already implements the logic to number attributes during printing,
the attribute based solution seems preferable over implementing a similar mechanism for resources? Or did I miss some mechanism that can produce unique resource names in a deterministic manner?

@River707 @mehdi_amini